But what was the JVM tuning? That's the most interesting part!

Currently it's "-Xms30g -Xmx30g -XX:+UseG1GC -XX:+PrintCodeCache -XX:ProfiledCodeHeapSize=500m -XX:NonProfiledCodeHeapSize=500m -XX:NonNMethodCodeHeapSize=24m -XX:ReservedCodeCacheSize=1024m -XX:InitialCodeCacheSize=1024m -XX:ParallelGCThreads=24"

Everything after G1GC was suggested by various helpful experts on HN, on Discord, by email, and through other channels.
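For anyone copying these flags: the three code heap segments here add up exactly to the reserved code cache (500m + 500m + 24m = 1024m). As far as I know, HotSpot expects that sum to match ReservedCodeCacheSize when all three segment sizes are set explicitly, so it is worth checking before changing any one of them. Below is a minimal sketch of that arithmetic in Java. The class and method names (CodeCacheCheck, flagMiB, segmentSumMiB) are made up for illustration, and it only parses the flag string; it does not query a running JVM.

```java
class CodeCacheCheck {
    // Flag string copied from the comment above.
    static final String FLAGS = "-Xms30g -Xmx30g -XX:+UseG1GC -XX:+PrintCodeCache "
        + "-XX:ProfiledCodeHeapSize=500m -XX:NonProfiledCodeHeapSize=500m "
        + "-XX:NonNMethodCodeHeapSize=24m -XX:ReservedCodeCacheSize=1024m "
        + "-XX:InitialCodeCacheSize=1024m -XX:ParallelGCThreads=24";

    // Convert a size such as "500m" or "30g" to MiB.
    // Assumption: only the m and g suffixes are handled, which covers the flags above.
    static long toMiB(String size) {
        String s = size.trim().toLowerCase();
        long n = Long.parseLong(s.substring(0, s.length() - 1));
        switch (s.charAt(s.length() - 1)) {
            case 'g': return n * 1024;
            case 'm': return n;
            default: throw new IllegalArgumentException("unsupported size: " + size);
        }
    }

    // Value of -XX:<name>=<size> in MiB, or -1 if the flag is absent.
    static long flagMiB(String flags, String name) {
        String prefix = "-XX:" + name + "=";
        for (String flag : flags.trim().split("\\s+")) {
            if (flag.startsWith(prefix)) {
                return toMiB(flag.substring(prefix.length()));
            }
        }
        return -1;
    }

    // Sum of the three segmented code heap sizes in MiB.
    // Assumes all three flags are present in the string.
    static long segmentSumMiB(String flags) {
        return flagMiB(flags, "ProfiledCodeHeapSize")
             + flagMiB(flags, "NonProfiledCodeHeapSize")
             + flagMiB(flags, "NonNMethodCodeHeapSize");
    }

    public static void main(String[] args) {
        long sum = segmentSumMiB(FLAGS);
        long reserved = flagMiB(FLAGS, "ReservedCodeCacheSize");
        System.out.println("segments = " + sum + " MiB, reserved = " + reserved + " MiB");
    }
}
```

Running it with the flags above prints "segments = 1024 MiB, reserved = 1024 MiB".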

Are you running all of Lichess on one machine, or is this only one shard? 30 GB of RAM for a single application instance seems very high. (Sorry, I haven't had time to read the whole article yet.)

That's the one and only server running https://github.com/lichess-org/lila, i.e. the main Scala JVM application.

See https://lichess.org/source for a list of all the services, with a more-or-less up-to-date diagram.