The post glosses over the most important part of Erlang's GC: it collects process heaps separately. This transforms a hard problem (collecting a global heap with low latency despite concurrent mutators) into a _much_ simpler one, at the price of more copying. Compare Java's G1 with Erlang's GC; the former hurts my head.

For those problems that are amenable to Erlang's model, this is a fine solution. The only real improvement here would be making collection incremental.
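
To make the trade-off concrete, here is a toy sketch of the idea, not how BEAM is actually implemented. The `Process` class and its `alloc`, `send`, and `collect` methods are hypothetical names made up for illustration. Each process owns a private heap, messages are deep-copied on send so no pointer crosses heaps, and a collection only ever looks at one process's heap.

```python
import copy


class Process:
    """Toy actor with a private heap (illustrative only, not BEAM)."""

    def __init__(self, name):
        self.name = name
        self.heap = []     # every object this process has allocated
        self.roots = []    # objects the process still references
        self.mailbox = []  # received messages, also treated as live

    def alloc(self, value, keep=True):
        # Allocate on this process's own heap; keep=False models garbage.
        self.heap.append(value)
        if keep:
            self.roots.append(value)
        return value

    def send(self, other, message):
        # The price: the message is copied into the receiver's heap,
        # so the two heaps never share a reference.
        copied = copy.deepcopy(message)
        other.heap.append(copied)
        other.mailbox.append(copied)

    def collect(self):
        # The payoff: only this heap is scanned, and no other process
        # has to be paused or inspected while it happens.
        live_ids = {id(obj) for obj in self.roots + self.mailbox}
        before = len(self.heap)
        self.heap = [obj for obj in self.heap if id(obj) in live_ids]
        return before - len(self.heap)


a = Process("a")
b = Process("b")
msg = a.alloc({"payload": [1, 2, 3]})
a.alloc([0] * 10, keep=False)  # unreachable garbage on a's heap
a.send(b, msg)

freed = a.collect()            # touches a's heap only
received = b.mailbox[0]
```

After this runs, `a` has dropped its one garbage object, while `b` holds an equal but distinct copy of the message and was never touched by the collection.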

Wouldn't Erlang be much more efficient if it simply compiled to the JVM?

Almost 10 years ago, I tested Erjang [1] with a medium-sized application. Throughput was better than on BEAM, but latency was terrible.

[1] https://github.com/trifork/erjang/