Does anyone know why Nim seems to spend so much time in the kernel? The most obvious example is the last benchmark, spectral-norm, where Nim spends almost 3 seconds in the kernel and Rust spends 0.

For the spectral-norm one I think it's a failed attempt at parallelism. This code spawns 4000 tasks:

    parallel:
      for i in 0..
(https://github.com/hanabi1224/Programming-Language-Benchmark...)

It's almost four times as slow as the otherwise identical single-threaded version.
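To illustrate what I mean, here's a rough sketch of the usual fix, written in Rust rather than Nim since I can't vouch for the details of Nim's threadpool. This is not the benchmark's code: `work` and `fill_chunked` are hypothetical names, and `work` is just a stand-in for one iteration of the loop. The idea is to hand each worker one contiguous chunk instead of creating one task per index. That way the cost of starting and joining a task, which is plausibly where the kernel time goes, is paid a handful of times instead of 4000.

```rust
use std::thread;

// Hypothetical per-index work, standing in for one iteration of the
// benchmark's loop body. Not taken from the benchmark repository.
fn work(i: usize) -> f64 {
    1.0 / ((i + 1) as f64)
}

// Fill `out` using at most `workers` threads. Each thread handles one
// contiguous chunk, so the number of spawned tasks is bounded by `workers`
// rather than by `out.len()`.
fn fill_chunked(out: &mut [f64], workers: usize) {
    if out.is_empty() {
        return; // chunks_mut panics on a chunk size of 0
    }
    let workers = workers.max(1);
    let chunk_len = (out.len() + workers - 1) / workers; // ceiling division

    // Scoped threads may borrow `out`; all of them are joined before
    // `thread::scope` returns.
    thread::scope(|s| {
        for (c, chunk) in out.chunks_mut(chunk_len).enumerate() {
            s.spawn(move || {
                let base = c * chunk_len; // global index of chunk[0]
                for (j, slot) in chunk.iter_mut().enumerate() {
                    *slot = work(base + j);
                }
            });
        }
    });
}
```

Calling `fill_chunked(&mut v, 4)` on a 4000-element `v` starts 4 threads of 1000 elements each. Whether the same restructuring would recover the lost time in the Nim version is a guess on my part; I haven't measured it.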