If they're seeing these kinds of gains from relatively minor changes to their Python code, I can't help but wonder how much faster the model would run in a compiled language or a language with a good JIT (way more optimization work has gone into the mainstream JavaScript runtimes than into CPython).

I'd assumed that overall performance in Stable Diffusion was limited by the code running on the GPU, with Python performance being a fairly minor factor, but I guess that's not the case?

I'd always assumed Python was interpreted until I heard about Nuitka [1].

It would be interesting to see a benchmark of this change under CPython vs. Nuitka.
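For what it's worth, here is a rough sketch of how such a comparison might be set up. It does not touch Stable Diffusion at all: `python_overhead_workload` is a made-up stand-in for interpreter-bound glue code, and `bench.py` is just a hypothetical file name. The idea would be to run the same script once under plain CPython and once as a Nuitka-compiled binary and compare the printed times.

```python
import time


def python_overhead_workload(n: int) -> int:
    """Hypothetical stand-in for interpreter-bound glue code (no GPU work)."""
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def best_time(func, *args, repeats: int = 5) -> float:
    """Return the fastest wall-clock time (seconds) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Assumed usage: save as bench.py, then compare
    #   python bench.py
    # against the binary produced by
    #   python -m nuitka bench.py
    elapsed = best_time(python_overhead_workload, 1_000_000)
    print(f"best of 5: {elapsed:.4f} s")
```

A toy loop like this only says something about pure-Python overhead; if most of the time really is spent on the GPU, the two numbers for the actual model could end up much closer together.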

[1] https://github.com/Nuitka/Nuitka