It’ll be very interesting to see these efforts to make QEMU faster. Until now it seems like flexibility and compatibility were the main goals, with performance a much lower priority, but with a good optimizing JIT it might become reasonably competitive.

What if QEMU ends up faster than native? I could imagine a future where programs are built unoptimized or with `-Os` for download size, and optimized by the operating system before or during execution, with x86 or ARM (or RISC-V!) ending up as the default "portable executable instruction format" for a whole bunch of CPU architectures. Where the first thing the OS or even the BIOS loads is QEMU...

Not gonna happen, so it's a bit irrelevant. QEMU's architecture is fairly generic and it was never built for speed in the first place. Improving the speed of emulation is more about getting better than the current baseline of "10x slower than native or worse, and gradually getting slower still as we add features unless we actively work on performance". If we got to "3x slower than native" I would be really surprised.

Which isn't to say that you can't do better than 3x slower under any circumstances, just that if you wanted performance you'd probably be better off starting from scratch with an emulator design that cared about performance and which was really clear about its use-cases -- e.g. "this is user-space only, not system emulation" and "this only supports this very small set of host and guest architectures". QEMU does a lot of different things in one codebase, which makes it cumbersome to change anything and hard to put in optimisations or simplifications which might be valid for a specific host/target combination but not more widely.

> if you wanted performance you'd probably be better off starting from scratch with an emulator design that cared about performance and which was really clear about its use-cases

See box86:

https://github.com/ptitSeb/box86