So... grumpy old man response here:

These are all tiny, targeted microoptimizations worth a percent or three of benefit in specific tests. They're worth doing (or at least evaluating) in any mature product and I have no complaint.

Nonetheless, rustc remains really slow relative to other similar technologies, including C++ compilers. Is there any consensus as to why?

I mean, with C++, the answer is something to the effect of "template expansion happens syntactically and generally has to be expressed in headers, leading to many megabytes of code that has to be compiled repeatedly with every translation unit". And that isn't really amenable to microoptimization. We all agree that it sucks, and probably can't be fixed with the language as it's specified, and chalk it up to a design flaw.

What's the equivalent quip with rustc? I mean... is it going to get faster (in a real sense, not micro), or is it not? Is this fixable or not, and if not why?

Clang has a profiler that shows why your C++ compiles slowly. Last I saw, templates were not responsible for much in a typical codebase.

For Rust, though, the answer is obvious - the borrow checker. The language is designed around the idea of the compiler proving whether or not the code is safe and inserting appropriate destructors where they should be. I think it's expected that this will be slower than a language that just doesn't do that.

Any pointers to the profiler?