This was discussed here [1]. The conversation died off around a year ago. I'm not sure why, although it is pretty clear that BOLT's optimizations are not "free". For example:

> BOLT uses enormous amount of memory (about 6GB).

> Perf needs to be ran with LBR support; this is almost always unsupported with VMs (which means you don't want to run the measurement inside CI).

[1] https://github.com/rust-lang/rust/issues/50655

Interesting. It would be nice for rustc to have a special flag that enables "maximum performance" at any compile-time or RAM cost.

At any price? You could throw something like souper at the problem [0]. It makes compilation about 20 times slower in some of the benchmarks in their paper. It works at the IR level; IIRC, similar optimizers exist for assembly generation, so you could add one of those as another pass.

[0] https://github.com/google/souper