> One particular issue we fixed a couple of times in TigerBeetle is replacing by-value with by-pointer loops:

I don't know about other tools and places, but one nice thing about working at Facebook is that the internal Infer linter tool[1] is generally good about producing warnings for "this copy could be a ref instead"[2] (in the majority-C++ codebase) at code review time, without anyone manually combing the LLVM IR for memcpys. (Internally, Infer uses several handwritten analyses on the C++ AST.)

Reading further, it seems like they are essentially looking for the pattern where a memcpy call is generated with a large constant size parameter at compile time. Things of this nature should be somewhat easy to write a static analyzer pass for, if you've got an existing AST/SSA level framework. I believe there is already an Infer pass for this for C++, but it might be a different internal analyzer.
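For illustration, here is a rough sketch of that pattern check run over textual LLVM IR (e.g. the output of an emit-LLVM-IR build step). Everything named here is my own assumption: the regexes, the 1024-byte threshold, the `find_large_memcpys` helper, and the sample IR are hypothetical, and this is not how Infer or TigerBeetle actually do it. A real pass would walk the IR through a proper API rather than regex-matching text.

```python
import re

# Start of a function definition, e.g. "define void @copy_big(ptr %dst, ...) {".
DEFINE_RE = re.compile(r"^define\b.*?@([\w.$-]+)\(")

# A call to the llvm.memcpy intrinsic whose third operand (the length) is an
# integer constant. Assumption: the first two operands contain no commas, so
# constant expressions such as nested getelementptr operands are missed.
MEMCPY_RE = re.compile(
    r"call\s+void\s+@llvm\.memcpy\.[\w.]+\("
    r"[^,]+,[^,]+,\s*i(?:32|64)\s+(\d+)\s*,"
)


def find_large_memcpys(ir_text, threshold=1024):
    """Return (function, line number, size) for each constant-size memcpy
    of at least `threshold` bytes. The threshold is an arbitrary choice."""
    findings = []
    current_fn = None
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        m = DEFINE_RE.match(line)
        if m:
            current_fn = m.group(1)
            continue
        m = MEMCPY_RE.search(line)
        if m and int(m.group(1)) >= threshold:
            findings.append((current_fn, lineno, int(m.group(1))))
    return findings


# Hand-written sample IR, only for demonstrating the scan.
SAMPLE_IR = """\
define void @copy_big(ptr %dst, ptr %src) {
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %dst, ptr align 8 %src, i64 4096, i1 false)
  ret void
}
define void @copy_small(ptr %dst, ptr %src) {
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %dst, ptr align 8 %src, i64 16, i1 false)
  ret void
}
define void @copy_dynamic(ptr %dst, ptr %src, i64 %n) {
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %dst, ptr align 8 %src, i64 %n, i1 false)
  ret void
}
"""

if __name__ == "__main__":
    for fn, lineno, size in find_large_memcpys(SAMPLE_IR):
        print(f"{fn}: line {lineno}: memcpy of {size} bytes")
```

On the sample, only the 4096-byte copy in `copy_big` is reported; the 16-byte copy falls under the threshold and the runtime-sized one has no constant length to match. Note that this only finds copies that survive as `memcpy` calls, so it says nothing about copies the compiler lowers to plain loads and stores.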

[1]: https://fbinfer.com/

[2]: https://github.com/facebook/infer/blob/main/infer/documentat... (and related warnings, e.g., https://github.com/facebook/infer/blob/main/infer/documentat... )

Yup!

Ideally, you want to do this analysis on compiler IR, _before_ it gets lowered to LLVM IR. But to do that in a sustainable way, you need a quasi-stable internal IR format. Zig is rather new, and, while the compiler is a delight to hack on, there are no stable extension interfaces, and the code itself is very much not settled yet. So the main thing we get out of LLVM IR here is relative stability. You can quickly hack something together, and be reasonably sure that you won't have to spend a lot of time upgrading the infra with every compiler upgrade. LLVM IR of course is not absolutely stable, but it is stable enough, and way more stable than compiler internals at the moment.

At Trail of Bits, we've been working on this type of IR for C and C++ code [1]. We operate as a kind of Clang middle end, taking in a Clang AST, and spitting LLVM IR that is Clang-compatible out the other end. In this middle area, we progressively lower from a high-level MLIR dialect down to LLVM.

[1] https://github.com/trailofbits/vast