What does HackerNews think of VAST?
VAST is an experimental compiler pipeline designed for program analysis of C and C++. It provides a tower of IRs as MLIR dialects, letting you choose the best-fit representation for a program analysis or further program abstraction.
The reason, I think, is this: most languages target C or LLVM, and both have fundamentally lossy compilation processes.
To get around this, you'd need a hodgepodge of preprocessor directives, or you'd have to take a completely different approach.
I found a cool project that uses a "tower of IRs" to re-establish source-to-binary provenance, which seems to me to be on the right track:
https://github.com/trailofbits/vast
I'd definitely like to see the compilation process become more transparent and easier to work with.
Or consider large-scale code analysis and refactoring tooling: perhaps that should be a bootstrap target, where the unit of absorption is an app repo and a key threshold is the ability to refactor source? Or not a repo, but arbitrary scale. So Language Server Protocol blended with dynamic loading and calling conventions? A Smalltalk-ish "live" image environment with piles of mutating forked repos? That sort of cross-checks: ask a "programmer apprentice" AI, err, or simply a programming team, "I'd like capabilities foo with characteristics bar", and a set of forked repos might be an unremarkable outcome. So that might suggest a language bootstrap target of FFI plus a refactoring-LSP client?
FFI is an unremarkable bootstrap target, and a refactoring-LSP client gives control over both sides, so maybe next: how to move code across the line? AST scraping and transliteration? Polyglot direct memory access to data types? That suggests as targets maybe a high-end "can exercise and analyze compiler output" FFI, plus rich AST tooling? Language implementation, with its specs and test suites, can be a nice context for such work. Which might bring us back around to an emphasis on early implementation of other languages, but with perhaps an increased focus on interoperation with existing implementations? Control of config, build, and linkage might also need emphasis?
Hmm, fun, tnx! [1] https://github.com/trailofbits/vast [2] https://news.ycombinator.com/item?id=33387149
Our goals with this pipeline are to enable static analyses that can choose the right abstraction level(s) for their goals and, using provenance, to cross abstraction levels and relate results back to source code.
Neither Clang ASTs nor LLVM IR alone meet our needs for static analysis. Clang ASTs are too verbose and lack explicit representations for implicit behaviours in C++. LLVM IR isn't really "one IR"; it's two IRs (LLVM proper, and metadata), where LLVM proper is an unspecified family of dialects (-O0, -O1, -O2, -O3, then all the arch-specific stuff). LLVM IR also isn't easy to relate to source, even in the presence of maximal debug information. The Clang codegen process does ABI-specific lowering: it takes high-level types/values and transforms them to be more amenable to storing in target-CPU locations (e.g. registers). This actively works against relating information across levels, something that we want to solve with intermediate MLIR dialects.
Beyond our static analysis goals, I think an MLIR-based setup will be a key enabler of library-aware compiler optimizations. Right now, library-aware optimizations are challenging because Clang ASTs are hard to mutate, and by the time things are in LLVM IR, the abstraction boundaries provided by libraries are broken down by optimizations (e.g. inlining, specialization, folding), forcing optimization passes to reckon with the mechanics of how libraries are implemented.
We're very excited about MLIR, and we're pushing full steam ahead with VAST. MLIR is a technology that we can use to fix a lot of issues in Clang/LLVM that hinder really good static analysis.