What does HackerNews think of salsa?

A generic framework for on-demand, incrementalized computation. Inspired by adapton, glimmer, and rustc's query system.

Language: Rust

A (now 6-year-old) discussion that might be helpful: Anders Hejlsberg on Modern Compiler Construction: https://www.youtube.com/watch?v=wSdV1M7n4gQ

Also https://github.com/salsa-rs/salsa

The content of that "Core Concepts" page sounds a lot like https://github.com/salsa-rs/salsa

What I'm referring to is incremental recompilation with the following properties:

1. automatic correctness

  - that is, a compiler change not explicitly updating anything related to dependency tracking should at most result in conservative, coarser-grained behavior, not unsoundly cause untracked dependencies

  - manual cache invalidation logic is a clear indicator a compiler isn't this - e.g. this rules out https://gcc.gnu.org/wiki/IncrementalCompiler (maybe also Roslyn/C# and the Swift compiler, but I'm not sure - they might be hybrid, with automated coarse-grained vs manual fine-grained tracking?)

  - the practical approach to this is to split the workload into work units (aka tasks/queries/etc.) and then force information flow through centralized "request"/"query" APIs that automatically track dependencies - see https://github.com/salsa-rs/salsa for more information, and the toy sketch after this list

  - research along the lines of ILC (Incremental Lambda Calculus) might yield more interesting results long-term

  - outside of a compiler, the only examples I'm aware of are build systems like tup (https://gittup.org/tup/) or RIKER (https://github.com/curtsinger-lab/riker), which use filesystem sandboxing (FUSE for tup, ptrace/seccomp-BPF for RIKER) for "automatically correct" build dependencies

2. fine-grained enough

  - at the very least, changes within function bodies should be isolated from other bodies, with only IPOs ("interprocedural optimizations", usually inlining) potentially introducing additional dependencies between function bodies

  - one extreme here is a "delta-minimizing" mode that attempts to reduce drift during optimizations and machine code generation (or at least warns the developer when it can't), such that some fixes can end up being e.g. "patch a dozen bytes in 200 distro packages" and get quickly distributed to users

  - but even with history-agnostic incremental recompilation (i.e. the output only depends on the current input, and past compilations only affect performance, not other behavior), function-level incremental linking (with one object file per function symbol) could still be employed at the very end, to generate a binary patch that redirects every function that grew in size to additional segments added at the end of the file, without having to move any other function

3. propagating only effective "(query) output" changes

  - also called "firewalling", because it blocks irrelevant details early (instead of redoing all work transitively) - the toy sketch after this list illustrates this

  - example 1: editing one line changes the source byte offsets of everything lower down in the file, but debuginfo doesn't need to be updated, since it tracks lines, not source byte offsets

  - example 2: adding/removing explicit types in a function should always cause type checking/inference to re-run, but nothing using those type inference results should re-run if the types haven't changed

4. extending all the way "down" (to machine code generation/object files/linking)

  - existing optimizing compilers may have a hard time with this because they didn't design their IRs/passes to be trackable in the first place (e.g. an LLVM call instruction literally contains a pointer to the function definition, with no context/module/pass-manager required to "peek" at the callee body)

  - can theoretically be worked around with one object file per function, which https://gcc.gnu.org/wiki/IncrementalCompiler does mention (under "Code Generation" / "Long term the plan is ..."), but that alone isn't sufficient if you want optimizations (I've been meaning to write a rustc proposal about this, revolving around LLVM's ThinLTO having a more explicit split between "summary" and "full definition")

5. extending all the way "up" (to lexing/parsing/macros)

  - this is probably the least necessary in terms of reducing output delta, but it can still impede practical application if it becomes the dominant cost - AFAICT it's one of the factors that doomed the GCC incremental experiment ("I’m pretty much convinced now that incremental preprocessing is a necessity" - http://tromey.com/blog/?p=420)

  - outside of "rebuild the world", this also makes a compiler much more suitable for IDE usage (e.g. as an LSP server)
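
To make 1. and 3. concrete, here is a minimal toy sketch in plain Rust (not salsa's actual API - `Db`, `line_count` and `report` are names invented for illustration): all reads go through query methods on a database, each derived query remembers the values it read last time, and a downstream query only re-runs its own logic when those values actually changed (the "firewalling"/early-cutoff part):

    use std::collections::HashMap;

    // Hypothetical toy database: one input table plus hand-written memo
    // fields for two derived queries over it.
    #[derive(Default)]
    struct Db {
        // input query: file name -> source text
        source_text: HashMap<String, String>,
        // memo for `line_count`: (input value it read last time, cached result)
        line_count_memo: HashMap<String, (String, usize)>,
        // memo for `report`: (line counts it read last time, cached result)
        report_memo: Option<(Vec<usize>, String)>,
    }

    impl Db {
        fn set_source_text(&mut self, file: &str, text: &str) {
            self.source_text.insert(file.to_string(), text.to_string());
        }

        // Derived query 1: depends on `source_text(file)`.
        fn line_count(&mut self, file: &str) -> usize {
            let input = self.source_text[file].clone();
            if let Some((seen, cached)) = self.line_count_memo.get(file) {
                if *seen == input {
                    return *cached; // dependency value unchanged -> reuse
                }
            }
            println!("  recomputing line_count({file})");
            let n = input.lines().count();
            self.line_count_memo.insert(file.to_string(), (input, n));
            n
        }

        // Derived query 2: depends only on `line_count` of each file, not on
        // the raw text, so edits that keep the line count stop propagating here.
        fn report(&mut self, files: &[&str]) -> String {
            let counts: Vec<usize> = files.iter().map(|f| self.line_count(f)).collect();
            if let Some((seen, cached)) = &self.report_memo {
                if *seen == counts {
                    return cached.clone(); // dependencies' *outputs* unchanged -> reuse
                }
            }
            println!("  recomputing report");
            let out = format!("{} total lines", counts.iter().sum::<usize>());
            self.report_memo = Some((counts, out.clone()));
            out
        }
    }

    fn main() {
        let mut db = Db::default();
        db.set_source_text("main.rs", "fn main() {\n}\n");
        println!("{}", db.report(&["main.rs"])); // computes line_count and report

        // Edit that changes bytes but not the line count.
        db.set_source_text("main.rs", "fn main() {            \n}\n");
        println!("{}", db.report(&["main.rs"])); // line_count re-runs, report is reused
    }

In the second call, the edit changes the source bytes but not the line count, so `line_count` re-runs while `report`'s body is skipped - roughly the propagation-stopping behavior that frameworks like salsa automate, using revision counters and automatic dependency recording instead of hand-written memo fields.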

My experience is mostly with the Rust compiler, rustc, whose incremental support:

- started off as coarse skipping of generating/optimizing LLVM IR

- has since evolved to cover 1./2./3.

- while many passes in the "middle-end" were already both "local" and "on-demand", sometimes allowing incrementalization by changing only a dozen lines or so, 4. is indefinitely blocked by LLVM (at best we can try to work around it), and 5. (incremental "front-end") has been chipped at for years with several factors conspiring to make it orders of magnitude more difficult:

  - macro expansion and name resolution being intertwined (with the former mutating the AST in-place and the latter being a globally stateful algorithm)

  - "incremental front-end" used to be seen as essential for IDE usage, and that notion started falling apart in 2018 or so (see rust-analyzer aka "RA" below, though RA is not itself directly responsible, nor do I hold it against them - it's a long messy story)

  - this work (without the IDE focus, AFAICT) has mostly been driven by random volunteers, last I checked (i.e. without much "official" organization or funding, though I'm not sure what the latter would even be)

- its design has been reimagined into the "salsa" framework (https://github.com/salsa-rs/salsa - also linked above)

  - the most notable user I'm aware of is the rust-analyzer (aka "RA") LSP, which hits 1./2./3./5. (4. isn't relevant, as RA stops at IDE-oriented analysis, with no machine code output) - without getting too into the weeds, RA is a reimplementation of the "Rust {front,middle}-end", and ideally rustc should eventually be just as good at 5. (see above for why it's taking so long)

I am not aware of any other examples of industrial compilers having both 1. and 2. (or even academic ones but I suspect some exist for simple enough languages) - every time someone says "incremental compilation? X had that Y decades ago!", it tends to either be manually approximating what's safe to reuse (i.e. no 1.), or have file-level granularity (i.e. no 2. - the "more obviously correct" ones are close to the "separate compilation" of C/C++/many others, where a build system handles parallel and/or incremental rebuilds by invoking the compiler once per file), or both.

While 2./5. are enough to be impressive for IDE usage all by themselves, the interactive nature also serves to hide issues of correctness (lack of 1. may only show up as rare glitches that go away with more editing) or inefficiency (lack of 3. leads to all semantic analyses being redone, which may still be fast if they're only requested for the current function).

Another thing I haven't seen is distro build servers talking about keeping incremental caches around (and compiling incrementally in the first place), for builds of Rust "application" packages (most of them CLI tools like ripgrep, I would assume) - it's probably too early for the smaller stuff, but a cost/benefit analysis (i.e. storage space vs time saved rebuilding) would still be interesting.

This is how rust-analyzer's IntelliSense works: it uses a query engine called Salsa that stores the symbols in a database, and the linting and completion/semantics are entirely query-driven:

https://github.com/salsa-rs/salsa
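
For a flavor of what "query-driven" looks like, here is a rough sketch in the style of salsa's older `query_group` API (newer salsa releases use different attribute macros, and the `Source` trait with its `file_text`/`symbol_names` queries is invented for illustration): an input query holds the file text, a derived query computes "symbols" from it, and salsa memoizes the derived query and records its dependency on the input automatically:

    use std::sync::Arc;

    // Queries are declared as a trait; salsa generates the storage plus the
    // dependency tracking behind each method call.
    #[salsa::query_group(SourceStorage)]
    trait Source {
        // Input query: set explicitly, e.g. whenever a file changes on disk.
        #[salsa::input]
        fn file_text(&self, name: String) -> Arc<String>;

        // Derived query: memoized; calling `db.file_text(...)` inside its
        // implementation records the dependency automatically.
        fn symbol_names(&self, name: String) -> Arc<Vec<String>>;
    }

    fn symbol_names(db: &dyn Source, name: String) -> Arc<Vec<String>> {
        let text = db.file_text(name);
        // Toy "analysis": treat every whitespace-separated token as a symbol.
        Arc::new(text.split_whitespace().map(|s| s.to_string()).collect())
    }

    #[salsa::database(SourceStorage)]
    #[derive(Default)]
    struct Database {
        storage: salsa::Storage<Self>,
    }

    impl salsa::Database for Database {}

    fn main() {
        let mut db = Database::default();
        db.set_file_text("main.rs".into(), Arc::new("fn main".into()));
        // First call computes; later calls are cached until `file_text` changes.
        println!("{:?}", db.symbol_names("main.rs".into()));
    }

rust-analyzer's real query graph is of course much deeper (parse trees, HIR, name resolution, type inference), but the shape is the same: diagnostics and completion are derived queries over input text, so an edit only re-runs the queries whose inputs actually changed.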

Good video describing its use for working with Rust's AST:

https://youtu.be/i_IhACacPRY?t=1348

The blog post does cite salsa, which is the framework that was created to build Lark, the language that was created to prototype Rust's implementation of query-based compilation.

https://github.com/lark-exploration/lark

https://github.com/salsa-rs/salsa

I think the author linked a Rust example of a query-based compiler: https://github.com/salsa-rs/salsa
Salsa is an example of logic they are writing for the Rust compiler that is being kept in a generic crate for anyone to use.

See https://github.com/salsa-rs/salsa

The Skip language (which I mentioned in another comment) is a language based entirely on incremental computation. Every piece of code you write in it becomes incremental, as far as I know.

AND the compiler is self-hosted!!! So that means they have a fully incremental compiler.

There is an LLVM back end, though, so maybe only the front end of the compiler is incremental? That's still important. It comes from the same people as the Hack language, which I believe has a fast "online" typechecker -- which I think is incremental (?)

I believe you could write a C compiler in Skip the same way?

If you peek at the source code, it has a 6000-line recursive descent parser and a 6000-line type checker, much in the style of OCaml (but in Skip).

I think this deserves a blog post because it's not clear to me that the message of Skip got through (7 months ago on HN). Their website has some examples, but not too many.

Of course, I haven't tried it yet, and it's a super ambitious project, so there could be some big problem with it.

But if the goal is to make an entire suite of compiler and devtools that are incremental, then Skip seems like it's ahead of the state of the art and something to borrow from.

Salsa was also mentioned, and I believe it's based on some of the same academic work like Adapton. But I don't know if they have an entire self-hosted compiler as a demo like Skip does. It seems to be "query-oriented" rather than supporting arbitrary code, which is what I would expect for most incremental computing frameworks. Skip seems to be more ambitious.

https://github.com/salsa-rs/salsa

I believe that the Skip language does this, because its compiler is self-hosted. It's http://skiplang.com/blog/2017/01/04/how-memoization-works.ht...

https://github.com/skiplang/skip/tree/master/src/frontend

Skip – A programming language to skip the things you have already computed https://news.ycombinator.com/item?id=18077612

https://github.com/skiplang/skip/blob/master/src/frontend/sk...