What does HackerNews think of gg?

The Stanford Builder

Language: C++

But the cloud can and will be a fundamental part of the next generation of developer tools.

After reading the blog post's mention of Repl.it, I went and downloaded their new iPhone app and used https://modal.com to spin up 30-40 containers from a script doing sentiment analysis on ~30k movie reviews: https://twitter.com/jonobelotti_IO/status/158291976221638656...

This cost me about 5 cents.
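For a sense of what that kind of fan-out looks like in code, here's a minimal sketch against Modal's Python API as it was at the time (modal.Stub, @stub.function, .map; newer releases renamed Stub to App). The toy word-list scorer and the reviews.txt file are made-up stand-ins, not the script from the tweet:

```python
# Hypothetical sketch only: fan a toy sentiment scorer out over many Modal
# containers. Uses Modal's Python API circa late 2022 (modal.Stub /
# @stub.function / .map); the scorer and reviews.txt are placeholders.
import modal

stub = modal.Stub("sentiment-fanout")

POSITIVE = {"great", "excellent", "wonderful", "loved", "superb"}
NEGATIVE = {"awful", "terrible", "boring", "hated", "dull"}


@stub.function()
def score_review(text: str) -> int:
    """Crude word-count sentiment score; each call runs in its own container."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


if __name__ == "__main__":
    with stub.run():
        reviews = open("reviews.txt").read().splitlines()  # e.g. ~30k reviews
        # .map() spreads the calls across containers in parallel.
        scores = list(score_review.map(reviews))
        print(sum(s > 0 for s in scores), "positive out of", len(scores))
```

How many containers Modal actually spins up depends on its autoscaling, but the point stands: the loop body runs in the cloud, not on the phone or laptop.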

Developer environments and workflows built around the idea that you won't compile and run code on your own device can do wild things at the press of a button in an iPhone app.

UC Berkeley has called part of this vision 'serverless for all computation': https://kappa.cs.berkeley.edu/.

edit: Another user also pointed to Stanford's 'gg': https://github.com/StanfordSNR/gg.

Running it locally will always be faster as long as your machine is not a bottleneck (#cores, RAM, ...). I think the use-case for distcc et al is to enable less-powerful machines to run builds faster by leveraging other machines. That’s exactly what we use it for at work. Our developers have not-so-powerful laptops and with distcc/icecc they can utilize the power of our build agents in the server room.
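For concreteness, here's a rough sketch of the distcc flavor of that setup, wrapped in Python only to keep the examples in one language; the host names and job count are placeholders, and a real deployment (or icecc) would point at your actual build agents:

```python
# Sketch: offload compilation from a weak laptop to remote build agents via
# distcc. DISTCC_HOSTS lists the remote machines (name/job-slots); make is
# told to invoke the compiler through distcc. Hosts and -j value are made up.
import os
import subprocess

env = dict(os.environ)
env["DISTCC_HOSTS"] = "buildagent1/16 buildagent2/16 localhost/4"

subprocess.run(
    ["make", "-j36", "CC=distcc gcc", "CXX=distcc g++"],
    check=True,
    env=env,
)
```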

Also interesting to read: https://github.com/StanfordSNR/gg

I just ported the continuous build for https://www.oilshell.org/ to sr.ht for this reason:

http://www.oilshell.org/blog/2020/11/fixes-and-updates.html#...

A contributor added .travis.yml about 3 years ago, before I had ever used it. But I've been around the block enough to know that getting stuff for free is temporary. (And to be fair, I did like Travis CI's free service a lot better than I thought I would.)

So when I needed to enhance the continuous build back in January, I did it with a PORTABLE SHELL SCRIPT, NOT with YAML. Both Travis CI and sr.ht provide Linux VMs, which are easily operated with a shell script.

The script called "Toil" does the following (a rough sketch follows the list):

1. Configures which steps are run in which build tasks (both Travis CI and sr.ht can run multiple tasks in parallel for each commit)

2. Logs each step, times it, summarizes failure/success

3. Archives/compresses the logs

4. Publishes the result to my own domain, which is A LOT FASTER than the Travis CI dashboard. (sr.ht is very fast too; it has a great web design.)
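Not the real script, but a rough Python sketch of the shape of such a wrapper; the task names, commands, and publish destination below are invented placeholders (the actual Toil is a portable shell script in the Oil repo):

```python
# Hypothetical Toil-style CI wrapper: run named steps, time and log each one,
# compress the logs, and publish everything to your own host. Task commands
# and the scp destination are placeholders for illustration.
import gzip
import json
import os
import subprocess
import sys
import time

TASKS = {
    "dev-minimal": [["build/dev.sh", "minimal"], ["test/unit.sh", "all"]],
    "cpp-small": [["build/cpp.sh"], ["test/spec.sh", "cpp"]],
}

def run_task(name):
    os.makedirs("_tmp", exist_ok=True)
    summary = []
    with gzip.open(f"_tmp/{name}.log.gz", "wt") as log:        # archived, compressed log
        for argv in TASKS[name]:
            start = time.time()
            proc = subprocess.run(argv, capture_output=True, text=True)
            log.write(proc.stdout + proc.stderr)
            summary.append({"argv": argv,                       # per-step record
                            "status": proc.returncode,
                            "elapsed_secs": round(time.time() - start, 2)})
    with open(f"_tmp/{name}.json", "w") as f:                   # success/failure summary
        json.dump(summary, f, indent=2)
    # Publish to your own domain; the destination here is a placeholder.
    subprocess.run(["scp", f"_tmp/{name}.json", f"_tmp/{name}.log.gz",
                    "ci@example.org:/var/www/ci/"], check=False)

if __name__ == "__main__":
    run_task(sys.argv[1])   # the CI service picks which task to run per job
```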

This ended up working great, so I have multiple CI services running the same jobs, and publishing to the same place: http://travis-ci.oilshell.org/

(I plan to create srht.oilshell.org for security reasons; it isn't ideal that both services have a key to publish to the same domain.)

----

I think this is the future of the Oil project: shell scripts to enable portability across clouds. If you want to be fancy, it's a distributed or decentralized shell.

This is natural because shell already coordinates processes on a single machine.

- A distributed shell coordinates processes across multiple machines (under the same domain of trust)

- A decentralized one does so across domains of trust (across clouds)

-----

Really great work in this direction is gg:

https://buttondown.email/nelhage/archive/papers-i-love-gg/ comments: https://lobste.rs/s/virbxa/papers_i_love_gg

which is a tool that runs distributed programs across multiple FaaS providers like AWS Lambda, Google Cloud Functions, etc.

https://github.com/StanfordSNR/gg

My "toil" script is a lot more basic, but an analogous idea. I would like to create a slightly tighter but improved abstraction that runs on multiple cloud services. Notes on gg here:

https://github.com/oilshell/oil/wiki/Distributed-Shell

If anyone wants to help, get in touch! If you are pissed off about Travis then you might want this :) I think these kinds of multi-cloud setups are inevitable given the incentives and resources of each party, and they already exist (probably in a pretty ugly/fragile form).

In addition to improved local compilation, the constraint space for compiler design is also changed by cloud compilation. Distributed deterministic compilation like [1] would permit community-level caching. Distribution reduces the importance of peak compute - if prompt compilation requires a hundred cores, that could be ok. Community caching reduces the importance of worst-case performance - if some rich type analysis of a standard library takes days, that might be tolerable if it only needs to happen once. If optimization decisions can become a discrete thing-on-the-side, like current execution traces for JIT, then there's a new opportunity for human-compiler collaboration - "lay out the data in memory like so, when the CPU cache has this shape". I'm looking forward to the ferment of hobby languages starting to explore this new space. What would you do differently in designing a language, if the language community was served by a single giant instance of a compiler? How might that change the economics of language development? PLaaS?

[1] https://github.com/StanfordSNR/gg Roughly: checksums and argument lists to make gcc deterministic, as for a cache, but farmed out so `make -j100` runs on AWS Lambda.
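To make that concrete, here's a toy sketch of content-addressed compilation caching: hash the compiler version, the argument list, and the preprocessed source, and key the object file on that digest. This is only the caching half of the idea (gg additionally models the whole build graph as thunks and farms cache misses out to Lambda); the paths and cache directory are invented:

```python
# Toy sketch of the checksums-and-argument-lists idea: key each compile step
# on a hash of the compiler version, the argument list, and the preprocessed
# source, so identical inputs always map to the same cached object file.
# The cache lives in /tmp here; a shared/community cache would sit behind the
# same kind of key. All paths are made up for illustration.
import hashlib
import pathlib
import subprocess

CACHE = pathlib.Path("/tmp/compile-cache")
CACHE.mkdir(parents=True, exist_ok=True)

def cached_compile(source, args):
    # Preprocess first so the key captures headers and macros, not just the file text.
    preprocessed = subprocess.run(["gcc", "-E", source, *args],
                                  capture_output=True, check=True).stdout
    compiler_id = subprocess.run(["gcc", "--version"],
                                 capture_output=True, check=True).stdout
    key = hashlib.sha256(
        compiler_id + b"\0".join(a.encode() for a in args) + preprocessed
    ).hexdigest()
    obj = CACHE / (key + ".o")
    if not obj.exists():                      # miss: actually run the compiler
        subprocess.run(["gcc", "-c", source, "-o", str(obj), *args], check=True)
    return obj                                # hit: reuse the previously built object

if __name__ == "__main__":
    print(cached_compile("hello.c", ["-O2"]))
```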

https://github.com/StanfordSNR/gg - should be almost the same as the GCC/LLVM thunk extractor. You have to pay for the borrow checker, but LLVM IR optimization passes should be the same complexity.

GitHub repo: https://github.com/StanfordSNR/gg

Intro article on Packt Hub: https://hub.packtpub.com/hello-gg-a-new-os-framework-to-exec...

> The functional approach and fine-grained dependency management of gg give significant performance benefits when compiling large programs from a cold start.

In the case of Inkscape, when running “cold” on AWS Lambda, gg was nearly 5x faster than an existing icecc system running on a 48-core or 384-core cluster of already-running VMs.

> uses a sophisticated caching system to avoid needlessly rebuilding artifacts

Reminded me of the Stanford Builder gg [1], which does highly parallel gcc compilation on AWS Lambda. `make -j2000`.

So with a zig cc drop-in, you might get highly-parallel cross-compilation?

Though the two caching systems might be a bit redundant.

[1] https://github.com/StanfordSNR/gg

https://github.com/StanfordSNR/gg can do builds in the cloud. Their USENIX talk is really interesting.

One of the major time savers of bazel is Remote Build Execution (RBE), which allows you to build modules in parallel in the cloud. So if you have 1000 CPUs, you can really just have a client do `bazel build -j 1000 //...` and you can get a huge speed-up. Remote (and local) builds all happen in a sandbox, so you don't have to worry about e.g. preparing a Docker image with the worker / build slave environment. (You do, however, have to register your dependencies with Bazel, which can be hard at first.) To add to this, bazel also has a remote global cache which can benefit large teams.

For fairly large C++ codebases, RBE is really a competitive advantage. I've seen RBE cut down iteration time by an order of magnitude. I love CMake, and CMake can get you plenty of parallelism, but CMake doesn't really provide a tool for building several CMake sub-projects in parallel, and bazel handles this really well.

Sadly Bazel RBE is still primarily a Google Cloud product. Also, GCE has been slow to support auto-scaling, so you have to pay for unused workers. (Like most products in Google Cloud, Google is ripping you off with alpha-quality stuff.) There's some very rough open source RBE stuff on GitHub that you can run yourself, but nothing really production-grade yet.

gg ( https://github.com/StanfordSNR/gg ) is a promising-looking alternative. It's research code, but it might be the community's best hope for a non-Google alternative (that e.g. supports AWS Lambda for parallelism). Bazel is great, but without independence (e.g. what Kubernetes achieved) it's difficult to see Bazel becoming as dependable as make or CMake long term.

Maybe take a look at gg [0]? It seems to solve the problem of slow compilation quite nicely.

[0] https://github.com/StanfordSNR/gg

Code from the paper: https://github.com/StanfordSNR/gg

Hi all -- Salsify co-author here. Surprised to see us here again, but happy to be part of the conversation (here's a previous one: https://news.ycombinator.com/item?id=16964112).

This work was led by my student Sadjad Fouladi. If you liked Salsify, you might really like Sadjad's talk in June at USENIX ATC about "gg", his system for letting people use AWS Lambda as a rented supercomputer (e.g. he can compile inkscape and Chromium really fast by outsourcing the computation to 8,000 Lambda nodes that all talk to each other directly over UDP): https://www.youtube.com/watch?v=Cc_MVldSijA (code here: https://github.com/StanfordSNR/gg)

You might also be interested in our current video project, led by my student Francis Yan, on trying to improve live video streaming. If you visit and watch some live TV you can help contribute to our study: https://puffer.stanford.edu

> namespace

Nod. I fuzzily recall being told years ago of ITA Software struggling to even build their own CL code. Reader-defined-symbol load-order conflict hell, as I recall. And that was just a core engine, embedded in a sea of Java.

> second class citizens

I too wish something like Kernel[1] had been pursued. Kernel languages continue to be explored, so perhaps someday. Someday capped by AI/VR/whatever meaning "it might have been nice to have back then, but old-style languages just aren't how we do 'software' anymore".

> detailed documentation covering all the design criteria and coding decisions

As in manufacturing, inadequate docs can have both short- and long-term catastrophic and drag impacts... but our tooling is really bad, high-burden, so we have unhappy tradeoffs to make in practice.

Though, I just saw a pull request go by, adding a nice function to a popular public api. The review requested 'please add a sentence saying what it does.' :)

So, yeah. Capturing design motivation is a thing, and software doesn't seem a leader among industries there.

> enable future generations to build upon what has been done.

Early Python had a largely-unused abstraction available, of objects carrying C pointers, so C programs/libraries could be pulled together at runtime. In an alternate timeline, with only slightly different choices, instead of monolithic C libraries, there might have been a rich ecology. :/ The failure to widely adopt multiple dispatch seems another one of these "and thus we doomed those who followed us to pain and toil, and society to the loss of all they might have contributed had they not been thus crippled".

> To understand a piece of Lisp code [...struggle]

This one I don't quite buy. Java's "better for industry to shackle developers to keep them hot swappable", yes, regrettably. But an inherent struggle to read? That's always seemed to me more an instance of the IDE/tooling-vs-language-mismatch argument. "Your community uses too many little files (because it's awkward in my favorite editor)." "Your language shouldn't have permitted Unicode for identifiers (because I don't know how to type it, and my email program doesn't like it)." CL in vi, yuck. CL in Lisp Machine Emacs... was like VS Code or Eclipse, for in many ways a nicer language, that ran everything down to the metal. Though one can perhaps push this argument too far, as with Smalltalk image-based "we don't need no source files" culture. Or it becomes a "with a sufficiently smart AI-complete refactoring IDE, even this code base becomes maintainable".

But "trickily" written code, yes. Or more generally, just crufty. Perhaps that's another of those historical shifts. More elbow room now to prioritize maintenance: performance less of a dominating concern; more development not having the flavor of small-team hackathon/death-march/spike-into-production. And despite the "more eyeballs" open-source argument perhaps being over stated, I'd guess the ratio of readers to writers has increased by an order of magnitude or two or more, at least for popular open source. There are just so very many more programmers. The idea that 'programming languages are for communicating among humans as much as with computers' came from the lisp community. But there's also "enough rope to hang yourself; enough power to shoot yourself in the foot; some people just shouldn't be allowed firearms (or pottery); safety interlocks and guards help you keep your fingers attached".

One perspective on T(est)DD I like is that it allows you to shift around ease of change - to shape the 'change requires more overhead' vs 'change requires less thinking to do safely' tradeoff over your code space. Things nailed down by tests are harder to change (the tests need updating too), but make surrounded things easier to change, by reducing the need to maintain correctness of transformation, and simplifying debugging of the inevitable failure to do so. It's puzzled me that the TDD community hasn't talked more about test lifecycle - the dance of adding, expanding, updating, and pruning tests. Much CL code and culture predated testing culture. TDD (easy refactoring) plus insanely rich and concise languages (plus powerful tooling) seems a largely unexplored but intriguing area of language design space. Sort of Haskell/Idris T(ype)DD and T(est)DD, with an IDE able to make even dense APL transparent, for some language with richer type, runtime, and syntax systems.

Looking back at CL, and thinking "like now, just a bit different", one can miss how much has changed since. Which hides how much change is available and incoming. 1950s programs each had their own languages, because using a "high-level" language was implausibly heavy. No one thinks of using assembly for web dev. Cloud has only started to impact language design. And mostly in a "ok, we'd really have to deal with that, but don't, because everyone has build farms". There's https://github.com/StanfordSNR/gg 'compile the Linux kernel cold-cache in a trice for a nickel'. Golang may be the last major language where single-core cold-cache offline compilation performance was a language design priority. Nix would be silly without having internet access, but we do, so we can have fun. What it means to have a language and its ecosystem has looked very different in the past, and can look very different in the future. Even before mixing in ML "please apply this behavior spec to this language-or-DSL substrate, validated with this more-conservatively-handled test suite, and keep it under a buck, and be done by the time I finish sneezing". There's so much potential fun. And potential to impact society. I just hope we don't piss away decades getting there.

[1] https://web.cs.wpi.edu/~jshutt/kernel.html

Similarly, you can compile ffmpeg on Lambda, in 0.5 minutes, for 9 cents.[1] Versus 10 min on one core, for ~free. And while -j200 of ffmpeg is nice, -j1000 of the linux kernel is... wow, like seeing the future.

[1] demo in a talk: https://www.youtube.com/watch?v=O9qqSZAny3I&t=55m15s (the actual run (sans uploading) is at https://www.youtube.com/watch?v=O9qqSZAny3I&t=1h2m58s ); code: https://github.com/StanfordSNR/gg ; some slides (page 24): http://www.serverlesscomputing.org/wosc2/presentations/s2-wo...

How is progress on deterministic builds?

It would be nice to be able to use something like Keith Winstein (Stanford) et al's gg.[1] Sort of `make -j1000` for 10 cents on Lambda. Linux kernel cold compilation in under a minute.

[1] https://github.com/StanfordSNR/gg ; video of talk demo: https://www.youtube.com/watch?v=O9qqSZAny3I&t=55m15s ; some slides (page 24): http://www.serverlesscomputing.org/wosc2/presentations/s2-wo...

Keith Winstein (Stanford) et al's gg [1] is also fun. Sort of `make -j1000` for 10 cents. Create a deterministic-compilation model of a C build task, upload the source files, briefly run a lot of lambdas, download the resulting executable. (Though it's more general than that.)

For folks long despairing that our programming environments have been stuck in a rut for decades, we're about to be hit by both the opportunity to reimagine our compilation tooling, and the need to rewrite the world again (as for phones) for VR/AR. If only programming language and type systems research hadn't been underfunded for decades, we'd be golden.

[1] https://github.com/StanfordSNR/gg ; video of talk demo: https://www.youtube.com/watch?v=O9qqSZAny3I&t=55m15s ; some slides (page 24): http://www.serverlesscomputing.org/wosc2/presentations/s2-wo...