What does HackerNews think of stage0?

A set of minimal dependency bootstrap binaries

Language: Assembly

Somewhat tangential, but I'm curious how big the bootstrap seed for Nix is. That is, if you wanted to build the entire world, what's a minimum set of binaries you'd need?

Guix has put quite a bit of work into this, AFAIU, and it's getting close to being bootstrappable all the way from stage0 [0]. Curious if some group is also working on similar things for Nix.

[0]: https://github.com/oriansj/stage0

Even if they aren't, people absolutely should be able to bootstrap new platforms from scratch. It's important to have confidence in our tools, in our ability to rebuild from scratch, and to be safe against the "trusting trust" attack among other things.

Lately I've been catching up on the state of the art in bootstrapping. Check out the live-bootstrap project. stage0 starts with a seed "compiler" of a couple hundred bytes that basically turns hex codes into bytes while stripping comments. A series of such text files per architecture work their way up to a full macro assembler, which is then used to write a mostly architecture-independent minimal C compiler, which then builds a larger compiler written in this subset of C. This then bootstraps a Scheme in which a full C compiler (mescc) is written, which then builds TinyCC, which then builds GCC 4, which works its way up to modern GCC for C++... It's a fascinating read:

https://github.com/oriansj/stage0

https://github.com/fosslinux/live-bootstrap/blob/master/part...
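To make the "turns hex codes into bytes while stripping comments" step concrete, here is a minimal Python sketch of that idea. It is not the actual hex0 seed (which is hand-written machine code); the function name `hex0_translate` is hypothetical, and the comment markers (`#` and `;` to end of line) are an assumption about the input format.

```python
def hex0_translate(text: str) -> bytes:
    """Turn hex digit pairs into bytes, skipping comments and other characters.

    Assumption: '#' and ';' start a comment that runs to the end of the line.
    """
    out = bytearray()
    high = None          # pending high nibble, or None if none is waiting
    in_comment = False
    for ch in text:
        if in_comment:
            # Skip everything until the end of the line.
            if ch == "\n":
                in_comment = False
            continue
        if ch in "#;":
            in_comment = True
            continue
        if ch in "0123456789abcdefABCDEF":
            nibble = int(ch, 16)
            if high is None:
                high = nibble
            else:
                out.append((high << 4) | nibble)
                high = None
        # Any other character (spaces, newlines, ...) is ignored.
    return bytes(out)


source = "7F 45 4C 46 # ELF magic\n01 ; a one-byte field\n"
binary = hex0_translate(source)
```

Everything above the seed can then be written as commented hex text that this kind of translator turns into the next, slightly more capable tool.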

Even if no one is "using" this it should still be a primary motivator for keeping C simple.

The Goals section from the project's main page is pretty interesting: https://github.com/oriansj/stage0

  This is a set of manually created hex programs in a Cthulhu Path to madness fashion. Which only have the goal 
  of creating a bootstrapping path to a C compiler capable of compiling GCC, with only the explicit requirement 
  of a single 1 KByte binary or less.

  Additionally, all code must be able to be understood by 70% of the population of programmers. If the code can 
  not be understood by that volume, it needs to be altered until it satisfies the above requirement.

Very cool project, but I guess I won't hold out for compiling a working web browser from this project.

You are getting close to the complexity of FLOSS, but there is slightly more to it, some further thoughts below.

> freedoms afforded by FLOSS licenses necessitate the availability of the source code

This isn't really true: security researchers, reverse engineers and piracy experts often exercise the equivalent of the FSF's four freedoms without having the source code. Of course, not having the source code makes it harder for people without those skills and without the often costly proprietary tools that enable this work.

A technically skilled person can write a binary directly in machine code, without any assembly, "source code" or other primary format. When a FLOSS license is applied to that binary, it should be considered Free Software. An example of this is the hex0 binary of the stage0 project, which is the first piece of code run by the Bootstrappable Builds project, which aims to build an entire Linux distro starting with only the ~512 bytes of machine code in hex0 plus all the necessary source code.

https://savannah.nongnu.org/projects/stage0/

https://github.com/oriansj/stage0

https://ekaitz.elenq.tech/hex0.html

https://bootstrappable.org/

https://github.com/fosslinux/live-bootstrap/blob/master/part...

> Which makes me wonder if programs written to be deliberately obfuscated are technically not allowed to be considered free-libre or open source software?

Debian definitely rejects such software. I assume the FSF/OSI would too, although they mostly concern themselves with licenses rather than actual software projects. At least three times in the past, different FSF/GNU projects have caused downstream GPL violations by publishing releases that were missing source code. Even minified JS without the original JS is not considered DFSG-free by Debian, even though it is extremely common these days. Debian applies this rule to all digital files, no matter whether they are programs, fonts, images, videos or other things. Some articles related to this topic:

http://www.inventati.org/frx/essays/softfrdm/whatissource.ht... https://b.mtjm.eu/source-code-data-fonts-free-distros.html https://wiki.freedesktop.org/www/Games/Upstream/#source http://compliance.guide/pristine

> programs written to compete in IOCCC

The main thing about source code is that downstream users should be afforded the same access to a work as its original author. So if you can write an obfuscated program and realistically modify it yourself without hiding the real source code from users, then that is considered fine from the source code point of view. Of course it is a terrible way to write a program, and it should be modified to de-obfuscate everything so more people can understand the code.

If you really want to blow your mind you can go all the way back to stage0[0].

A more practical depth would be to bootstrap with GNU Mes, which is a source-based bootstrapping path that begins with a tiny Scheme interpreter (~5000 LOC of simple C) and a C compiler written in Scheme that are mutually self-hosting.

These tools can compile a slightly patched version of TinyCC that is self-hosting. Using this C compiler you can bootstrap a bunch of GNU tools (glibc, binutils, gcc), and using these tools you can bootstrap a full Guix Linux distro.

[0]: https://github.com/oriansj/stage0
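The chain described above can be read as a dependency graph, and a build order falls out of a topological sort. The sketch below is a deliberate simplification: the node names and edges are my own shorthand for the steps named in this comment (the mutually self-hosting Mes interpreter and compiler are collapsed into one node), not the real package graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps to the set of things that must already exist to build it.
# Simplified assumption: the three GNU tools only need TinyCC here.
deps = {
    "mes+mescc": set(),
    "tinycc": {"mes+mescc"},
    "binutils": {"tinycc"},
    "glibc": {"tinycc"},
    "gcc": {"tinycc"},
    "guix": {"binutils", "glibc", "gcc"},
}


def build_order(graph):
    """Return one valid order in which every dependency is built first."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(deps)
print(" -> ".join(order))
```

The relative order of binutils, glibc and gcc is not fixed by this toy graph; the real bootstrap has more stages and interdependencies between them.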

It's great to see that more people are still working on this and that people have an interest.

If you are interested in this kind of thing, then you'll also want to check out LibreBoot[1] and Bootstrappable Builds[2]. The latter is working with stage0 [3] and mes [4] to bootstrap Guix (among other projects). All of that is further down the chain, but we'll need it if we want to build trustworthy systems.

1. https://libreboot.org/

2. https://www.bootstrappable.org

3. https://github.com/oriansj/stage0/

4. https://www.gnu.org/software/mes/

Relevant github repository: https://github.com/oriansj/stage0

I'm very fascinated by these kinds of things, although I understand zero of it. I believe this folder has some magic:

https://github.com/oriansj/stage0/tree/master/stage0

>Are you referring to https://github.com/oriansj/stage0?

Yep, that's the one.

>More generally each language would need its own path from this basic assembler to a compiler implementing that language in C which doesn’t necessarily exist. While initial versions of the Rust compiler were written in C, more recent versions are self hosted and rely on the previous version. This goes for projects like LLVM too since it requires a C++14 compiler to start with.

Sure. This part has largely been done for many languages, including Rust: https://guix.gnu.org/blog/2018/bootstrapping-rust/

...and there's a bootstrap path for the bottom of that graph all the way from Guix's (fairly minimal but not quite stage0 yet) bootstrap binaries. LLVM too; since you mentioned it and I was curious, here's the dependency graph of LLVM 9.0.1 on my system, in GraphViz and PDF formats:

https://terracrypt.net/upload/llvm.pdf

https://terracrypt.net/upload/llvm.dot

(That's from `guix graph -t bag-with-origins llvm@9.0.1` on my laptop.)

It's my understanding that being bootstrappable in this way is a requirement for being included in upstream Guix, so generally everything that's available in Guix can be bootstrapped like this.

Now, what's not done yet is the path from stage0 to building the bootstrap binaries, but that's the eventual goal. The rest of it? It's not futile, it's done. Not all software is in Guix, but enough of it is that I'm typing this from a laptop running almost exclusively software from Guix.