Alex says, "Digital work is inherently ephemeral." This is precisely backwards; digital work is one of the least ephemeral aspects of human material culture, exceeded only by occasional miraculous analog exceptions like the Pyramids, potsherds, the Lascaux paintings, and Ötzi's axe. The Torah is digital—encoded in a sequence of discrete symbols rather than continuously varying quantities—and that's why it's survived for 3000 years. The digitization of Socrates's words by Plato and Xenophon is the reason we argue about him today, 2500 years later, rather than his forgotten Persian contemporaries or even Heraclitus.

Being digital is what makes the idea of an "exact copy" make sense. You can make an exact copy of some version of the Torah or the Symposium because it's only the discrete letters that matter; the analog nuances of tone of voice or thickness of pen stroke do not count.

So digitality is the alternative to the ephemerality of the analog, which is inevitably eaten up by moths and rust. We all know this about digitized language, but for some reason now that we've digitized reasoning in the form of computer programs, we habitually throw up our hands and declare defeat in the face of inevitable ephemerality.

This is bullshit.

What I really want, instead of screenshots, is a deterministic, reproducible computing environment. The idea is something like uxn or Nock: a platform that's simple enough to stay compatible forever, and efficient enough to be used for many things, even if there are a few things that I do on a computer that need more performance.

There are a lot of inspirational examples that offer tempting evidence that this is possible for large, interesting classes of computations: the Smalltalk-78 revival emulator Vanessa Freudenberg wrote, the UM of the Cult of the Bound Variable (which had over 300 successful independent reimplementations), Nguyen and Kay's sketch of Chifir, Lorie's archival UVC, Wirth's RISC, uxn/Varvara, the JVM, and the numerous emulators of things like the MS-DOS environment, the NES, and the Gameboy that are good enough to run the original games.

I'm not saying it would be an improvement to do all your digital creative work on an emulated Gameboy in order to ensure that it was reproducible. I think we can do a lot better than that. None of the presently existing archival virtual machines are adequate. But I think the reproducibility of Gameboy games tells us that we don't have to accept bitrot as the price of using computers.

Alex says, "They’re not as good as having the original, working thing – but they’re much better than nothing". Well, let's figure out how we can have the original, working thing! This is software, it's a simple matter of programming.

sitkack

Wasm can be a key piece of the system you seek. It is a simple VM, the heap is serializable, and the linkage to the outside world has to be fully defined.

kragen

Wasm definitely has some useful ideas for efficient reproducible computing, but it is ridiculous to describe it as "a simple VM" in comparison to Wirth's RISC, the Cult of the Bound Variable's UM, Chifir, Smalltalk-78, or even the NES or uxn/Varvara. I think this page lists over 1000 instructions: https://webassembly.github.io/spec/core/syntax/instructions....

sitkack

It is a lot, but not quite 1000, yet. From https://webassembly.github.io/spec/core/appendix/index-instr... I see 436 instructions including SIMD, which accounts for over half of them. If I filtered it correctly, there are about 203 instructions in Wasm without SIMD. Many of those are not necessary for most programs.

There are at least three small Wasm interpreters in Rust:

4kloc https://github.com/yblein/rust-wasm

3kloc https://github.com/k-nasa/wai

500loc https://github.com/rustwasm/wasm-bindgen/tree/HEAD/crates/wa...

This list has 169 Wasm instructions https://github.com/rolfrm/wasm-lisp/blob/master/instruction....

Wirth's RISC is neat, I'd love to re-do it in RISC-V (only 47 instructions in the base ISA). UM, Chifir and UXN look like Art (not pejorative), I'll definitely read the Chifir paper. They would be great systems to run on top of Wasm.

https://git.sr.ht/~bctnry/chifir

One might be able to squeeze a Chifir VM into an ESP-32 (with external PSRAM).

kragen

I agree that UM, Chifir, and uxn are Art, and that wasm is at least potentially a great platform to run this kind of archival virtual machine on top of, as well as having some very interesting ideas about how to design a VM instruction set to be amenable to efficient implementation. RISC-V is a good source of ideas for that, too!

And I appreciate you setting me straight about the extent of wasm's instruction proliferation.

But I find much to disagree with.

— ⁂ —

> There are at least three small Wasm interpreters in Rust

None of those are small; the smallest one you found is 500 lines of code, and it's very incomplete, implementing only 11 of the 436 or however many wasm instructions there are, and even those it only implements partially. (Its only arithmetic is addition and subtraction, for example.) Even the 3kloc one says it doesn't pass the wasm testsuite; its pub enum Instruction has 174 items and most of those are implemented as follows:

                Instruction::F64Store(_, _) => todo!(),
                Instruction::I32Store8(_, _) => todo!(),
                Instruction::I32Store16(_, _) => todo!(),
                Instruction::I64Store8(_, _) => todo!(),

The 4kloc one (which I think is actually closer to 2.8kloc) doesn't include any of the SIMD ops, but it claims to implement the whole wasm spec as of 02019, and it's plausible that it actually does. A casual glance at the code doesn't reveal anything that contradicts that claim; it implements about 200 instructions.

None of them include the peripherals, which are by definition excluded from wasm. But peripherals are usually the part of an emulator that requires the most effort, and they're usually a much bigger compatibility bitrot shitshow than your CPU is.

By contrast, the UM interpreter in the Cult of the Bound Variable paper was I think 55 lines of C (also, not including peripherals). My dumb Chifir interpreter was 75 lines of code; adding Yeso graphical output was another 30 lines https://gitlab.com/kragen/bubbleos/blob/master/yeso/chifir-y....
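
For a sense of scale, here is a rough Python sketch of a UM interpreter, written from the published contest specification (14 operators, eight registers, 32-bit platters). It is not the paper's C code: the name run_um and the choice to return output as bytes are made up for illustration, and handling of malformed programs is omitted.

```python
def run_um(program, stdin=b""):
    """Run a UM program (a list of 32-bit words); return its output bytes.

    Illustrative sketch only: run_um and its signature are hypothetical.
    """
    MASK = 0xFFFFFFFF
    arrays = {0: list(program)}   # array 0 holds the running program
    free_ids = []                 # identifiers of abandoned arrays, for reuse
    next_id = 1
    reg = [0] * 8
    pc = 0                        # the "execution finger"
    inp = iter(stdin)
    out = bytearray()
    while True:
        word = arrays[0][pc]
        pc += 1
        op = word >> 28
        if op == 13:              # orthography: load a 25-bit literal
            reg[(word >> 25) & 7] = word & 0x1FFFFFF
            continue
        a, b, c = (word >> 6) & 7, (word >> 3) & 7, word & 7
        if op == 0:               # conditional move
            if reg[c]:
                reg[a] = reg[b]
        elif op == 1:             # array index
            reg[a] = arrays[reg[b]][reg[c]]
        elif op == 2:             # array amendment
            arrays[reg[a]][reg[b]] = reg[c]
        elif op == 3:             # addition mod 2**32
            reg[a] = (reg[b] + reg[c]) & MASK
        elif op == 4:             # multiplication mod 2**32
            reg[a] = (reg[b] * reg[c]) & MASK
        elif op == 5:             # unsigned division
            reg[a] = reg[b] // reg[c]
        elif op == 6:             # not-and
            reg[a] = ~(reg[b] & reg[c]) & MASK
        elif op == 7:             # halt
            return bytes(out)
        elif op == 8:             # allocation of a zero-filled array
            if free_ids:
                ident = free_ids.pop()
            else:
                ident = next_id
                next_id += 1
            arrays[ident] = [0] * reg[c]
            reg[b] = ident
        elif op == 9:             # abandonment
            del arrays[reg[c]]
            free_ids.append(reg[c])
        elif op == 10:            # output one byte (0..255)
            out.append(reg[c])
        elif op == 11:            # input one byte; all ones at end of input
            reg[c] = next(inp, MASK)
        elif op == 12:            # load program, then jump
            if reg[b] != 0:
                arrays[0] = list(arrays[reg[b]])
            pc = reg[c]
        else:
            raise ValueError(f"invalid operator {op}")
```

For example, run_um([0xD0000048, 0xA0000000, 0x70000000]) loads 72 into register 0, outputs it, and halts, returning b"H". Peripherals beyond byte input and output are, as with the C version, not included.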

Uxn, despite its inefficiency, runs useful applications on the Nintendo DS today (in 5200 lines of pretty repetitive C, including the peripherals), and wasm doesn't and probably never will. And 365 teams in the ICFP programming contest independently implemented the UM successfully enough to run at least some existing applications; there will probably never be 300+ independent reimplementations of wasm. So in important ways they're already closer to the goal of eliminating bitrot than wasm ever will be. This is probably because eliminating bitrot isn't part of wasm's goals.

(I realize I forgot to mention the Infocom Z-machine, as well.)

Wasm has another deficiency besides complexity: as far as I know there's no standard way for a wasm program to generate some wasm and start running it, although of course the browser platform does provide that ability. This is essential if the thing you want to run under it is a virtual machine or other sort of emulator, because the only way to do an efficient emulator is to compile the code you want to interpret into the instruction set the emulator is running on. Something wasm-like can dramatically simplify this; if you're compiling to wasm, for example, you don't have to do instruction scheduling or register allocation, and you might not even have to do constant folding and function inlining.
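
To illustrate why an emulator wants to generate code for its host, here is a toy sketch in Python rather than wasm. The guest instruction set (push, arg, add, mul) is invented for this example, and Python's exec stands in for the "generate some code and start running it" facility under discussion. The translator does no register allocation or scheduling of its own; it just emits host code and leaves the rest to the host.

```python
def interpret(code, x):
    """Reference interpreter for a hypothetical toy stack machine."""
    stack = []
    for op, *args in code:
        if op == "push":
            stack.append(args[0])
        elif op == "arg":
            stack.append(x)
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a * b)
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack.pop()


def compile_to_host(code):
    """Translate the same guest code into a host (Python) function.

    The stack is simulated at translation time with expression strings,
    so the generated function contains no stack operations at all.
    """
    stack = []
    for op, *args in code:
        if op == "push":
            stack.append(repr(int(args[0])))
        elif op == "arg":
            stack.append("x")
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            sign = "+" if op == "add" else "*"
            stack.append(f"({a} {sign} {b})")
        else:
            raise ValueError(f"unknown op {op!r}")
    source = f"def compiled(x):\n    return {stack.pop()}\n"
    namespace = {}
    exec(source, namespace)  # the step that needs host support
    return namespace["compiled"]


# Guest program computing x*x + 1.
SQUARE_PLUS_ONE = [("arg",), ("arg",), ("mul",), ("push", 1), ("add",)]
```

Both interpret(SQUARE_PLUS_ONE, 7) and compile_to_host(SQUARE_PLUS_ONE)(7) give 50, but the compiled version pays the decoding cost once instead of on every run.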

— ⁂ —

One of the interesting ideas in the RISC-V ecosystem is that a Cray-style vector instruction set (RV64V) can give you SIMD-instruction-like performance without SIMD-instruction-like instruction set inflation. And, as the APL family shows, such vector instructions can include scalar math as a special case. I haven't been able to come up with a way to define such a vector instruction set that wouldn't be unacceptably bug-prone, though; https://dercuano.github.io/notes/vector-vm.html describes some of the things I tried that didn't work.
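
As a toy illustration of the "scalar math as a special case" point, and not of actual RISC-V vector semantics, the following Python sketch has a single add operation whose width is set by a vector-length register. The class VectorMachine and its methods set_vl and vadd are hypothetical names; with the length set to 1 the same operation is ordinary 32-bit scalar addition.

```python
MASK32 = 0xFFFFFFFF


class VectorMachine:
    """Hypothetical machine with one add opcode for every width."""

    def __init__(self, num_regs=4, max_len=8):
        self.max_len = max_len
        self.regs = [[0] * max_len for _ in range(num_regs)]
        self.vl = 1  # vector-length register; 1 means plain scalar math

    def set_vl(self, requested):
        """Clamp the requested length to what the hardware supports."""
        self.vl = max(0, min(requested, self.max_len))
        return self.vl

    def vadd(self, dst, a, b):
        """Add the first vl elements of registers a and b, mod 2**32."""
        for i in range(self.vl):
            self.regs[dst][i] = (self.regs[a][i] + self.regs[b][i]) & MASK32
```

A SIMD-style design would instead need a separate opcode for each element count and width, which is where the instruction set inflation comes from. This sketch sidesteps the hard questions, such as what happens to the elements past vl.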

— ⁂ —

Why am I being so unreasonable about the amount of code? After all, a few hundred lines of C is something that you can write in an afternoon, right, so what's the big deal about 500 or 3000 lines of code for something you'll use for decades? And nobody has ever written an NES emulator in 500 or 3000 lines of code.

The problem is that, to parody Perlis's epigram, if your virtual machine definition has 500 lines of code, you probably forgot some. If a platform includes that much functionality, you have designed it so that that functionality has to live in the base platform rather than being implemented in programs that run on the platform. And that means that you will be strongly tempted to add stuff to the base platform, which is how you break programs that used to work.

In the case of MS-DOS or NES emulation this is limited by the fact that Nintendo couldn't go out and patch all the Famicoms and NESes in people's houses, so if they wanted to change things, well, too bad. NES emulator authors have very little incentive to add new functionality because the existing ROMs won't use it, and that's what they want to run.

sitkack

I think SIMD was a distraction from our conversation. Most code doesn't use it, and in the future the length-agnostic flexible vectors (https://github.com/WebAssembly/flexible-vectors/blob/master/...) are a better solution. They are a lot like RVV (https://github.com/riscv/riscv-v-spec); research around vector processing is why RISC-V exists in the first place!

I was trying to find the smallest Rust Wasm interpreters I could, and I should have read the source first; I only really use wasmtime. But this one looks very interesting: zero deps, zero unsafe.

16.5kloc of Rust https://github.com/rhysd/wain

The most complete wasm env for small devices is wasm3

20kloc of C https://github.com/wasm3/wasm3

I get what you are saying about being so small that there isn't a place for bugs to hide.

> “There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.” CAR Hoare

Even a 100-line program can't be guaranteed to be free of bugs. These programs need embedded tests to ensure that the layer below them is functioning as intended. They cannot and should not run open loop. Speaking of 300+ reimplementations, I am sure that RISC-V has already exceeded that. The smallest readable implementation is about 200 lines of code: https://github.com/BrunoLevy/learn-fpga/blob/master/FemtoRV/...

I don't think Wasm suffers from the base extension issue you bring up. It will get larger, but 1.0 has the right algebraic properties to be useful forever. Wasm does require an environment; for archival purposes that environment should itself be written in Wasm, with an API for instantiating more environments passed into the first one. There are two solutions to the problem of Wasm generating and calling Wasm. The first would be a trampoline, where the first Wasm program returns Wasm, which is then instantiated by the outer environment. The other would be to pass in an API for creating new Wasm environments over existing memory buffers.

See https://copy.sh/v86/
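
The trampoline idea can be sketched by analogy in Python, with source strings standing in for Wasm modules and exec standing in for instantiation. Everything here is an assumption made for illustration: the function run_trampoline, the convention that each module exposes main(), and the ("next", code) / ("done", value) return protocol are not part of any Wasm API.

```python
def run_trampoline(module_source, max_hops=100):
    """Outer environment: instantiate a module, run it, and repeat for as
    long as the module hands back new code instead of a final result."""
    for _ in range(max_hops):
        namespace = {}
        exec(module_source, namespace)        # "instantiate" the module
        kind, payload = namespace["main"]()   # run its entry point
        if kind == "done":
            return payload
        if kind != "next":
            raise ValueError(f"unexpected result kind {kind!r}")
        module_source = payload               # guest-generated code
    raise RuntimeError("too many trampoline hops")


# Stage 1 generates stage 2 and returns it to the outer environment.
STAGE2 = "def main():\n    return ('done', 6 * 7)\n"
STAGE1 = f"def main():\n    return ('next', {STAGE2!r})\n"
```

Here run_trampoline(STAGE1) returns 42. The guest never instantiates anything itself; only the outer loop does, which is what keeps the linkage to the outside world fully defined.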

MS-DOS, NES or C64 are useful for archival purposes because they are dead, frozen in time along with a large corpus of software. But there is a ton of complexity in implementing those systems with enough fidelity to run software.

Lua, Typed Assembly (https://en.wikipedia.org/wiki/Typed_assembly_language), and Sector Lisp (https://github.com/jart/sectorlisp) seem to have the right minimalism and compactness for archival purposes. Maybe it is sectorlisp+rv32+wasm.

If there are directions you would like Wasm to go, I really recommend attending the Wasm CG meetings.

https://github.com/WebAssembly/meetings

When it comes to an archival system, I'd like it to be able to run anything from an era, not just specially crafted binaries. I think Wasm meets that goal.

https://gist.github.com/dabeaz/7d8838b54dba5006c58a40fc28da9...