What does HackerNews think of stoneknifeforth?
a tiny self-hosted Forth implementation
More seriously, a metacircular example to draw from would be: https://github.com/kragen/stoneknifeforth
so far the author of this one has only written the first chapter
i've written a few didactic compilers that are complete enough to compile themselves, though not complete enough to compile anything else; these may be of more interest to someone who's looking for how to make writing a compiler surprisingly easy, though they do lack the extensive explanatory text in the linked blog post
- http://canonical.org/~kragen/sw/urscheme (from a subset of scheme to at&t-syntax i386 assembly, 1553 lines of code)
- https://github.com/kragen/stoneknifeforth (from a forth-like language to an i386 linux elf executable, 132 lines of code)
- https://github.com/kragen/peg-bootstrap/blob/master/peg.md (from a peg grammar with semantic actions to javascript, 66 lines of code)
- https://github.com/kragen/peg-bootstrap/blob/master/handaxew... (tangles a literate program in virtually any plain-text format, such as markdown, html, restructuredtext, or tex, into virtually any plain-text programming language, 208 lines of code; used to tangle both itself and the peg compiler compiler above)
- http://canonical.org/~kragen/sw/dev3/meta5ix.m5 (from meta5ix, a language for writing parsers derived from val schorre's meta-ii, to meta5ix assembly, 18 lines of code; it is a top-down parsing language with limited backtracking, like pegs, but less expressive and implementable without heap allocation or the ability to backtrack past the beginning of a line; see http://canonical.org/~kragen/sw/dev3/meta5ixrun.py for an m5asm interpreter)
- http://canonical.org/~kragen/sw/dev3/meta5ix2c2.m5 (from a slightly different variation of meta5ix to c, 133 lines of code)
most of these do take the same ast-free approach as the linked blog post and crenshaw, just emitting code as they finish parsing a production (or, in the case of stoneknifeforth, a token), but ur-scheme in particular does not; it uses scheme s-expressions as its syntax tree, because some kind of syntax tree is necessary to be able to allocate some local variables on the stack while also supporting closures. peg-bootstrap can certainly handle code that builds an ast with its semantic actions, but its implementation of itself just glues together snippets of javascript text
also worth mentioning in this connection:
- darius bacon's work, especially parson (for example, https://github.com/darius/parson/blob/master/eg_calc_compile...)
- the oberon compiler in wirth and gutknecht's oberon book
- the later chapters of sicp
well, that's unfair, because https://github.com/kragen/stoneknifeforth definitely doesn't qualify as a forth, it doesn't have immediate words
s f g x → f x (g x)
k a b → a
i = s k k (one of many possible definitions)
maybe wolfram knows of a simpler alternativemy favorite is abadi and cardelli's object-calculus on which i based bicicleta. it has two reduction rules rather than the λ-calculus's one. using a hybrid of bicicleta's syntax and abadi and cardelli's
{f = ς(v)b, g = ς(v)c, ...}{f = ς(v)d} → {f = ς(v)d, g = ς(v)c, ...}
{f = ς(v)b, g = ς(v)c, ...}.f → b[{f = ς(v)b, g = ς(v)c, ...}/v]
the first of these derives an inherited object with a new definition for method f. the second one invokes method f on an object, which is evaluated by replacing its self-variable v with the object itself throughout its body, using the standard β-reduction semantics with α-renaming that we're familiar with from the λ-calculus (b[x/y] means b but with x replacing y)despite the simplicity of the semantics the ς-calculus is enormously more usable for actually writing down functions than the λ-calculus. here's factorial(10) in the notation above (untested)
{fac = ς(env){
n = ς(_)3,
return = ς(o)
(o.n < 2).if_true{
then = ς(_)1
else = ς(_)o.n * env.fac{n = ς(_)o.n - 1}.return
}.return
}
}.fac{n = ς(10)}.return
bicicleta has a lot of syntactic sugar which reduces this to (tested) {env: fac = {fac:
arg1 = 3
'()' = (fac.arg1 < 2).if_true(
then = 1
else = fac.arg1 * env.fac(fac.arg1 - 1)
)
}}.fac(10)
in particular to be able to define infix operators inside the language, bicicleta rewrites x * y
as x.'*'{arg1 = y}.'()'
and analogously for -, <, etc.they published a bunch of papers and a book on this but they were more interested in static typing than anything else. a paper on one imperative variation of the thing is http://lucacardelli.name/Papers/PrimObjImpSIPL.A4.pdf
you can implement a turing machine in a few lines of c, making it internally simpler than the other alternatives, but as you point out it's unusable except as a compilation target or to prove theorems about
: mean_std ( sum2 sum inv_len -- mean std )
\ precise_mean = sum * inv_len;
tuck u* \ sum2 inv_len precise_mean
\ mean = precise_mean >> FRAC;
dup FRAC rshift -rot3 \ mean sum2 inv_len precise_mean
\ var = (((unsigned long long)sum2 * inv_len) >> FRAC) - (precise_mean * precise_mean >> (FRAC*2));
dup um* nip FRAC 2 * 32 - rshift -rot \ mean precise_mean^2 sum2 inv_len
um* 32 FRAC - lshift swap FRAC rshift or \ mean precise_mean^2 sum*inv_len
swap - isqrt \ mean std
;
I've done all these things (except designing the hardware) and I agree that it can be very painful. I did some of them in 02008, for example: https://github.com/kragen/stoneknifeforthThe thing is, though, you can also not do all those things. You can use variables, and they don't even have to be allocated on a stack (unless you're writing a recursive function, which you usually aren't), and all the NIP TUCK ROT goes away, and with it all the Memory Championship tricks. You can test each definition interactively as you write it, and then the fact that the language is absurdly error-prone hardly matters. You can use metaprogramming so that your code is as DRY as a nun's pochola. You can use the interactivity of Forth to quickly validate your hypotheses about not just your code but also the hardware in a way you can't do with C. You can do it with GDB, but Forth is a lot faster than GDBscript, but that's not saying much because even Bash is a lot faster than GDBscript.
But Yossi was just using Forth as a programming language, like a C without local variables or type checking, not an embedded operating system. And, as I said, that's really not Forth's strength. Bash and Tcl aren't good programming languages, either. If you try to use Tcl as a substitute for C you will also be very sad. But the way they're used, that isn't that important.
I explained a more limited version of this 12 years ago: https://yosefk.com/blog/my-history-with-forth-stack-machines...
So, I don't think Forth is only useful when you have the freedom to change the problem, though programs in any language do become an awful lot easier when you have that freedom.
Or it might be the guy who wrote http://canonical.org/%7Ekragen/sw/dev3/paperalgo, http://canonical.org/%7Ekragen/sw/urscheme/, http://canonical.org/~kragen/bytebeat/, https://github.com/kragen/dumpulse, https://news.ycombinator.com/item?id=2933619, https://gitlab.com/kragen/bubbleos/blob/master/yeso, https://github.com/kragen/stoneknifeforth, and https://news.ycombinator.com/item?id=695981 in the past, and within the last three months wrote https://news.ycombinator.com/item?id=27171597, https://news.ycombinator.com/item?id=26930408, https://news.ycombinator.com/item?id=26884231, https://news.ycombinator.com/item?id=26674832, https://news.ycombinator.com/item?id=26672806, https://news.ycombinator.com/item?id=26671656, https://news.ycombinator.com/item?id=26654767, https://news.ycombinator.com/item?id=26596892, https://news.ycombinator.com/item?id=26587768, https://news.ycombinator.com/item?id=26581742, https://news.ycombinator.com/item?id=26561319, https://news.ycombinator.com/item?id=26547871, https://news.ycombinator.com/item?id=26543937, https://news.ycombinator.com/item?id=26580684, https://news.ycombinator.com/item?id=26528534, https://news.ycombinator.com/item?id=26525837, https://news.ycombinator.com/item?id=26525109, http://canonical.org/~kragen/sw/dev3/qvaders, https://news.ycombinator.com/item?id=26452393, http://canonical.org/~kragen/sw/dev3/skitch, https://news.ycombinator.com/item?id=26449902, http://canonical.org/~kragen/sw/dev3/meta5ixrun.py, https://news.ycombinator.com/item?id=26438596, http://canonical.org/~kragen/sw/dev3/mukanren.ml, https://news.ycombinator.com/item?id=26418271, http://canonical.org/~kragen/sw/dev3/kmregion.h, https://news.ycombinator.com/item?id=26299172, http://canonical.org/~kragen/sw/dev3/readprint.fs, https://news.ycombinator.com/item?id=26289195, https://news.ycombinator.com/item?id=26219950, https://news.ycombinator.com/item?id=26219000, https://news.ycombinator.com/item?id=26198567, https://news.ycombinator.com/item?id=26231940, https://news.ycombinator.com/item?id=26189525, https://news.ycombinator.com/item?id=26195060.
I'm not going to link to particular comments in https://news.ycombinator.com/threads?id=giardini (it would be too easy to think I was cherry-picking the worst ones) but you can read through it yourself and see that the guy has a long history of almost nothing but incessant worthless trolling.
If you can't tell the difference...
Maybe you mean that before ELF support, Linux a.out executables had to be loaded at a fixed virtual address? That's possible—I started using Linux daily in 01995, at which point a.out was already only supported for backward compatibility.
Still, I don't think it's fair to say without qualification, "Code generation is much harder." Very simple code generation can be done by pasting together canned code fragments (the original meaning of "compiler") and occasionally computing and encoding a jump offset. A very simple code generator like the one in StoneKnifeForth https://github.com/kragen/stoneknifeforth is simpler than the parser needs to be for many popular languages. However, at least with my limited knowledge, it appears to me that parsing is a relatively closed-ended problem — sure, you can work hard to improve your error detection and recovery, and to give more useful error messages, but you're apparently going to get very little return for even enormous efforts at that. Optimization, on the other hand, which is part of code generation, is potentially arbitrarily complex, and you can keep getting good returns on your efforts for quite a long time.
So I would say that the easiest code generation is usually easier than the easiest parsing, unless you have the liberty to choose your language to make it easier to parse. But the hardest code generation is much, much harder than the hardest parsing. (Again, unless you're parsing a language deliberately designed to be difficult!)
My favorite things about Ur-Scheme are that the code is clean, as you say, and he wrote up the lessons learned at some length. No files: I think he meant not yet implementing primitives like open-input-file. Macros you can get by without: I used to use an R4RS Scheme system of my own without them, except sometimes I'd call on a dumb defmacro expander as a preprocessor.
Kragen also wrote https://github.com/kragen/stoneknifeforth which I haven't read as much of.
FWIW I also wrote this about a sort of self-hosting Python in Python: https://codewords.recurse.com/issues/seven/dragon-taming-wit... -- as a bytecode compiler it's not too big but it depends on the giant Python runtime.
Relatedly, I wrote a tiny quasi-Forth in itself that generates ELF executables directly at https://github.com/kragen/stoneknifeforth. Its executables also stopped running around Linux 2.6.27, and I still haven't bothered to figure out why.
In April some of us were playing with bootstrapping a development environment in MS-DOS from COPY CON: http://lists.canonical.org/pipermail/kragen-discuss/2011-Apr... http://lists.canonical.org/pipermail/kragen-hacks/2011-April... Dave Long put together a version where the machine code was entirely printable ASCII.
Brian Raiter wrote a lovely article a few years back on the smallest possible ELF executable, which turns out to be 45 bytes, which unfortunately has been broken by recent Linuxes: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...
Based on Brian's work, I wrote a self-compiling compiler for a tiny dialect of Forth: https://github.com/kragen/stoneknifeforth which is about two pages of code if you remove the comments. I also wrote a PEG compiler-compiler in JavaScript which is one page of code: https://github.com/kragen/peg-bootstrap/blob/master/peg.md --- ultimately I want to make a version that generates StoneKnifeForth code instead of JS, but I haven't gotten around to it.
I also designed a minimal CPU architecture: https://github.com/kragen/calculusvaporis but I only have software implementations for it so far, and no interesting code.
And I wrote a metacircular interpreter for Bicicleta in half a page of code, which I guess is some kind of "minimal coding example": http://lists.canonical.org/pipermail/kragen-hacks/2007-Febru...
https://github.com/kragen/screenfuls is a collection of tiny, readable programs that do interesting things.