My great invention was that you don't have to collect all the garbage, only most of it. So I assumed that every number on the CPU stack was a pointer to a live cell. I never ran out of memory because of this heresy.
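To make the idea concrete, here is a minimal Python sketch of conservative root scanning. It is not nokolisp's actual collector: the heap base, the cell size, the alignment check, and the names `cell_index` and `mark` are all assumptions made up for illustration. Any stack word that happens to look like a cell address keeps that cell alive, so some garbage survives, but nothing live is ever freed.

```python
# Sketch of conservative marking; layout constants are hypothetical.
HEAP_BASE = 0x1000   # assumed address of the first cons cell
CELL_SIZE = 4        # assumed: 2-byte car + 2-byte cdr
HEAP_CELLS = 8       # assumed heap size, in cells

def cell_index(word):
    """Return the cell index if `word` could be a cell pointer, else None."""
    offset = word - HEAP_BASE
    if 0 <= offset < HEAP_CELLS * CELL_SIZE and offset % CELL_SIZE == 0:
        return offset // CELL_SIZE
    return None

def mark(stack_words, heap):
    """heap is a list of (car, cdr) word pairs; returns the marked cell indices."""
    marked = set()
    pending = list(stack_words)   # every stack word is treated as a possible root
    while pending:
        idx = cell_index(pending.pop())
        if idx is None or idx in marked:
            continue
        marked.add(idx)
        pending.extend(heap[idx])  # follow both car and cdr
    return marked

heap = [(0, 0)] * HEAP_CELLS
heap[0] = (7, HEAP_BASE + 4)   # cell 0 points at cell 1
heap[1] = (8, 0)
heap[2] = (9, 0)               # unreachable: gets collected
heap[3] = (10, 0)              # dead, but a stray stack integer matches its address
stack = [HEAP_BASE, 42, HEAP_BASE + 12]
live = mark(stack, heap)
```

In this run cells 0 and 1 are genuinely live, cell 3 is retained only because the plain integer `HEAP_BASE + 12` sits on the stack, and cell 2 is reclaimed. That is the "only most of it" trade-off.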
> (ncompile '(cons 1 (cons 2 3)))
$09CB:$7163: MOV AX,$04 ; S-object 1
$09CB:$7166: PUSH AX
$09CB:$7167: MOV AX,$05 ; S-object 2
$09CB:$716A: PUSH AX
$09CB:$716B: MOV AX,$06 ; S-object 3
$09CB:$716E: MOV BX,AX
$09CB:$7170: POP AX
$09CB:$7171: CALL $05AE ; cons
$09CB:$7174: MOV BX,AX
$09CB:$7176: POP AX
$09CB:$7177: CALL $05AE ; cons
$09CB:$717A: JMP $1DA7
(subru: eval=$7163, compile=$3B6F)
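The listing follows a simple accumulator pattern: the first argument is evaluated into AX and pushed, the second is evaluated into AX and moved to BX, then the first is popped back into AX before the call. A small Python sketch of a code generator that reproduces this instruction sequence (without addresses or the final JMP) might look like the following. The function names and the `n + 3` encoding of small integers are assumptions read off the listing, not taken from the nokolisp source.

```python
def encode(n):
    # Assumption from the listing: integer n appears as S-object n + 3.
    return n + 3

def compile_expr(expr, out):
    """Append instructions that leave the value of `expr` in AX."""
    if isinstance(expr, int):
        out.append(f"MOV AX,${encode(expr):02X}")
        return
    op, first, second = expr
    if op != "cons":
        raise ValueError("this sketch only handles cons")
    compile_expr(first, out)    # first argument -> AX
    out.append("PUSH AX")       # save it while the second is evaluated
    compile_expr(second, out)   # second argument -> AX
    out.append("MOV BX,AX")     # second argument travels in BX
    out.append("POP AX")        # first argument back in AX
    out.append("CALL cons")     # result comes back in AX

code = []
compile_expr(("cons", 1, ("cons", 2, 3)), code)
```

The nested `(cons 2 3)` is compiled by the same routine, which is why the `MOV BX,AX` / `POP AX` / `CALL` triple shows up twice in the output.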
One thing you can do is `push $05`, which only needs 2 bytes instead of the 4 taken by each `MOV AX` / `PUSH AX` pair in the listing (it does require an 80186 or later). The idea of using the x86 stack instructions to construct cons cells is potentially brilliant. I experimented with it a little bit, but couldn't make it work. For example, here's how it changes certain aspects of the implementation:
Cons:    xchg %sp,%cx        # Cons(m:di,a:ax):ax
         push %di
         push %ax
         xchg %sp,%cx
         mov  %cx,%di
         xchg %di,%ax
         ret
Apply:   ...
         xchg %cx,%sp
Pairlis: test %di,%di        # Pairlis(X:di,Y:si,a:dx):ax
         jz   1f             # for x,y in zip(X,Y)
         push (%bx,%di)      # (- . y)
         push (%bx,%si)      # (x . y)
         push %sp            # ((x . y))
         push %dx            # ((x . y) . a)
         mov  %sp,%dx        # a = ((x . y) . a)
         mov  (%si),%si
         mov  (%di),%di
         jmp  Pairlis
1:       xchg %cx,%sp
         ...
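The core trick in the `Cons` routine above is that two pushes onto a downward-growing stack lay out a two-word cell, and the resulting stack pointer is the cell's address. Here is a rough Python model of just that idea. The class name `ConsStack`, the starting address, and the choice of which word is pushed first are assumptions for illustration. It also does not distinguish atoms from pointers.

```python
class ConsStack:
    """Cons cells built by pushing 16-bit words onto a downward-growing stack."""
    WORD = 2

    def __init__(self, top=0x8000):
        self.mem = {}     # address -> word; stands in for RAM
        self.sp = top     # plays the role of the swapped-in stack pointer

    def push(self, word):
        self.sp -= self.WORD
        self.mem[self.sp] = word

    def cons(self, car, cdr):
        # Assumed layout: cdr is pushed first, so car lands at the lower address.
        self.push(cdr)
        self.push(car)
        return self.sp    # the new stack pointer doubles as the cell address

    def car(self, cell):
        return self.mem[cell]

    def cdr(self, cell):
        return self.mem[cell + self.WORD]

heap = ConsStack()
inner = heap.cons(2, 3)
outer = heap.cons(1, inner)
```

Allocation is just a pointer decrement, which is what makes the approach attractive. The price is that the machine stack pointer has to be swapped in and out around every allocation, as the `xchg` instructions in the listing show.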
But this proved not to be a bad start at all. Once you understand the limitations of the "compiler", you can modify the macros accordingly. One of the features of the compiler was that it assigned absolute memory locations to variables, so you could stop wasting stack and do early assignments to temporary variables.
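As a rough illustration of what assigning absolute memory locations can mean, the Python sketch below hands each variable name a fixed address the first time it is seen and emits a direct store to that address. The base address, the `StaticAllocator` class, and `compile_setq` are all hypothetical. The real compiler's scheme may differ.

```python
class StaticAllocator:
    """Gives every variable name one fixed address for the whole program."""

    def __init__(self, base=0x9000, word=2):
        self.next_addr = base   # assumed start of the variable area
        self.word = word
        self.slots = {}

    def address_of(self, name):
        if name not in self.slots:
            self.slots[name] = self.next_addr
            self.next_addr += self.word
        return self.slots[name]

def compile_setq(alloc, name):
    """Emit a store of AX into the variable's fixed slot, with no stack traffic."""
    return f"MOV [${alloc.address_of(name):04X}],AX"

alloc = StaticAllocator()
lines = [
    compile_setq(alloc, "tmp"),
    compile_setq(alloc, "x"),
    compile_setq(alloc, "tmp"),   # same name, same address
]
```

A store to a fixed slot needs no PUSH/POP pair. The usual cost of this kind of static allocation is that a variable has only one slot, so recursive calls overwrite each other's values unless something saves them.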
Unfortunately, the source is quite incomprehensible now because of the insane use of nested macros: https://github.com/timonoko/nokolisp
But the example given above works, no doubt about it:
> (setq test (ncompile '(cons 1 (cons 2 3))))
> (test)
(1 2 . 3)