My great invention was that you don't have to collect all the garbage, only most of it. So I simply assumed that every number on the CPU stack was a pointer to a live cell. This heresy never made me run out of memory.
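
What this describes is usually called conservative garbage collection: any stack word that might be a pointer is treated as a root, so a stray integer can keep some garbage alive, but as long as every real pointer is on the stack nothing live is freed. Below is a toy mark-and-sweep sketch in C of that idea, not NokoLisp's actual collector. The heap layout (cells named by index + 1, 0 as nil, larger values as atoms), the simulated stack array and the names `alloc_cons`, `mark` and `collect` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy heap: a cell is named by its index + 1, 0 is nil,
   and any word above HEAP_CELLS stands for an atom or a number. */
#define HEAP_CELLS 16

typedef struct { uint16_t car, cdr; } Cell;

static Cell heap[HEAP_CELLS];
static unsigned char in_use[HEAP_CELLS];
static unsigned char marked[HEAP_CELLS];

static uint16_t alloc_cons(uint16_t car, uint16_t cdr) {
    for (size_t i = 0; i < HEAP_CELLS; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            heap[i].car = car;
            heap[i].cdr = cdr;
            return (uint16_t)(i + 1);
        }
    }
    assert(!"heap exhausted");
    return 0;
}

/* Conservative test: any word that names an allocated cell is treated
   as a pointer, even if the program really meant it as a number. */
static int looks_like_cell(uint16_t w) {
    return w >= 1 && w <= HEAP_CELLS && in_use[w - 1];
}

static void mark(uint16_t w) {
    while (looks_like_cell(w) && !marked[w - 1]) {
        marked[w - 1] = 1;
        mark(heap[w - 1].car);     /* recurse into the car ... */
        w = heap[w - 1].cdr;       /* ... and loop down the cdr */
    }
}

/* Use every stack word as a possible root, then free what is unmarked.
   Returns the number of cells reclaimed. */
static int collect(const uint16_t *stack, size_t depth) {
    int freed = 0;
    for (size_t i = 0; i < HEAP_CELLS; i++) marked[i] = 0;
    for (size_t i = 0; i < depth; i++) mark(stack[i]);
    for (size_t i = 0; i < HEAP_CELLS; i++) {
        if (in_use[i] && !marked[i]) {
            in_use[i] = 0;
            freed++;
        }
    }
    return freed;
}
```

If the stack holds a real list pointer plus a word equal to some unreachable cell's address (say a loop counter that happens to be 1), both survive and only the cell with no apparent reference is reclaimed. That retained cell is the "most of it" part.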

    > (ncompile '(cons 1 (cons 2 3)))
    $09CB:$7163:   MOV  AX,$04 ; S-object 1
    $09CB:$7166:   PUSH AX
    $09CB:$7167:   MOV  AX,$05 ; S-object 2
    $09CB:$716A:   PUSH AX
    $09CB:$716B:   MOV  AX,$06 ; S-object 3
    $09CB:$716E:   MOV  BX,AX
    $09CB:$7170:   POP  AX
    $09CB:$7171:   CALL $05AE   ; cons
    $09CB:$7174:   MOV  BX,AX
    $09CB:$7176:   POP  AX
    $09CB:$7177:   CALL $05AE   ; cons
    $09CB:$717A:   JMP  $1DA7
      (subru: eval=$7163, compile=$3B6F)

One thing you can do is `push $05`, which needs only 2 bytes instead of the 4 taken by `MOV AX,$05` plus `PUSH AX` in the listing above. The idea of using the x86 stack instructions to construct cons cells is potentially brilliant. I experimented with it a little, but couldn't make it work. For example, here's how it changes certain parts of the implementation:

    Cons:   xchg    %sp,%cx            # Cons(m:di,a:ax):ax
            push    %di
            push    %ax
            xchg    %sp,%cx
            mov     %cx,%di
            xchg    %di,%ax
            ret

    Apply:  ...
            xchg    %cx,%sp
    Pairlis:test    %di,%di           # Pairlis(X:di,Y:si,a:dx):ax
            jz      1f                # for x,y in zip(X,Y)
            push    (%bx,%di)         #      (- . y)
            push    (%bx,%si)         #      (x . y)
            push    %sp               #     ((x . y))
            push    %dx               #     ((x . y) . a)
            mov     %sp,%dx           # a = ((x . y) . a)
            mov     (%si),%si
            mov     (%di),%di
            jmp     Pairlis
    1:      xchg    %cx,%sp
            ...
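
In C terms, the trick is to give cons cells their own downward-growing "stack" whose pointer is parked in another register (CX above) and swapped into SP only while a cell is being built. The sketch below models that with a word-addressed array. It illustrates the idea only: the memory size, the nil encoding, the field order and the names `cons`, `car`, `cdr` and `pairlis` are assumptions, not the exact register or field layout of the assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: word-addressed memory, cons cells built at the
   top and growing downward like the x86 stack. Word 0 is never a cell,
   so 0 can mean nil; values >= MEM_WORDS stand for atoms. */
#define MEM_WORDS 64
#define NIL 0

static uint16_t mem[MEM_WORDS];
static uint16_t cx = MEM_WORDS;   /* the parked "cons stack pointer" (CX) */

/* PUSH: pre-decrement the pointer, then store the word. */
static void push(uint16_t *sp, uint16_t w) { mem[--*sp] = w; }

/* Build (head . tail) by pushing the tail and then the head: afterwards
   the pointer addresses a two-word cell, car at [sp], cdr at [sp + 1]. */
static uint16_t cons(uint16_t head, uint16_t tail) {
    uint16_t sp = cx;             /* like xchg %sp,%cx */
    assert(sp > 2);               /* room for a cell, word 0 stays free */
    push(&sp, tail);
    push(&sp, head);
    cx = sp;                      /* like xchg %sp,%cx back again */
    return sp;                    /* the new cell's address */
}

static uint16_t car(uint16_t c) { return mem[c]; }
static uint16_t cdr(uint16_t c) { return mem[c + 1]; }

/* Pairlis(X, Y, a): for x, y in zip(X, Y), prepend (x . y) onto a.
   Like the assembly's test %di,%di, only X is checked for the end. */
static uint16_t pairlis(uint16_t xs, uint16_t ys, uint16_t a) {
    while (xs != NIL) {
        a = cons(cons(car(xs), car(ys)), a);
        xs = cdr(xs);
        ys = cdr(ys);
    }
    return a;
}
```

Because every cell is two consecutive pushes, allocation is just a pointer decrement; the catch is that the real SP is unavailable for calls and interrupts while it is swapped out.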

My "compiler" was totally context-free and written in assembler. That means it compiles every form in a complete vacuum: it assumes the arguments arrive in the AX and BX registers (with the rest-pointer in CX, I think) and it leaves the result in AX.
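
To make "context-free" concrete, here is a hypothetical C sketch of such a code generator for nested `cons` calls. Each form is compiled knowing nothing about its surroundings: its value always lands in AX, so a two-argument call must save the first argument on the stack while the second is computed. The `Expr` type and the `emit`/`compile` helpers are made up for illustration, and the S-object words are taken as given ($04, $05, $06, as in the listing).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A tiny expression tree: either a constant S-object word or a call
   (cons a b). Hypothetical representation for illustration. */
typedef struct Expr {
    int is_cons;
    unsigned word;                 /* S-object word when !is_cons */
    const struct Expr *a, *b;      /* arguments when is_cons */
} Expr;

static char out[1024];
static size_t len;

/* Append one instruction line to the output buffer. */
static void emit(const char *line) {
    int n = snprintf(out + len, sizeof out - len, "%s\n", line);
    assert(n >= 0 && (size_t)n < sizeof out - len);  /* buffer big enough */
    len += (size_t)n;
}

/* Compile e "in a vacuum": the result always ends up in AX, and nothing
   is known about what surrounds this form. */
static void compile(const Expr *e) {
    char buf[32];
    if (!e->is_cons) {
        snprintf(buf, sizeof buf, "MOV AX,$%02X", e->word);
        emit(buf);
        return;
    }
    compile(e->a);        /* first argument -> AX */
    emit("PUSH AX");      /* save it while the second is computed */
    compile(e->b);        /* second argument -> AX */
    emit("MOV BX,AX");    /* second argument goes in BX */
    emit("POP AX");       /* first argument back in AX */
    emit("CALL cons");    /* result in AX */
}
```

Compiling `(cons 1 (cons 2 3))` this way produces the same eleven instructions as the listing above, minus the addresses and the final `JMP`.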

But this proved not to be a bad start at all. Once you understand the limitations of the "compiler", you can modify the macros accordingly. One of the features of the compiler was that it assigned absolute memory locations to variables, so you could stop wasting stack space and make early assignments to temporary variables.
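
Assigning absolute memory locations to variables could look something like the following hypothetical C sketch: the first time the compiler meets a name it hands out the next free word above a fixed base address, and a store becomes a single `MOV [addr],AX` with no stack traffic. `VAR_BASE`, `slot_for` and `emit_store` are assumptions for illustration, not NokoLisp's names or addresses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: variables get fixed addresses from VAR_BASE upward, one
   16-bit word each, assigned the first time the compiler sees the name.
   Names are stored by pointer, so they must outlive the table. */
#define VAR_BASE 0x8000u
#define MAX_VARS 32

static const char *names[MAX_VARS];
static unsigned nvars;

static unsigned slot_for(const char *name) {
    for (unsigned i = 0; i < nvars; i++)
        if (strcmp(names[i], name) == 0)
            return VAR_BASE + 2 * i;      /* same name, same address */
    assert(nvars < MAX_VARS);
    names[nvars] = name;
    return VAR_BASE + 2 * nvars++;        /* next free word */
}

/* Store AX into a variable: no stack traffic, just an absolute address. */
static void emit_store(const char *name, char *buf, size_t size) {
    snprintf(buf, size, "MOV [$%04X],AX", slot_for(name));
}
```

The trade-off is that a fixed slot is shared by every activation, so it only suits variables that are not live across a recursive call.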

Unfortunately, the source is now quite incomprehensible because of the insane use of nested macros: https://github.com/timonoko/nokolisp

But the example given above works, no doubt about it:

    > (setq test (ncompile '(cons 1 (cons 2 3))))
    > (test)
     
    (1 2 . 3)
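
The result `(1 2 . 3)` is simply how a Lisp printer writes a chain of pairs whose last cdr is not nil: `(cons 2 3)` is the dotted pair `(2 . 3)`, and consing 1 onto it adds one more element in front. A minimal C printer, assuming a hypothetical `Val` type with NULL as nil, shows the rule.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tagged value: a small integer or a pair; NULL is nil. */
typedef struct Val {
    int is_pair;
    int num;
    const struct Val *car, *cdr;
} Val;

/* Append s to the NUL-terminated string in out. */
static void append(char *out, size_t size, const char *s) {
    size_t used = strlen(out);
    assert(used + strlen(s) < size);
    strcpy(out + used, s);
}

/* Print v onto the end of out (which must start as a valid string). */
static void print(const Val *v, char *out, size_t size) {
    char num[16];
    if (v == NULL) { append(out, size, "()"); return; }
    if (!v->is_pair) {
        snprintf(num, sizeof num, "%d", v->num);
        append(out, size, num);
        return;
    }
    append(out, size, "(");
    for (;;) {
        print(v->car, out, size);
        v = v->cdr;
        if (v == NULL) break;              /* proper list ends here */
        if (!v->is_pair) {                 /* non-nil atom tail: dotted */
            append(out, size, " . ");
            print(v, out, size);
            break;
        }
        append(out, size, " ");            /* another element follows */
    }
    append(out, size, ")");
}
```

With a nil tail the same printer gives `(1 2)`; only a non-nil atom in the last cdr produces the dot.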