Beautiful piece of work. This bit jumped out at me:

"First problem: I don’t have a 16-bit code segment! Second: I don’t have a way to generate 16-bit code with GCC."

I had that exact same problem writing a loader for a now-defunct OS. For years I kept a copy of Borland C++ handy so that I could compile a trimmed-down 16-bit version of the filesystem, which would load the rest of the OS before jumping into 32-bit mode. The tricky part was that the loader had to talk to the BIOS in 16-bit mode, and once I was in 32-bit mode I could not find a clean way to drop back to 16-bit mode. Crossing the 16/32-bit boundary more than once was therefore out.

So instead I stripped the filesystem code of everything related to writing and updating. That left just enough to read a couple of files, place them in memory, flip to 32-bit mode and then jump to the equivalent of 'init'.

The x86 compilers of old had a whole pile of memory 'models' you could target, each with different limits on the sizes of the code and data segments.

DJ Delorie's excellent GCC-to-DOS port (DJGPP) was another very important tool in that whole process.

It's also beautiful how web.archive.org provided one of the key bits of information. I suspect that in the longer term it will be as important as Wikipedia.

There is also the cross-platform and still actively developed Open Watcom [1], which can generate code for DOS and runs on all mainstream operating systems.

[1] https://github.com/open-watcom/open-watcom-v2