Third-year PhD student in compilers here. Yes, compilers are hard, but they are fun and addictive. No regrets about switching from cryptography in my master's to compilers for my PhD. Lattice-based cryptography is genuinely very hard, and that's why I couldn't continue.
Always wanted to ask an academic about this.
Compilers take source input, libraries, and maybe some configuration parameters (in C you can pass preprocessor definitions, etc.) and produce an output.
No network calls, no conditional file I/O: the only side effects needed happen at the beginning, when you load the source files and other libraries.
Why don't more compilers use purely functional paradigms, focusing on transforming input to output instead of performing side effects? It seems like such a perfect fit.
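To make "transforming input to output" concrete, here is a minimal sketch of a single compiler pass written as a pure function, in Rust since it comes up later in the thread. The toy expression language (`Expr`), the constant-folding pass (`fold_constants`), and the `compile` wrapper are all hypothetical and made up for illustration; they are not taken from any real compiler. It also assumes the toy language defines integer overflow as wrapping.

```rust
// A toy expression AST (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Num(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// A pure constant-folding pass: it reads the input tree and builds a new
// output tree. No I/O, no global state, and the input is never mutated.
fn fold_constants(e: &Expr) -> Expr {
    match e {
        Expr::Num(_) | Expr::Var(_) => e.clone(),
        Expr::Add(a, b) => match (fold_constants(a), fold_constants(b)) {
            // Assumption: the toy language uses wrapping integer arithmetic.
            (Expr::Num(x), Expr::Num(y)) => Expr::Num(x.wrapping_add(y)),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(a, b) => match (fold_constants(a), fold_constants(b)) {
            (Expr::Num(x), Expr::Num(y)) => Expr::Num(x.wrapping_mul(y)),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
    }
}

// The "compiler" is a pipeline of pure passes (just one here). Only the
// caller would do I/O: reading sources at the start, writing output at the end.
fn compile(input: &Expr) -> Expr {
    fold_constants(input)
}

fn main() {
    // (2 + 3) * x  ==>  5 * x
    let input = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Num(2)), Box::new(Expr::Num(3)))),
        Box::new(Expr::Var("x".to_string())),
    );
    let output = compile(&input);
    println!("{:?}", output); // prints Mul(Num(5), Var("x"))
}
```

This only shows the shape of the idea (side effects at the edges, pure transformations in the middle); it makes no claim about how any production compiler is structured.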
In academia, many compilers are written in functional languages, particularly OCaml. Unfortunately, the programming community has come to this bone-headed idea that to be a serious language, its compiler must be implemented in itself. As a result, compiler implementation languages follow popular language trends. I’m sure a much faster TypeScript compiler could be written in OCaml or Rust, for instance.
> Unfortunately, the programming community has come to this bone-headed idea that to be a serious language, its compiler must be implemented in itself.
I wouldn't say so; I think 'serious' compiler developers take a fairly level-headed view here.
Python and JavaScript, for example, aren't likely ever to be self-hosting, as it wouldn't make much sense. There's PyPy, of course, but people pay attention to PyPy only when it does impressive things with performance; it doesn't get much 'credit' for being written in RPython. If that were all it brought to the table, it would be viewed as a mere curiosity.
Java is moving toward being more self-hosted (Graal), but that's because it can bring real advantages (fewer undetected buffer overflows inside the JVM, for instance); it's not being done just because it's cute.
Compilers can, of course, be written in all manner of languages, but I don't think it's a mistake that DMD is written in D, that GNAT (the Ada front end for GCC) is written in Ada, or that GHC is written in Haskell. A self-hosted compiler serves as a useful example of a complex program written in that language, and it requires contributors to know only the language they're compiling.
> I’m sure a much faster TypeScript compiler could be written in OCaml or Rust, for instance.
In that specific instance, I suspect you might be right, but this is just guesswork.