I've been working on a project that auto-generates C programs, sometimes up to 1.5 million lines of code, in a single file (actually two files, but the second is only 35 lines).

Not open source but happy to share benchmarks if that would be useful.

Too bad it's not open source, but will some of the generated programs be?

Also, would you mind comparing it to Csmith (https://embed.cs.utah.edu/csmith/)?

There is quite a lot of IP in the generated programs, so sadly it's probably not possible to share them.

I wasn't aware of Csmith, so thanks for highlighting it. My C code doesn't really test many features of the compiler, so I suspect it's mainly of interest for seeing just how the compiler handles a really large single file.

There's also https://github.com/intel/yarpgen, which I haven't used. I believe there are a couple of others...