I wonder how difficult it would be to make something similar that generated 3D models. Most of the examples look like they'd make good video game levels.
0. This neural model, of course, to create landscape-like 2D projections of a plausible scene.
1. Wave-function collapse models, which synthesize domain data quite nicely when parametrized with artistic care - this is a "simpler" example of the concept (see the sketch after this list). https://github.com/mxgmn/WaveFunctionCollapse
2. A fairly good understanding of how to synthesize terrain. Terragen is a good example of this (although not public research, its images drive the point home nicely): https://planetside.co.uk/
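To make (1) concrete, here is a minimal sketch of the simple-tiled wave-function collapse loop in Python. The tile names and adjacency rules are invented for illustration, and a real implementation like mxgmn's also handles tile weights, symmetries, and contradictions/backtracking:

    import random

    TILES = ["water", "sand", "grass", "rock"]
    # which tiles may sit next to each other (symmetric, made-up rules)
    ALLOWED = {
        "water": {"water", "sand"},
        "sand":  {"water", "sand", "grass"},
        "grass": {"sand", "grass", "rock"},
        "rock":  {"grass", "rock"},
    }

    W, H = 12, 6

    def neighbors(x, y):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < W and 0 <= ny < H:
                yield nx, ny

    def propagate(grid, start):
        # remove neighbor options incompatible with every tile still
        # possible in the changed cell, cascading until the grid settles
        stack = [start]
        while stack:
            x, y = stack.pop()
            allowed = set().union(*(ALLOWED[t] for t in grid[y][x]))
            for nx, ny in neighbors(x, y):
                reduced = grid[ny][nx] & allowed
                if reduced != grid[ny][nx]:
                    grid[ny][nx] = reduced
                    stack.append((nx, ny))

    def collapse():
        # every cell starts as a superposition of all tiles
        grid = [[set(TILES) for _ in range(W)] for _ in range(H)]
        while True:
            # pick an undecided cell with the fewest remaining options
            open_cells = [(len(grid[y][x]), x, y)
                          for y in range(H) for x in range(W)
                          if len(grid[y][x]) > 1]
            if not open_cells:
                return grid
            _, x, y = min(open_cells)
            grid[y][x] = {random.choice(sorted(grid[y][x]))}
            propagate(grid, (x, y))

    for row in collapse():
        print(" ".join(next(iter(cell))[0] for cell in row))

The two moves that matter are collapsing the lowest-entropy cell and then propagating adjacency constraints; everything else is parametrization.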
So we could take the generated image from (0) as a 2D projection of an intended landscape and feed it as the seed to a wave-function collapse model that uses known terrain parametrization schemes to synthesize something usable (basically building a Terragen-equivalent model).
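As a rough sketch of that data flow, assuming Pillow/NumPy, a hypothetical projection.png input, and a purely illustrative brightness-to-terrain-class mapping (the synthesis stage below is just diffusion plus noise standing in for a real WFC/terrain model):

    import numpy as np
    from PIL import Image

    # illustrative mapping from brightness bands in the 2D projection
    # to coarse terrain classes and base elevations in meters
    BANDS = [(0, 60, "water", 0.0), (60, 120, "lowland", 50.0),
             (120, 190, "hills", 300.0), (190, 256, "peaks", 1500.0)]

    def classify(gray):
        # quantize the seed image into terrain classes (the WFC seed)
        classes = np.empty(gray.shape, dtype=object)
        heights = np.zeros(gray.shape, dtype=np.float32)
        for lo, hi, name, h in BANDS:
            mask = (gray >= lo) & (gray < hi)
            classes[mask] = name
            heights[mask] = h
        return classes, heights

    def synthesize_heightmap(seed, passes=50):
        # stand-in for the WFC/terrain-synthesis stage: 4-neighbor
        # diffusion keeps the large-scale shapes, noise adds detail
        h = seed.copy()
        rng = np.random.default_rng(0)
        for _ in range(passes):
            h = 0.6 * h + 0.1 * (np.roll(h, 1, 0) + np.roll(h, -1, 0)
                                 + np.roll(h, 1, 1) + np.roll(h, -1, 1))
        h += rng.normal(0.0, 10.0, h.shape)
        return h

    # projection.png is a hypothetical output of the neural model
    gray = np.asarray(Image.open("projection.png").convert("L"))
    classes, base = classify(gray)
    heightmap = synthesize_heightmap(base)
    out = np.uint8(255 * (heightmap - heightmap.min()) / np.ptp(heightmap))
    Image.fromarray(out).save("heightmap.png")

The interesting (and unsolved) part is the middle function: replacing that smoothing stand-in with a constrained synthesis model that respects both the projection and real terrain statistics.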
I think that's more or less plausible. But it's still a "research"-level problem, I think, not something one can cook up by chaining together the data flow from a few open source libraries.