Stable Diffusion's advantage lies in the huge amount of open-source activity around it. Most recently that resulted in ControlNet, which is far more powerful than anything Midjourney can currently do - if you know how to use it.
ControlNet is a neural network attached to an already-trained model so that it can be conditioned on new inputs such as Canny edges, depth maps, or segmentation maps. ControlNet lets you train the model on the new condition "easily", without catastrophic forgetting and without a huge dataset. In the repo linked by OP, they have trained a ControlNet model on segmentation maps generated by SAM: https://segment-anything.com/
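To make the mechanism concrete, here is a minimal sketch using the Hugging Face diffusers integration (not the OP repo's own code, and using the publicly released Canny-edge checkpoint rather than the SAM-conditioned model): the base Stable Diffusion weights stay frozen while the ControlNet branch steers generation with an edge map. File names and the prompt are placeholders.

    import cv2
    import numpy as np
    import torch
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    # Extract a Canny edge map -- this is the "new condition" this ControlNet was trained on.
    image = np.array(Image.open("input.png").convert("RGB"))
    edges = cv2.Canny(image, 100, 200)
    edges = Image.fromarray(np.concatenate([edges[:, :, None]] * 3, axis=2))  # 1 channel -> 3

    # Load the trained ControlNet branch and attach it to an unmodified SD 1.5 base model.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")

    # The edge map constrains the layout; the prompt fills in everything else.
    result = pipe("a futuristic city street", image=edges, num_inference_steps=30).images[0]
    result.save("output.png")

Swapping in a depth-map or segmentation-map checkpoint is the same recipe with a different conditioning image.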
On Hugging Face: https://huggingface.co/spaces/hysts/ControlNet
Play around with the different models; you might get better results with some than with others.
The big news in Stable Diffusion land has been the release of ControlNet: https://github.com/lllyasviel/ControlNet. It hasn't gotten much traction on HN: https://news.ycombinator.com/item?id=34761780. It allows you to keep the shape of existing images (or sketch new ones) and have Stable Diffusion fill them out. My example: https://twitter.com/LechMazur/status/1626668677473918981.
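If you want to try that sketch-to-image workflow yourself, roughly the same diffusers recipe works with the scribble checkpoint (a sketch under assumptions: the file names and prompt here are mine, not from the linked example):

    import torch
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    # Scribble-conditioned ControlNet: a rough white-on-black line drawing
    # fixes the composition, and Stable Diffusion fills in the details.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")

    sketch = Image.open("my_sketch.png")  # placeholder: a rough outline drawing
    result = pipe("a cozy cabin in a snowy forest", image=sketch).images[0]
    result.save("cabin.png")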