All the examples are portraits of people.

I have to wonder whether it works well with anything else.

trained on CelebA, so no, but you could for sure train this on a more varied dataset

Would it be as simple as feeding it a bunch of decolorized images along with the originals?

yes, so effectively infinite training data. but the challenge will be scaling to large resolutions and getting global color consistency
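A minimal sketch of that self-supervised setup: any color image gives you a (grayscale input, color target) pair for free, just by decolorizing it. The `make_pair` helper name is my own, not from any particular codebase.

```python
from PIL import Image

def make_pair(color: Image.Image):
    """Turn any color image into a (grayscale, color) training pair.

    The grayscale version is the model input; the original color image
    is the target. No labels are needed, so any photo collection works.
    """
    # "L" keeps only luminance; converting back to "RGB" keeps the
    # channel count consistent with the color target
    gray = color.convert("L").convert("RGB")
    return gray, color

# toy demo: a solid red image yields a flat-gray input and the red target
pair_in, pair_target = make_pair(Image.new("RGB", (64, 64), (200, 30, 30)))
```

In practice you'd wrap this in a dataset loader that also does random crops and augmentation, but the core "supervision for free" trick is just the two lines above.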

I guess you can always use a two-stage process: first colorize, then upscale.

yeah, you can use SOTA super-res, but that tends to be generative too (sometimes diffusion-based on its own, more commonly GAN-based). it can be a challenge to synthesize the right high-res details.

but that’s basically the Stable Diffusion paper (diffusion in latent space plus a GAN-style decoder doing the super-res back to pixel space)

Yeah, if you have a high-res grayscale image, you can colorize it at super low res and then regenerate the colors at high res with another model. (Though this isn't an efficient approach at all.)

https://github.com/TencentARC/T2I-Adapter

i've also seen a ControlNet do this.