All the examples are portraits of people.
I have to wonder whether it works well with anything else.
it was trained on CelebA, so no, but you could for sure train it on a more varied dataset
Would it be as simple as feeding it a bunch of decolorized images along with the originals?
yes, every color image gives you a free (grayscale, color) pair, so effectively infinite training data. but the challenge will be scaling to large resolutions and getting global consistency
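on the data side it really is that simple; a minimal PyTorch sketch (the folder layout, image size and transforms are my assumptions, not anything from the repo):

```python
from pathlib import Path

from torch.utils.data import Dataset
from torchvision import transforms
from torchvision.datasets.folder import default_loader


class ColorizationPairs(Dataset):
    """Builds (grayscale, color) training pairs by decolorizing images on the fly."""

    def __init__(self, root: str, size: int = 128):
        self.paths = sorted(Path(root).glob("*.jpg"))  # assumed flat folder of jpgs
        self.to_color = transforms.Compose([
            transforms.Resize(size),
            transforms.CenterCrop(size),
            transforms.ToTensor(),          # 3xHxW float tensor in [0, 1]
        ])
        # keep 3 channels so the conditioning image matches the target's shape
        self.to_gray = transforms.Grayscale(num_output_channels=3)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        color = self.to_color(default_loader(str(self.paths[idx])))
        gray = self.to_gray(color)          # the "decolorized" conditioning input
        return gray, color
```

wrap that in a DataLoader and you're done; the grayscale versions never even need to exist on disk.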
I guess you can always use a two-stage process. First colorize, then upscale
yeah, you can use SOTA super res, but that tends to be generative too (either diffusion-based in its own right, or more commonly GAN-based). it can be a challenge to synthesize the right high-res details.
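e.g. the two-stage idea, sketched with diffusers (here `my_colorizer` is a stand-in for whatever colorization model you trained; the x4 upscaler checkpoint is the public stabilityai one, but any SR model slots in):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

# stage 1: colorize at low resolution with your own model
gray = Image.open("old_photo.png").convert("RGB").resize((128, 128))
colorized = my_colorizer(gray)  # hypothetical: returns a 128x128 color PIL image

# stage 2: hand the small color image to an off-the-shelf diffusion upscaler
upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")
result = upscaler(prompt="a color photograph", image=colorized).images[0]  # 512x512
result.save("colorized_4x.png")
```

the failure mode i mean shows up in stage 2: the upscaler has to invent texture, and nothing forces it to stay faithful to the original photo.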
but that’s basically the stable diffusion paper: diffusion in a compressed latent space, with a GAN-trained autoencoder decoding back up to full resolution
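right, and the denoising itself never touches pixels; a quick round trip through the SD autoencoder shows the split (the VAE checkpoint is the public `stabilityai/sd-vae-ft-mse`, the image path is a placeholder):

```python
import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL

# the SD autoencoder: 8x spatial compression, decoder trained with a patch-GAN loss
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

img = Image.open("photo_512.png").convert("RGB")
x = transforms.ToTensor()(img).unsqueeze(0) * 2 - 1   # 1x3x512x512 in [-1, 1]

with torch.no_grad():
    z = vae.encode(x).latent_dist.sample()            # 1x4x64x64 latent
    recon = vae.decode(z).sample                      # back to 1x3x512x512

print(z.shape)  # the denoising U-Net only ever sees this small latent tensor
```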
https://github.com/TencentARC/T2I-Adapter
i've also seen a ControlNet do this (conditioning generation on the grayscale image).
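i haven't verified the exact checkpoint name, so the recolor ControlNet id below is a placeholder, but the diffusers wiring is standard:

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# placeholder id: substitute whichever recolor/grayscale ControlNet you find
controlnet = ControlNetModel.from_pretrained(
    "some-org/controlnet-recolor", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

gray = Image.open("old_photo.png").convert("RGB")     # grayscale conditioning image
out = pipe(
    "a color photograph",                             # the prompt can steer the palette
    image=gray,
    num_inference_steps=30,
).images[0]
out.save("recolored.png")
```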