> train this from scratch

If you mean training from scratch rather than fine-tuning, that won't be cheap or easy to do. You need thousands upon thousands of dollars of GPU compute [1] and a gigantic dataset.

I trained something nowhere near the scale of Stable Diffusion on Lambda Labs, and my bill was $14,000.

[1] Assuming you rent GPUs hourly, because buying the hardware outright will be prohibitively expensive.

I have... ~11 TB of free disk space and a 1080 Ti. Obviously nowhere close to being able to crunch all of Wikimedia Commons, but I'm also not trying to beat Stability AI at their own game. I just want to move the arguments people have about art generators beyond "this is unethical copyright laundering" and "the model is taking reference just like a real human".

To put things in perspective, the dataset it was trained on is ~240 TB, and Stability has upwards of 4,000 Nvidia A100s (each of which is much faster than a 1080 Ti). Without those ingredients, you're highly unlikely to get a model that's worth using (it'll produce mostly useless outputs).

That argument also makes little sense when you consider that the model itself is only a couple of gigabytes. It can't memorize 240 TB of data, so it "learned".
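Back-of-the-envelope, here is what that size gap looks like. This is only a rough sketch: the 240 TB figure is the one quoted above, and I'm assuming "a couple gigabytes" means about 4 GB for the model file, which is a guess and not an exact number.

```python
# Rough size comparison between the training data and the model weights.
# Both figures are approximations; MODEL_BYTES in particular is an assumption.
DATASET_BYTES = 240 * 10**12  # ~240 TB of training data (figure from the thread)
MODEL_BYTES = 4 * 10**9       # assumed ~4 GB for "a couple gigabytes"


def data_to_model_ratio(dataset_bytes: int, model_bytes: int) -> float:
    """Return how many bytes of training data exist per byte of model weights."""
    return dataset_bytes / model_bytes


ratio = data_to_model_ratio(DATASET_BYTES, MODEL_BYTES)
print(f"~{ratio:,.0f} bytes of training data per byte of model")
```

With those assumed numbers the data is about 60,000 times larger than the model, so storing the whole dataset verbatim isn't possible even with very aggressive compression.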

But if you want to create custom versions of SD, you can always try out DreamBooth: https://github.com/XavierXiao/Dreambooth-Stable-Diffusion. That one is actually feasible without spending millions of dollars on GPUs.