You’ll need more time and memory than with Invoke or an Nvidia graphics card, but it’s not that bad: 1-2 s/it for an image at the standard 512x768 px quality, 14-20 s/it at the high 1024x1536 px quality (Hires Fix).
And there's a large ecosystem of downloadable models available online for specific looks and concepts, like models trained for photorealism.
AUTOMATIC1111
https://github.com/AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion
https://github.com/Stability-AI/StableDiffusion
I haven't tried those 1-click installers, though; I've been running it manually instead.
That project is based on the concept of this Stable Diffusion project: https://github.com/AUTOMATIC1111/stable-diffusion-webui
Which is a few months ahead (because the Stable Diffusion tech arrived a few months earlier) and is definitely at a point where anyone can easily run it, locally or in a hosted environment.
I expect this "text-generation-webui" (or something like it) will be just as easy to use in the near, near future.
Maybe I'm a naive idealist, but IMO the GPL family of licenses is underrated. You can use them to make sure you don't work for free for someone who won't share their improvements.
I liked the choice of the AGPL for the AUTOMATIC1111 Stable Diffusion web UI. (https://github.com/AUTOMATIC1111/stable-diffusion-webui)
Commercial interests are very allergic to the AGPL, which helps ensure the project stays community-run and that new features and fixes prioritize the ordinary user doing things for fun.
I just recently tried out one of the most popular [0] Stable Diffusion WebUIs locally, and I'm positively surprised at how different it is from the rest of the space around ML research/computing. I consider myself a competent software engineer, but I still often find it pretty tricky to get e.g. HuggingFace models running and doing what I envision them to do. SpeechT5, for instance, is reported to do voice transformations, but it took me a good bit of time and hair-pulling to figure out how to extract voice embeddings from .wav files. I'm sure the way to do this is obvious to most researchers, maybe to the point that it doesn't feel worth a mention in the documentation, but it certainly wasn't clear to me.
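For anyone stuck on the same thing: the approach I eventually landed on (and the one the HuggingFace SpeechT5 examples seem to use) is to compute speaker x-vector embeddings with SpeechBrain's pretrained model and feed those to SpeechT5. A minimal sketch, assuming you have a 16 kHz mono "speaker.wav" on disk (the path is just a placeholder):

```python
# Sketch: extract a speaker embedding from a .wav file for use with SpeechT5.
# Uses SpeechBrain's pretrained x-vector model; "speaker.wav" is a placeholder.
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-xvect-voxceleb"
)

# Load the audio and resample to 16 kHz if needed.
waveform, sample_rate = torchaudio.load("speaker.wav")
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

with torch.no_grad():
    embedding = classifier.encode_batch(waveform)              # shape (1, 1, 512)
    embedding = torch.nn.functional.normalize(embedding, dim=2)
    speaker_embedding = embedding.squeeze(0)                   # shape (1, 512)

# speaker_embedding can then be passed as the speaker_embeddings argument
# when generating speech with SpeechT5.
print(speaker_embedding.shape)
```

That's the kind of three-line answer I was hoping to find in the docs.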
The community around Stable Diffusion is much more inclusive, though. Tools go the extra mile to be easy to use, and documentation for community-created models/scripts/tools is accessible enough to be perfectly usable by a non-technical user willing to venture a little bit into the world of hardcore computing by following instructions. Sure, nothing is too polished and you often get the feeling that it's "an ugly thing, but an ugly thing that works", but the point is that it's incredibly accessible. People get to actually use these models to build their stories and fantasy worlds, to work, and things get progressively more impressive as the community builds upon itself (I loved the style of [1] and even effortlessly merged its style with another one in the WebUI, and ControlNet [2] is amazing and gives me ideas for integrating my photography with AI).
I think the general interest in creating images is larger than the interest in LLMs, given their current limitations (especially on currently available consumer hardware). I do wonder how much this community interest will boost these spaces in the longer run, but right now I can't help but be impressed by the difference in usability and collaborative development between image-generation models and other types of models.
[0] https://github.com/AUTOMATIC1111/stable-diffusion-webui
https://github.com/AUTOMATIC1111/stable-diffusion-webui
And then this https://www.reddit.com/r/StableDiffusion/comments/1167j0a/a_...
Step by step video tutorial https://youtu.be/vhqqmkTBMlU
(you still need an Nvidia GPU)
Extract the zip file and run the batch file. Find the .ckpt (checkpoint) file for a model you want; you can find Openjourney here: https://huggingface.co/openjourney/openjourney/tree/main. Add it to the model directory.
Then you just need to go to a web browser and you can use the AUTOMATIC1111 webui. More information here: https://github.com/AUTOMATIC1111/stable-diffusion-webui
This web UI is nice: https://github.com/AUTOMATIC1111/stable-diffusion-webui/
https://github.com/AUTOMATIC1111/stable-diffusion-webui/ - At least I could access it now.
Tweaking and installing new models will take some additional effort, but there has been a veritable explosion in freely available resources, eg. https://rentry.co/sdupdates3
Also it runs fine on mid-range consumer GPUs, just with limited batch sizes.
https://github.com/AUTOMATIC1111/stable-diffusion-webui
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki...
Edit: Looks like AUTOMATIC1111 can merge three checkpoints. I still don't know how it works technically, but I guess that's how it's done?
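My understanding (not authoritative) is that the checkpoint merger just does tensor arithmetic on the state dicts: a weighted sum for two models, and an "add difference" mode for three, roughly merged = A + (B - C) * multiplier. A rough sketch of the three-checkpoint case, with placeholder file names:

```python
# Rough sketch of "add difference" checkpoint merging as I understand it:
#   merged = A + (B - C) * multiplier
# File names are placeholders; checkpoints keep their weights under "state_dict".
import torch

multiplier = 0.5  # how strongly to apply the (B - C) difference

a = torch.load("modelA.ckpt", map_location="cpu")["state_dict"]
b = torch.load("modelB.ckpt", map_location="cpu")["state_dict"]
c = torch.load("modelC.ckpt", map_location="cpu")["state_dict"]

merged = {}
for key, tensor in a.items():
    if key in b and key in c and tensor.dtype.is_floating_point:
        merged[key] = tensor + (b[key] - c[key]) * multiplier
    else:
        merged[key] = tensor  # keep A's value for keys the others lack

torch.save({"state_dict": merged}, "merged.ckpt")
```

Happy to be corrected if the UI does something fancier under the hood.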
https://stablediffusionweb.com
Not sure what to type? Copy and paste prompts from any of the following prompt databases:
Want to run it on your own computer? Install any of the following:
From there, if you're interested in how it works, I highly recommend the last 4 videos on Jeremy Howard's YouTube channel: https://www.youtube.com/user/howardjeremyp/videos
He's currently teaching a class on Stable Diffusion from the ground up, and these lectures give a really good introduction to how it all works.
You pass in both your prompt and a negative prompt at the same time. The negative prompt describes what you don't want in your image, and it'll try its best. There are some magic quality words like "jpeg artifacts" you can put in the negative prompt, and poof, the image is less messy now.
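In the webui that's just the second text box, but if you want to see the same idea outside the UI, the diffusers library exposes it as a negative_prompt argument. A minimal sketch; the model id and prompts below are only examples:

```python
# Minimal example of prompt + negative prompt with the diffusers library.
# Model id and prompts are illustrative; requires a CUDA-capable GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="portrait photo of an old fisherman, detailed, soft light",
    negative_prompt="jpeg artifacts, blurry, lowres, watermark, deformed",
    num_inference_steps=20,
).images[0]
image.save("fisherman.png")
```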
https://github.com/AUTOMATIC1111/stable-diffusion-webui/
It is extremely active - author updates it 10-20 times per day.
Lots of users on GitHub, maybe? Network effects, more feed recommendations, etc.
GitHub is becoming FB
The only command line argument I'm using is --lowvram, and I usually generate pictures at the default settings at a 512x512 image size.
You can see all the command line arguments and what they do here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki...
While technically the most popular, I wouldn't call it "by far". This one is a very close second (500 vs 580 forks): https://github.com/sd-webui/stable-diffusion-webui/tree/dev
There's also this fork of AUTOMATIC1111's fork, which also has a Colab notebook ready to run, and it's way, way faster than the KerasCV version: https://github.com/TheLastBen/fast-stable-diffusion
(It also has many, many more options and some nice, user-friendly GUIs. It's the best version for Google Colab!)
I didn't get a very wide range of compositions out of the idea, just some variations in finishing touches/intermediate steps. I achieved this with some human-in-the-loop iteration and inpainting, but it took no more than 15-30 minutes of toying around, and I'm no artist.
If you have a semi-decent graphics card and would like to experiment with more settings and tools than are readily available online, this is a good repo for that: https://github.com/AUTOMATIC1111/stable-diffusion-webui