What does HackerNews think of stable-diffusion-webui?

Stable Diffusion web UI

Language: Python

I have the same experience with Invoke.ai and MochiDiffusion on the MBP M1. I can only match the quality of other images with Automatic1111 (https://github.com/AUTOMATIC1111/stable-diffusion-webui).

You’ll need more time and memory compared to Invoke or to an Nvidia graphics card, but it’s not that bad: 1-2 s/it for an image at standard 512x768px quality, 14-20 s/it for an image at high 1024x1536px quality (Hires Fix).

Huh. You inspired me to finally get around to installing a local Midjourney-like[0], and, yeah, what do you know - the first result I got from those prompts _without_ the `-amputee` negative prompt was this horrifying (though SFW, despite what Imgur claims) monstrosity[1]

[0] https://github.com/AUTOMATIC1111/stable-diffusion-webui

[1] https://imgur.com/a/bMp5aAB

If you've got an Nvidia graphics card with 8 GB of VRAM made in the last decade, AUTOMATIC1111's Stable Diffusion web UI [1] will crank out a few thousand images every 24 hours, depending on settings and how fast your card is.

And there's a large ecosystem of downloadable models available online for specific looks and concepts, like models trained for photorealism.

[1] https://github.com/AUTOMATIC1111/stable-diffusion-webui

I was mostly referring to this project, which has some 1-click installers: https://github.com/oobabooga/text-generation-webui#alternati...

I have not tried those 1-click installers myself, though; instead I have been running it manually.

That project is based on the concept of this Stable Diffusion project: https://github.com/AUTOMATIC1111/stable-diffusion-webui

Which is a few months ahead (because the Stable Diffusion tech happened a few months earlier) and is definitely at a point where anyone can easily run it, locally or in a hosted environment.

I expect this "text-generation-webui" (or something like it) will be just as easy to use in the near, near future.

I think implying that the GPL is not "fully open source" is a hot take. It's specifically designed to ensure that you and anyone you distribute your code to get the same freedoms. Maybe you don't agree that it's a good license, but that is its intention. GPL vs. BSD-type licenses is, I guess, a decades-long argument by now.

Maybe I'm a naive idealist, but IMO the GPL family of licenses is underrated. You can use them to make sure you don't work for free for someone who won't share their improvements.

I liked the choice of AGPL for AUTOMATIC1111 Stable Diffusion web UI. (https://github.com/AUTOMATIC1111/stable-diffusion-webui)

Commercial interests are very allergic to the AGPL, which helps ensure the project stays community-run and that new features and fixes will prioritize the most ordinary user doing things for fun.

One thing I think will be different and that had totally escaped my radar until recently is just the enormous and diverse community that has been developing around Stable Diffusion, which I think will be less likely to form with language models.

I just recently tried out one of the most popular [0] Stable Diffusion web UIs locally, and I'm positively surprised at how different it is from the rest of the space around ML research/computing. I consider myself a competent software engineer, but I still often find it pretty tricky to get e.g. HuggingFace models running and doing what I envision them to do. SpeechT5, for instance, is reported to do voice transformations, but it took me a good bit of time and hair-pulling to figure out how to extract voice embeddings from .wav files. I'm sure the way to do this is obvious to most researchers, maybe to the point of feeling like it doesn't need a mention in the documentation, but it certainly wasn't clear to me.
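For reference, that embedding step looks roughly like the sketch below when using the SpeechBrain x-vector speaker encoder that SpeechT5 examples are commonly paired with. This is a minimal sketch under those assumptions, not the only way to do it; "speaker.wav" is a placeholder path.

```python
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# x-vector speaker encoder commonly paired with SpeechT5 (512-dim embeddings)
classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb")

# "speaker.wav" is a placeholder; SpeechT5 expects 16 kHz mono audio
signal, sample_rate = torchaudio.load("speaker.wav")
if sample_rate != 16000:
    signal = torchaudio.functional.resample(signal, sample_rate, 16000)

with torch.no_grad():
    embedding = classifier.encode_batch(signal)                  # shape (1, 1, 512)
    embedding = torch.nn.functional.normalize(embedding, dim=2)  # L2-normalize

speaker_embedding = embedding.squeeze(0)  # (1, 512), ready to pass as speaker_embeddings
```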

The community around Stable Diffusion is much more inclusive, though. Tools make the extra effort to be easy to use, and documentation for community-created models/scripts/tools is accessible enough to be perfectly usable by a non-technical user who is willing to venture a little bit into the world of hardcore computing by following instructions. Sure, nothing is too polished and you often get the feeling that it's "an ugly thing, but an ugly thing that works", but the point is that it's incredibly accessible. People get to actually use these models to build their stories and fantasy worlds, to work, and things get progressively more impressive as the community builds upon itself (I loved the style of [1] and even effortlessly merged its style with another one in the WebUI, and ControlNet [2] is amazing and gives me ideas for integrating my photography with AI).

I think the general interest in creating images is larger than for LLMs with their current limitations (especially on currently available consumer hardware). I do wonder how much this community interest will boost these spaces in the longer run, but right now I can't help but be impressed by the difference in usability and collaborative development between image-generation models and other types of models.

[0] https://github.com/AUTOMATIC1111/stable-diffusion-webui

[1] https://civitai.com/models/4998/vivid-watercolors

[2] https://github.com/Mikubill/sd-webui-controlnet

I feel like you can compensate with more complicated prompts. Or even different prompt categories (like negative prompts, but for programming it might be a list of constraints). Like this interface: https://github.com/AUTOMATIC1111/stable-diffusion-webui, but for code.
It's gotten much easier in the last 24 hours because of this binary release of a popular Stable Diffusion setup + UI: https://github.com/AUTOMATIC1111/stable-diffusion-webui/rele...

(you still need an Nvidia GPU)

Extract the zip file and run the batch file. Find the ckpt (checkpoint) file for a model you want. You can find Openjourney here: https://huggingface.co/openjourney/openjourney/tree/main. Add it to the models directory.

Then you just need to go to a web browser and you can use the AUTOMATIC1111 webui. More information here: https://github.com/AUTOMATIC1111/stable-diffusion-webui

If you don't care about which exact tool in particular, https://github.com/AUTOMATIC1111/stable-diffusion-webui is the easiest to install, I think, and it gives you lots of toys to play with (txt2img, img2img, teaching it your likeness, etc.).
You can try any img2img Stable Diffusion tool.

This web UI is nice: https://github.com/AUTOMATIC1111/stable-diffusion-webui/

There are fairly stable web UIs now, and you can be up and running in about five minutes: https://github.com/AUTOMATIC1111/stable-diffusion-webui

Tweaking and installing new models will take some additional effort, but there has been a veritable explosion in freely available resources, e.g. https://rentry.co/sdupdates3

Also it runs fine on mid-range consumer GPUs, just with limited batch sizes.

I've seen references to merging models together to be able to generate new kinds of imagery or styles; how does that work? I think you use Dreambooth to make specialized models, and I think I got an idea of how that basically assigns a name to a vector in the latent space representing the thing you want to generate new imagery of, but can you generate multiple models and blend them together?

Edit: Looks like AUTOMATIC1111 can merge three checkpoints. I still don't know how it works technically, but I guess that's how it's done?

https://github.com/AUTOMATIC1111/stable-diffusion-webui
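For the curious: the simplest mode ("weighted sum") is just a per-tensor linear interpolation between two checkpoints' weights, and the three-checkpoint mode ("add difference") instead adds the delta between a fine-tuned model and its base onto another model, roughly A + (B - C) * alpha. Below is a rough sketch of the weighted-sum idea only, not the webui's actual code; the file names and the 0.3 weight are made up.

```python
import torch

alpha = 0.3  # how far to move from model A toward model B

# SD 1.x .ckpt files typically store the weights under a "state_dict" key
a = torch.load("modelA.ckpt", map_location="cpu")["state_dict"]
b = torch.load("modelB.ckpt", map_location="cpu")["state_dict"]

merged = {}
for key, tensor_a in a.items():
    if key in b and b[key].shape == tensor_a.shape:
        # linear interpolation of every matching weight tensor
        merged[key] = (1.0 - alpha) * tensor_a + alpha * b[key]
    else:
        merged[key] = tensor_a  # keep tensors unique to A unchanged

torch.save({"state_dict": merged}, "merged.ckpt")
```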

If you're reading this and you haven't played with Stable Diffusion yet, try any of the following quick-and-easy generators:

https://stablediffusionweb.com

https://getimg.ai/editor

https://healthydiffusion.com

https://holovolo.tv/landing

Not sure what to type? Copy and paste prompts from any of the following prompt databases:

https://lexica.art

https://www.krea.ai/

https://promptbase.com

Want to run it on your own computer? Install any of the following:

https://github.com/AUTOMATIC1111/stable-diffusion-webui

https://github.com/brycedrennan/imaginAIry

For the image generation (or even indexing with the CLIP interrogator) side of things, I recommend just installing the AUTOMATIC1111 GitHub repo (https://github.com/AUTOMATIC1111/stable-diffusion-webui); it's a web UI with pretty much every variant of Stable Diffusion usage you could want to try out, like txt2img, img2img, inpainting, outpainting, training (both textual inversion and DreamBooth), style customization, CLIP interrogation, etc. Most importantly, there are about 1000 YouTube tutorials on how to do each of these things with it, so you can pick your interest areas and just try them out without having to understand all the details first.

From there, if you're interested in how it works, I highly recommend the last 4 videos on Jeremy Howard's youtube channel: https://www.youtube.com/user/howardjeremyp/videos

He's currently teaching a class on Stable Diffusion from the ground up, and these lectures give a really good introduction to how it all works.

There's a cool technique called a "negative prompt" that I didn't see mentioned. Some of the community projects have implemented it. (I use https://github.com/AUTOMATIC1111/stable-diffusion-webui locally when I want to generate something.)

You pass in both your prompt and a negative prompt at the same time. The negative prompt describes what you don't want in your image, and it'll try its best. There are some magic quality words like "jpeg artifacts" you can put in the negative prompt, and poof, the image is less messy now.
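If you'd rather script it than click around a web UI, the same idea is exposed in the diffusers library as a negative_prompt argument. A minimal sketch, assuming the runwayml/stable-diffusion-v1-5 weights and a CUDA GPU; the prompts are just examples:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load SD 1.5 in half precision on the GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="portrait photo of an astronaut, sharp focus",
    negative_prompt="jpeg artifacts, blurry, lowres, extra fingers",  # what you do NOT want
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]

image.save("astronaut.png")
```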

If you have a reasonable video card, you can run it easily locally using this repo:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/

It is extremely active; the author updates it 10-20 times per day.

This repo, SD AUTOMATIC1111 (https://github.com/AUTOMATIC1111/stable-diffusion-webui/), is not official but community-run, has a few weeks of history, and has 14k stars, which I think are legit.

Lots of users on GitHub maybe? Network effects, more feed recommendations, etc.

Github is becoming FB

I'm using the one I linked in my original post: https://github.com/AUTOMATIC1111/stable-diffusion-webui

The only command line argument I'm using is --lowvram, and I usually generate pictures at the default settings at a 512x512 image size.

You can see all the command line arguments and what they do here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki...

> This is by far the most popular and active right now: https://github.com/AUTOMATIC1111/stable-diffusion-webui

While technically the most popular, I wouldn't call it "by far". This one is a very close second (500 vs 580 forks): https://github.com/sd-webui/stable-diffusion-webui/tree/dev

Nope. There are instructions for Windows, Linux and Apple Silicon in the readme: https://github.com/AUTOMATIC1111/stable-diffusion-webui

There's also this fork of AUTOMATIC1111's webui, which has a Colab notebook ready to run, and it's way, way faster than the KerasCV version: https://github.com/TheLastBen/fast-stable-diffusion

(It also has many, many more options and some nice, user-friendly GUIs. It's the best version for Google Colab!)

Here are my best attempts: https://imgur.com/a/obZH7X5

Not a very wide range of what I could do with the idea in terms of composition, just some variations in finishing touches/intermediate steps. I achieved this with some human-in-the-loop iteration and inpainting, but it was no more than 15-30 minutes of toying around with it, and I'm no artist.

If you have a semi-decent graphics card and would like to experiment with a bunch of extra settings and tools beyond what's readily available online, this is a good repo for that: https://github.com/AUTOMATIC1111/stable-diffusion-webui