What does HackerNews think of diffusers?
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

# Load the fp16 weights; use_auth_token requires a logged-in Hugging Face account
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()  # trades a little speed for much lower VRAM use

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt).images[0]
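The pipeline returns standard PIL images, so persisting the result is a one-liner (the filename here is illustrative):

    image.save("astronaut_rides_horse.png")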
[1] https://github.com/huggingface/diffusers
It's true you have to use the code from git rather than a release, but that's not hard.
https://github.com/nlothian/m1_huggingface_diffusers_demo is my clean demo repo with a notebook showing the usage. The standard HuggingFace examples (e.g. for img2img[2]) port across with no trouble too; a sketch of that usage follows the references below.
[1] https://github.com/huggingface/diffusers
[2] https://github.com/huggingface/diffusers#image-to-image-text...
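As a rough illustration, here is a minimal img2img sketch along the lines of the README example from that era; the class name and argument names (init_image, strength) are from the API as it was then and may have since changed, and the input/output filenames are illustrative:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        use_auth_token=True,
    ).to("cuda")

    # Start from an existing image; dimensions that are multiples of 64 work best
    init_image = Image.open("sketch.png").convert("RGB").resize((768, 512))

    # strength controls how far the model may drift from the input image
    image = pipe(
        prompt="a fantasy landscape, trending on artstation",
        init_image=init_image,
        strength=0.75,
        guidance_scale=7.5,
    ).images[0]
    image.save("fantasy_landscape.png")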
It's well engineered, maintainable, and has a decent installation process.
The branch in this PR[2] adds M1 Mac support with a one-line patch (essentially the device change sketched below), and it runs faster than the CompVis version (1.5 iterations/sec vs 1.4 for CompVis on a 32 GB M1 Max).
I highly recommend switching to that version for the improved flexibility.
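Roughly, that one-line change amounts to targeting PyTorch's MPS backend instead of CUDA. A minimal sketch, assuming a PyTorch build with MPS support (fp16 is left off since MPS didn't support it at the time):

    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        use_auth_token=True,
    )
    pipe = pipe.to("mps")  # the Apple Silicon GPU, instead of "cuda"
    pipe.enable_attention_slicing()  # keeps memory use manageable on unified-memory Macs

    image = pipe("a photo of an astronaut riding a horse on mars").images[0]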
Here's the code:
from diffusers import DiffusionPipeline

# The 256x256 latent diffusion text-to-image model
ldm = DiffusionPipeline.from_pretrained(
    "CompVis/ldm-text2im-large-256"
)

output = ldm(
    ["a painting of a raccoon reading a book"],
    num_inference_steps=50,
    eta=0.3,           # DDIM eta: how much noise is re-injected at each step
    guidance_scale=6   # how strongly generation follows the prompt
)
output["sample"][0].save("image.png")
This took 5m25s to run on my laptop (no GPU configured) and produced a recognisable image of a raccoon reading a book. I tweeted the resulting image here: https://twitter.com/simonw/status/1550143524179288064