What does HackerNews think of diffusers?

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch

Language: Python

#14 in Deep learning
You can 100% run this on an 8GB card. Make sure to load the model weights in half precision (and optionally enable attention slicing). E.g., using the diffusers library [1]:

    import torch
    from diffusers import StableDiffusionPipeline
    from torch import autocast

    # Load the fp16 weights and keep them in half precision to fit in ~8GB of VRAM
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        revision="fp16",
        torch_dtype=torch.float16,
        use_auth_token=True
    )
    pipe = pipe.to("cuda")
    # Trade a little speed for a lower peak memory footprint
    pipe.enable_attention_slicing()
    prompt = "a photo of an astronaut riding a horse on mars"
    with autocast("cuda"):
        image = pipe(prompt).images[0]

[1] https://github.com/huggingface/diffusers
I don't understand why all these crazy forks don't switch to using the HuggingFace codebase[1]. It's much better code and easier to add features to.

It's true you have to use the code from git rather than a release, but that's not hard.
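
For example, installing straight from the repo is a one-liner (a minimal sketch; pinning a specific commit is optional):

    pip install git+https://github.com/huggingface/diffusers.git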

https://github.com/nlothian/m1_huggingface_diffusers_demo is my clean demo repo with a notebook showing the usage. The standard HuggingFace examples (e.g. for img2img[2]) port across with no trouble too.

[1] https://github.com/huggingface/diffusers

[2] https://github.com/huggingface/diffusers#image-to-image-text...
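
For reference, the img2img example in [2] looks roughly like this with the diffusers API (a sketch only, not the exact notebook code; the prompt and file names are illustrative, and the init_image argument has been renamed to image in newer releases):

    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    # Sketch: load the img2img variant of the Stable Diffusion pipeline
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        use_auth_token=True
    )
    pipe = pipe.to("mps")  # or "cuda" on an NVIDIA card

    # Start from an existing image and push it towards the prompt
    init_image = Image.open("input.png").convert("RGB").resize((512, 512))
    result = pipe(
        prompt="a fantasy landscape, trending on artstation",
        init_image=init_image,  # named `image` in newer diffusers releases
        strength=0.75,
        guidance_scale=7.5
    )
    result.images[0].save("img2img_output.png")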

Support for ONNX export was just added to diffusers, but there's no runtime scheduling logic yet.

https://github.com/huggingface/diffusers

I've been using the HuggingFace diffusers repo[1] fine with 6GB of VRAM.

It's well engineered, maintainable, and has a decent installation process.

The branch in this PR[2] adds M1 Mac support with a one-line patch, and it runs faster than the CompVis version (1.5 iterations/sec vs 1.4 for CompVis on a 32 GB M1 Max).

I highly recommend switching to that version for the improved flexibility.

[1] https://github.com/huggingface/diffusers

[2] https://github.com/huggingface/diffusers/pull/278
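
With that patch (a minimal sketch, assuming a PyTorch build with MPS support), running on an M1 Mac is just a matter of moving the pipeline to the mps device instead of cuda:

    from diffusers import StableDiffusionPipeline

    # Sketch only: assumes the MPS support from the PR above (or a release that includes it)
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        use_auth_token=True
    )
    pipe = pipe.to("mps")

    image = pipe("a photo of an astronaut riding a horse on mars").images[0]
    image.save("astronaut.png")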

I tried doing this on my own laptop a few weeks ago using Hugging Face diffusers - https://github.com/huggingface/diffusers

Here's the code:

    from diffusers import DiffusionPipeline
    ldm = DiffusionPipeline.from_pretrained(
        "CompVis/ldm-text2im-large-256"
    )
    output = ldm(
        ["a painting of a raccoon reading a book"],
        num_inference_steps=50,
        eta=0.3,
        guidance_scale=6
    )
    # Older diffusers API: the output dict holds the generated PIL images under "sample"
    output["sample"][0].save("image.png")

This took 5m25s to run on my laptop (no GPU configured) and produced a recognisable image of a raccoon reading a book. I tweeted the resulting image here: https://twitter.com/simonw/status/1550143524179288064