stable-diffusion-v1-5

Maintained by: stable-diffusion-v1-5

Stable Diffusion v1.5

Property        Value
License         CreativeML OpenRAIL-M
Authors         Robin Rombach, Patrick Esser
Training Data   LAION-2B Dataset
Paper           High-Resolution Image Synthesis With Latent Diffusion Models (CVPR 2022)

What is stable-diffusion-v1-5?

Stable Diffusion v1.5 is a latent diffusion model for text-to-image generation. It was initialized from the v1.2 checkpoint and fine-tuned for 595,000 steps at 512x512 resolution on the "laion-aesthetics v2 5+" dataset. During fine-tuning, the text conditioning was dropped 10% of the time to improve classifier-free guidance sampling.
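
The text-conditioning dropout and the guidance step it supports can be sketched in a few lines of Python. This is a minimal illustration, not the model's actual training or sampling code: the helper names `maybe_drop_caption` and `guided_noise` are hypothetical, replacing a dropped caption with an empty string is an assumption, and plain lists of floats stand in for the noise-prediction tensors.

```python
import random


def maybe_drop_caption(caption, rng, drop_prob=0.10):
    """Training side: with probability drop_prob, replace the caption with an
    empty prompt so the model also learns an unconditional prediction.
    (Using "" as the dropped caption is an assumption of this sketch.)"""
    return "" if rng.random() < drop_prob else caption


def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Sampling side: classifier-free guidance combines the unconditional and
    text-conditioned noise predictions, element by element:
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


rng = random.Random(0)
captions = [maybe_drop_caption("a photo of a cat", rng) for _ in range(10_000)]
print(sum(c == "" for c in captions) / len(captions))  # roughly 0.10

# A scale of 1.0 reproduces the conditional prediction; larger values push
# the result further in the direction of the text condition.
print(guided_noise([0.0, 1.0], [1.0, 3.0], 7.5))  # [7.5, 16.0]
```

A guidance scale of 0 returns the unconditional prediction unchanged, which is why the model needs to have seen captionless examples during training.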

Implementation Details

The model combines an autoencoder with a diffusion model trained in the autoencoder's latent space. Text prompts are encoded with a CLIP ViT-L/14 text encoder, and a UNet backbone performs the latent diffusion process. Training ran on 32 x 8 A100 GPUs (256 in total) with a batch size of 2048 and the AdamW optimizer.

  • Relative downsampling factor of 8 for latent representations
  • Non-pooled text encoder output fed to the UNet via cross-attention
  • Learning rate warmed up to 0.0001 over 10,000 steps, then held constant
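
To make the numbers in this list concrete, the following sketch computes the latent shape implied by a downsampling factor of 8 and a learning-rate schedule that reaches 0.0001 after 10,000 steps. Both function names are hypothetical. The 4 latent channels come from the original Stable Diffusion model card rather than from this page, and the linear shape of the warmup is an assumption, since the text only says "warmup".

```python
def latent_shape(height, width, factor=8, latent_channels=4):
    """Return (channels, height, width) of the latent for an RGB image.
    latent_channels=4 is taken from the original model card, not this page."""
    if height % factor or width % factor:
        raise ValueError("height and width must be multiples of the factor")
    return (latent_channels, height // factor, width // factor)


def learning_rate(step, peak_lr=1e-4, warmup_steps=10_000):
    """Warmup to peak_lr over warmup_steps, then hold the rate constant.
    The linear ramp is an assumption of this sketch."""
    if step >= warmup_steps:
        return peak_lr
    return peak_lr * step / warmup_steps


print(latent_shape(512, 512))  # (4, 64, 64)
for step in (0, 5_000, 10_000, 595_000):
    print(step, learning_rate(step))
```

At 512x512, the UNet therefore works on a 64x64 grid, which has 64 times fewer spatial positions than the pixel image.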

Core Capabilities

  • High-quality text-to-image generation at 512x512 resolution
  • Aesthetics-oriented output from fine-tuning on the aesthetics-filtered "laion-aesthetics v2 5+" dataset
  • Improved classifier-free guidance sampling from 10% text-conditioning dropout
  • Support for various frameworks including Diffusers, ComfyUI, and Automatic1111
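
As an illustration of the Diffusers route, the sketch below loads the checkpoint with `StableDiffusionPipeline` and generates one 512x512 image. Several details are assumptions rather than facts from this page: the Hugging Face repository id `stable-diffusion-v1-5/stable-diffusion-v1-5`, the availability of a CUDA GPU, and the default guidance scale and step count. The helper names `build_generation_kwargs` and `generate` are hypothetical.

```python
def build_generation_kwargs(prompt, guidance_scale=7.5, num_inference_steps=50):
    """Collect call arguments for the pipeline at the model's native resolution.
    The guidance scale and step count are illustrative defaults."""
    return {
        "prompt": prompt,
        "height": 512,
        "width": 512,
        "guidance_scale": guidance_scale,
        "num_inference_steps": num_inference_steps,
    }


def generate(prompt,
             model_id="stable-diffusion-v1-5/stable-diffusion-v1-5",  # assumed repo id
             device="cuda"):
    """Download the weights (on first use) and return one PIL image."""
    # Imported lazily so the helper above works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to(device)
    result = pipe(**build_generation_kwargs(prompt))
    return result.images[0]
```

With `torch` and `diffusers` installed, a call such as `generate("an astronaut riding a horse").save("astronaut.png")` would write the result to disk. On a machine without a GPU, passing `device="cpu"` would also require loading the weights in `torch.float32` instead of `torch.float16`.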

Frequently Asked Questions

Q: What makes this model unique?

Compared with v1.2, on which it builds, this checkpoint adds 595k fine-tuning steps on a dataset filtered for aesthetic quality, along with 10% text-conditioning dropout to improve classifier-free guidance sampling. It is also supported by several frameworks, including Diffusers, ComfyUI, and Automatic1111.

Q: What are the recommended use cases?

The model is intended primarily for research: studies of safe deployment, artistic applications, educational tools, and research on generative models. Generating harmful content is explicitly out of scope, and use is subject to the restrictions of the CreativeML OpenRAIL-M license.
