# Stable Diffusion v1.5
| Property | Value |
|---|---|
| License | CreativeML OpenRAIL-M |
| Authors | Robin Rombach, Patrick Esser |
| Training Data | LAION-2B (en) and subsets; fine-tuned on laion-aesthetics v2 5+ |
| Paper | High-Resolution Image Synthesis with Latent Diffusion Models (CVPR 2022) |
## What is stable-diffusion-v1-5?
Stable Diffusion v1.5 is a latent diffusion model for text-to-image generation. It was initialized from the v1.2 checkpoint and fine-tuned for 595,000 steps at 512x512 resolution on the "laion-aesthetics v2 5+" dataset. During this fine-tuning, the text conditioning was dropped 10% of the time, which improves classifier-free guidance sampling.
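The conditioning dropout and classifier-free guidance described above can be sketched in a few lines. This is an illustrative NumPy sketch, not the model's actual code: `cfg_combine` shows the standard guidance formula used at sampling time, and `maybe_drop_text` shows the training-time trick of occasionally substituting a "null" (empty-prompt) embedding so one network learns both conditional and unconditional prediction. All function and variable names here are hypothetical.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale=7.5):
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the text-conditional one by the guidance scale."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def maybe_drop_text(text_emb, null_emb, drop_prob=0.1, rng=None):
    """Training-time conditioning dropout: with probability drop_prob,
    replace the caption embedding with the null (empty-prompt) embedding."""
    rng = rng or np.random.default_rng()
    return null_emb if rng.random() < drop_prob else text_emb
```

With `guidance_scale=1.0` the combination reduces to the plain conditional prediction; larger values (7.5 is a common default) push samples to follow the prompt more strongly at some cost in diversity.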
## Implementation Details
The model pairs a pretrained autoencoder with a diffusion model trained in the autoencoder's latent space. Text prompts are encoded by a frozen CLIP ViT-L/14 text encoder, and a UNet backbone performs the latent denoising. Training ran on 256 A100 GPUs (32 nodes × 8 GPUs) with a batch size of 2048 and the AdamW optimizer.
- Relative downsampling factor of 8 for latent representations
- Non-pooled text encoder output fed via cross-attention
- Learning rate warmed up to 0.0001 over the first 10,000 steps
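The downsampling factor of 8 is what makes latent diffusion cheap: a 512x512 RGB image is encoded into a 64x64 latent before any diffusion happens. A quick sketch of the size arithmetic (the 4-channel latent is the SD v1 autoencoder's configuration; the helper name is hypothetical):

```python
def latent_shape(height, width, downsample_factor=8, latent_channels=4):
    """Shape (channels, H, W) of the autoencoder latent for an image size.

    Image dimensions must be divisible by the downsampling factor.
    """
    assert height % downsample_factor == 0 and width % downsample_factor == 0
    return (latent_channels, height // downsample_factor, width // downsample_factor)

# A 512x512 image becomes a 4x64x64 latent: the UNet operates on
# 48x fewer values than the 3x512x512 pixel tensor.
```

This is also why generation sizes are typically multiples of 64: the UNet itself downsamples the latent further, so the spatial dimensions must stay divisible at every level.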
## Core Capabilities
- High-quality text-to-image generation at 512x512 resolution
- Advanced aesthetics handling through specialized dataset filtering
- Improved classifier-free guidance sampling
- Support for various frameworks including Diffusers, ComfyUI, and Automatic1111
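Of the frameworks listed above, Diffusers gives the shortest path from checkpoint to image. A minimal sketch (using the `runwayml/stable-diffusion-v1-5` Hub checkpoint; first use downloads several GB of weights under the CreativeML OpenRAIL-M license, and inference is only practical on a GPU):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the v1.5 checkpoint in half precision and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

image = pipe(
    "a photograph of an astronaut riding a horse",
    num_inference_steps=50,  # denoising steps
    guidance_scale=7.5,      # classifier-free guidance strength
).images[0]
image.save("astronaut.png")
```

Output defaults to the model's native 512x512 resolution; `height` and `width` can be passed to the pipeline call, in multiples of 64.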
## Frequently Asked Questions
Q: What makes this model unique?

A: This model improves on earlier v1 checkpoints through its extended fine-tuning (595k steps) on a dataset filtered for aesthetic quality. It also benefits from improved classifier-free guidance sampling, enabled by conditioning dropout during training, and is broadly compatible with popular frameworks.
Q: What are the recommended use cases?

A: The model is intended primarily for research: studies of safe deployment, artistic and design applications, educational tools, and work on generative models generally. The CreativeML OpenRAIL-M license explicitly prohibits using it to generate harmful content.