Infinity

Maintained By: FoundationVision

Property            Value
Author              FoundationVision
Paper               arXiv:2412.04431
Model Repository    Hugging Face

What is Infinity?

Infinity is an image-synthesis model built on Bitwise Visual AutoRegressive Modeling. It pairs an infinite-vocabulary tokenizer and classifier with a bitwise self-correction mechanism, and it sets a new state of the art in high-resolution image generation.

Implementation Details

The model predicts image tokens bit by bit: rather than selecting one entry from a fixed codebook, the classifier predicts each bit of a token independently, which in theory lets the tokenizer vocabulary scale to infinity while the transformer is scaled up alongside it. This design delivers gains in both quality and speed over existing solutions; a simplified sketch of the idea follows the list below.

  • Infinite-vocabulary tokenizer & classifier system
  • Bitwise self-correction mechanism
  • Scaled transformer architecture
  • 0.8-second generation time for 1024×1024 images
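
To make the bitwise idea concrete, below is a minimal PyTorch sketch of a binary quantizer and a per-bit classifier head. Every name here is a hypothetical illustration of the concept, not FoundationVision's implementation; the real model uses a more elaborate multi-scale residual quantizer.

```python
import torch

def bitwise_quantize(z: torch.Tensor) -> torch.Tensor:
    """Quantize each channel of a latent vector to one bit (+1 or -1).

    With d channels this implicitly defines a vocabulary of 2**d codes
    without ever materializing a codebook, which is why the vocabulary
    can, in principle, scale toward "infinity".
    """
    return torch.where(z >= 0, torch.ones_like(z), -torch.ones_like(z))

class BitwiseHead(torch.nn.Module):
    """Predict d independent bit logits instead of one of 2**d classes.

    A softmax over 2**d entries is infeasible for large d; a per-bit
    head keeps the classifier at O(d) output parameters.
    """
    def __init__(self, hidden_dim: int, num_bits: int):
        super().__init__()
        self.proj = torch.nn.Linear(hidden_dim, num_bits)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Sign of each logit is the predicted bit; training would use a
        # per-bit binary cross-entropy loss against the quantized target.
        return self.proj(h)
```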

Core Capabilities

  • Outperforms SD3-Medium and SDXL in quality metrics
  • Achieves 0.73 on GenEval benchmark (vs 0.62 for SD3-Medium)
  • Scores 0.96 on ImageReward benchmark (vs 0.87 for SD3-Medium)
  • 66% win rate in comparative evaluations
  • 2.6× faster than SD3-Medium for high-resolution image generation

Frequently Asked Questions

Q: What makes this model unique?

Infinity's distinguishing features are its bitwise autoregressive formulation and infinite-vocabulary tokenizer, which let the vocabulary and transformer scale together while keeping generation extremely fast. It also marks a departure from traditional diffusion-based approaches; a simplified sketch of the self-correction mechanism follows.
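
As a rough illustration of bitwise self-correction (a sketch under the assumption that training randomly perturbs quantized bits so the model learns to recover from its own mistakes; the paper's exact procedure differs):

```python
import torch

def flip_random_bits(bits: torch.Tensor, flip_prob: float = 0.1) -> torch.Tensor:
    """Randomly flip a fraction of +1/-1 bits to mimic prediction errors.

    Feeding corrupted bits back in during training teaches the model to
    correct earlier mistakes instead of compounding them across scales.
    """
    flip_mask = torch.rand_like(bits) < flip_prob
    return torch.where(flip_mask, -bits, bits)
```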

Q: What are the recommended use cases?

The model excels at high-resolution image synthesis tasks, particularly where both quality and speed are crucial. It's especially suitable for applications requiring 1024×1024 image generation with photorealistic results.
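
For local experimentation, the checkpoints can be fetched from the Hugging Face Hub. The snippet below is a hedged sketch: the repo id is assumed from the model card above, and actual inference is driven by the scripts in FoundationVision's own repository.

```python
from huggingface_hub import snapshot_download

# Download the Infinity weights (repo id assumed from the model card above).
local_dir = snapshot_download(repo_id="FoundationVision/Infinity")
print(f"Checkpoints downloaded to {local_dir}")
```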
