potion-base-32M
| Property | Value |
|---|---|
| Author | minishlab |
| Model Type | Static Embedding Model |
| Base Model | BAAI/bge-base-en-v1.5 |
| Repository | Hugging Face |
What is potion-base-32M?
potion-base-32M is a static embedding model created with Model2Vec. It is distilled from the BAAI/bge-base-en-v1.5 Sentence Transformer and is designed for scenarios that need fast computation on limited resources while maintaining strong performance.
Implementation Details
The model is built with a four-stage pipeline: distillation from the base Sentence Transformer with Model2Vec, creation of training data from the base model's mean output embeddings, training with Tokenlearn, and post-training re-regularization. The re-regularization stage weights tokens by their frequency, applies PCA, and finally applies SIF weighting.
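To make the SIF weighting step more concrete, here is a minimal sketch of frequency-based token weighting followed by mean pooling. It uses the generic SIF formula `a / (a + p(token))`; the constant `a = 1e-3`, the toy counts, and the two-dimensional vectors are assumptions for illustration, and the PCA step is left out. It is not Model2Vec's actual implementation.

```python
from collections import Counter


def sif_weights(token_counts, a=1e-3):
    """Return a SIF weight a / (a + p(token)) for every token."""
    total = sum(token_counts.values())
    return {tok: a / (a + count / total) for tok, count in token_counts.items()}


def embed(tokens, table, weights):
    """Mean-pool the SIF-weighted static vectors of the known tokens."""
    known = [t for t in tokens if t in table]
    if not known:
        raise ValueError("no known tokens in input")
    dim = len(table[known[0]])
    pooled = [0.0] * dim
    for tok in known:
        for i, value in enumerate(table[tok]):
            pooled[i] += weights[tok] * value
    return [v / len(known) for v in pooled]


# Toy, made-up corpus counts and a hypothetical 2-dimensional embedding table.
counts = Counter({"the": 900, "cat": 60, "sat": 40})
table = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [1.0, 1.0]}

weights = sif_weights(counts)
sentence_vector = embed(["the", "cat", "sat"], table, weights)
```

Frequent tokens such as "the" receive a much smaller weight than rare ones, so they contribute less to the pooled sentence vector. In a static model the weights can be folded into the embedding table once, offline, which leaves only a lookup and a mean at inference time.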
- Static embeddings for ultra-fast computation
- Roughly 32M parameters, with a larger vocabulary than the 8M variant for broader language coverage
- Optimized for both CPU and GPU deployment
- Easy integration via model2vec library
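As a sketch of the model2vec integration mentioned above, the helper below loads the model and encodes a list of sentences. The repository id `minishlab/potion-base-32M` is an assumption derived from the author and model name listed on this page, and `embed_sentences` is a hypothetical wrapper name. It requires `pip install model2vec` and network access on first use.

```python
MODEL_NAME = "minishlab/potion-base-32M"  # assumed Hugging Face repository id


def embed_sentences(sentences, model_name=MODEL_NAME):
    """Load the static model and return one embedding per sentence."""
    # Imported lazily so this module still loads where model2vec is not installed.
    from model2vec import StaticModel

    model = StaticModel.from_pretrained(model_name)  # downloads weights on first use
    return model.encode(sentences)
```

Calling `embed_sentences(["A first sentence.", "A second one."])` should return an array with one row per input sentence. In a real application, load the model once and reuse it rather than reloading it on every call.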
Core Capabilities
- Classification Performance: 65.97%
- Semantic Textual Similarity (STS): 74.22%
- Pair Classification: 78.17%
- Average MTEB Performance: 51.66%
Frequently Asked Questions
Q: What makes this model unique?
The model pairs static embeddings with strong benchmark results for a model of this type, which makes it particularly useful for real-time applications where computational efficiency is crucial. It has a larger vocabulary than its 8M counterpart while keeping processing fast.
Q: What are the recommended use cases?
This model is ideal for applications requiring real-time text embedding generation, especially in resource-constrained environments. It excels in tasks like semantic similarity comparison, classification, and text retrieval where speed is critical.
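For the similarity and retrieval use cases, embeddings are typically compared with cosine similarity. The sketch below ranks documents against a query using plain Python lists; the vectors are hand-made stand-ins rather than real model output, and `rank_by_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Stand-in vectors; in practice these would come from the embedding model.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
ranking = rank_by_similarity(query, docs)
```

Because static embeddings are cheap to compute, the comparison step is often the larger cost in a retrieval system, so larger collections usually replace this linear scan with a vector index.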