# paraphrase-MiniLM-L6-v2
| Property | Value |
|---|---|
| Parameter Count | 22.7M |
| License | Apache 2.0 |
| Paper | Sentence-BERT Paper |
| Downloads | 5.9M+ |
## What is paraphrase-MiniLM-L6-v2?
paraphrase-MiniLM-L6-v2 is a compact yet powerful sentence embedding model developed by sentence-transformers. It efficiently maps sentences and paragraphs into 384-dimensional dense vector representations, making it ideal for tasks like semantic search, clustering, and similarity comparison.
## Implementation Details
The model is built on a transformer architecture with a two-component structure: a transformer encoder followed by a pooling layer. It supports multiple frameworks including PyTorch, TensorFlow, and ONNX, making it highly versatile for different deployment scenarios.
- 384-dimensional output embeddings
- Maximum sequence length of 128 tokens
- Efficient mean pooling strategy
- Compatible with both the sentence-transformers library and Hugging Face Transformers
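The mean-pooling step above averages the transformer's token embeddings while ignoring padding positions. Its logic can be sketched in plain NumPy (illustrative only; in the real model this is a pooling layer applied to the encoder's output):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence axis, skipping padded tokens."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # masked sum
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid divide-by-zero
    return summed / counts

# Toy input: batch of 1, sequence length 3 (last token is padding), hidden size 2
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(tokens, mask))  # [[2. 3.]] -- the padded token is excluded
```

Because the padding token is masked out, the pooled vector depends only on real content, which keeps embeddings comparable across sentences of different lengths.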
## Core Capabilities
- Sentence and paragraph embedding generation
- Semantic similarity computation
- Text clustering
- Cross-lingual capabilities
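Semantic similarity between two embeddings is typically scored with cosine similarity. A self-contained NumPy sketch (the 3-dimensional vectors here are toy stand-ins for real 384-dimensional embeddings):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 0.0])
print(cosine_similarity(a, b))  # 1.0 (identical vectors)
print(cosine_similarity(a, c))  # 0.0 (orthogonal vectors)
```

The same function applied to two sentence embeddings yields a score near 1.0 for paraphrases and lower scores for unrelated text.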
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its excellent balance between size and performance, using only 22.7M parameters while delivering high-quality embeddings. Its widespread adoption (5.9M+ downloads) demonstrates its reliability in production environments.
Q: What are the recommended use cases?
The model excels at semantic search, document similarity comparison, text clustering, and building text retrieval systems. It is particularly well suited to applications that need efficient text representations under a modest computational budget.
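A retrieval system built on these embeddings ranks a corpus by cosine similarity to the query embedding. A minimal NumPy sketch of that ranking step (toy 3-dimensional vectors stand in for real model output):

```python
import numpy as np

def top_k(query_emb: np.ndarray, corpus_embs: np.ndarray, k: int = 2) -> np.ndarray:
    """Return indices of the k corpus rows most similar to the query (cosine)."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q                   # cosine similarity per corpus row
    return np.argsort(-scores)[:k]   # highest-scoring indices first

corpus = np.array([[1.0, 0.0, 0.0],   # very similar to the query
                   [0.0, 1.0, 0.0],   # unrelated
                   [0.9, 0.1, 0.0]])  # somewhat similar
query = np.array([1.0, 0.0, 0.0])
print(top_k(query, corpus))  # [0 2]
```

In practice the corpus embeddings are computed once with `model.encode` and cached, so each query costs only one forward pass plus a matrix-vector product.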