BGE-Large-En-v1.5
| Property | Value |
|---|---|
| Parameter Count | 335M |
| License | MIT |
| Primary Task | Feature Extraction & Embedding |
| Language | English |
What is BGE-Large-En-v1.5?
BGE-Large-En-v1.5 is an English text embedding model developed by BAAI and is version 1.5 of the large model in the BGE (BAAI General Embedding) series. This version improves the distribution of similarity scores and strengthens retrieval, even when no instruction prompt is used. The model achieves state-of-the-art performance on the MTEB benchmark, with an average score of 64.23 across 56 datasets.
Implementation Details
The model uses a transformer-based architecture with 335M parameters, optimized for generating high-quality text embeddings. It accepts input sequences of up to 512 tokens and produces 1024-dimensional embeddings. It can be loaded through HuggingFace Transformers, Sentence-Transformers, or FlagEmbedding.
- Averages 54.29 on the MTEB Retrieval tasks
- Supports both symmetric and asymmetric similarity calculations
- Accepts an optional instruction prefix on queries for retrieval
- Optimized for both CPU and GPU inference
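As a minimal sketch of the integration path described above, the following uses Sentence-Transformers. It assumes the `sentence-transformers` package is installed and that the model is available on the Hugging Face Hub under the identifier `BAAI/bge-large-en-v1.5`. The instruction string is assumed to be the one BAAI publishes for English retrieval queries, and the helper names (`add_query_instruction`, `load_model`, `embed`, `demo`) are illustrative rather than part of any library.

```python
from functools import lru_cache
from typing import List

# Assumption: Hugging Face Hub identifier for this model.
MODEL_ID = "BAAI/bge-large-en-v1.5"
# Assumption: optional instruction for retrieval queries (not for passages).
QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


def add_query_instruction(queries: List[str]) -> List[str]:
    """Prefix each query with the optional retrieval instruction."""
    return [QUERY_INSTRUCTION + q for q in queries]


@lru_cache(maxsize=1)
def load_model():
    """Load the model once; imported lazily so the helpers stay lightweight."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(MODEL_ID)


def embed(texts: List[str]):
    """Return L2-normalized embeddings, one row per input text."""
    return load_model().encode(texts, normalize_embeddings=True)


def demo():
    """Score one query against two passages (downloads the model on first use)."""
    queries = add_query_instruction(["how do embedding models work?"])
    passages = [
        "Embedding models map text to dense vectors.",
        "The weather was sunny all week.",
    ]
    # With normalized vectors, the dot product equals cosine similarity.
    return embed(queries) @ embed(passages).T
```

The instruction is applied only to queries in asymmetric retrieval (short query, longer passage). For symmetric comparisons such as sentence-to-sentence similarity, both sides can be passed to `embed` without a prefix.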
Core Capabilities
- State-of-the-art performance in text embedding generation
- Strong retrieval results, with an MTEB Retrieval average of 54.29
- Strong MTEB results in clustering (46.08) and pair classification (87.12)
- Supports efficient similarity search and document retrieval
- Strong results on semantic textual similarity (STS) tasks, with an MTEB average of 83.11
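The similarity search mentioned in the list above usually comes down to ranking document embeddings by cosine similarity to a query embedding. The sketch below shows that ranking step in plain Python on toy 3-dimensional vectors that stand in for real 1024-dimensional embeddings. The function names `cosine_similarity` and `top_k` are illustrative, and a production system would more likely use a vector library or index.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], docs: Sequence[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query, d)) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings.
toy_docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
toy_query = [1.0, 0.1, 0.0]
ranking = top_k(toy_query, toy_docs, k=2)
print([index for index, _ in ranking])  # → [0, 2]
```

If embeddings are already L2-normalized when they are produced, the division by the norms is redundant and a plain dot product gives the same ordering.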
Frequently Asked Questions
Q: What makes this model unique?
Version 1.5 addresses limitations of earlier releases in similarity distribution and improves retrieval without requiring explicit instructions. It achieves top performance on the MTEB benchmark while maintaining efficient inference times.
Q: What are the recommended use cases?
The model excels in document retrieval, semantic search, clustering, and similarity comparison tasks. It's particularly well-suited for building search systems, document classification, and semantic analysis applications.
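When building a search system over long documents, the 512-token input limit noted under Implementation Details means that long text generally has to be split before embedding. The following is a rough sketch of overlapping chunking, not a recommended configuration. It counts whitespace-separated words, which only approximates the tokenizer's token count, and the function name `chunk_words` and its default sizes are hypothetical choices for illustration.

```python
from typing import List


def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into overlapping windows of whitespace-separated words.

    max_words and overlap are illustrative defaults, not model settings;
    word count is only a rough proxy for the model's token count.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

Each chunk can then be embedded and indexed separately, with search results mapped back to the source document. Counting tokens with the model's own tokenizer would give tighter control over chunk size than counting words.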