BGE-Large-En-v1.5

Property	Value
Parameter Count	335M
License	MIT
Primary Task	Feature Extraction & Embedding
Language	English

What is bge-large-en-v1.5?

BGE-Large-En-v1.5 is an English text embedding model developed by BAAI as part of the BGE (BAAI General Embedding) series. Version 1.5 improves the similarity distribution of the embeddings relative to earlier releases and strengthens retrieval quality when no instruction prompt is used. On the MTEB benchmark it reports an average score of 64.23 across 56 datasets, a result presented as state-of-the-art at the time of release.

Implementation Details

The model uses a transformer-based architecture with 335M parameters. It accepts inputs of up to 512 tokens and produces 1024-dimensional embeddings. It can be loaded through HuggingFace Transformers, Sentence-Transformers, or FlagEmbedding.
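Because of the 512-token limit, longer documents are commonly split into windows before embedding. The following is a minimal sketch of one such approach, not something the model requires: the helper name `window_tokens` and the 64-token overlap are assumptions made for illustration, and the token IDs are expected to come from the model's own tokenizer. If the tokenizer adds special tokens to each window, a slightly smaller `max_len` may be needed.

```python
from typing import List, Sequence

MAX_TOKENS = 512  # sequence length stated for the model


def window_tokens(token_ids: Sequence[int],
                  max_len: int = MAX_TOKENS,
                  overlap: int = 64) -> List[List[int]]:
    """Split token IDs into overlapping windows of at most max_len tokens.

    Hypothetical helper: the overlap value is an illustrative choice,
    not a recommendation from the model authors.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if len(token_ids) == 0:
        return []

    step = max_len - overlap  # how far each window advances
    windows: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        windows.append(list(token_ids[start:start + max_len]))
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the input
    return windows
```

Each window can then be embedded separately, and the resulting vectors stored per window or pooled, depending on the application.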

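Scores 54.29 on the MTEB Retrieval category (a category average on a 0 to 100 scale, not an accuracy)
Supports both symmetric similarity (sentence to sentence) and asymmetric similarity (short query to longer passage)
Accepts an optional instruction prepended to queries for retrieval; v1.5 also performs well without it
Runs on both CPU and GPU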
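The optional query instruction and the asymmetric query-to-passage setup can be sketched as follows. This is an illustrative example using Sentence-Transformers, not the only way to run the model. It assumes the `sentence-transformers` package is installed, that the model is available under the Hub name `BAAI/bge-large-en-v1.5`, and that the instruction string below is the one published for the English BGE models. The helper names `prepare_queries` and `demo` are hypothetical.

```python
from typing import List

MODEL_NAME = "BAAI/bge-large-en-v1.5"  # assumed Hub identifier
# Assumed retrieval instruction for English BGE models.
# It is applied to queries only, never to passages.
QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


def prepare_queries(queries: List[str], use_instruction: bool = True) -> List[str]:
    """Optionally prepend the retrieval instruction to each query."""
    prefix = QUERY_INSTRUCTION if use_instruction else ""
    return [prefix + q for q in queries]


def demo() -> None:
    """Embed one query and two passages, then print similarity scores.

    Downloads the model on first use, so it is not called automatically.
    """
    from sentence_transformers import SentenceTransformer  # lazy import

    model = SentenceTransformer(MODEL_NAME)
    queries = ["What is a text embedding?"]
    passages = [
        "A text embedding maps a piece of text to a fixed-length vector.",
        "The weather was mild for most of the week.",
    ]

    # normalize_embeddings=True yields unit-length vectors, so the dot
    # product below equals cosine similarity.
    q_vecs = model.encode(prepare_queries(queries), normalize_embeddings=True)
    p_vecs = model.encode(passages, normalize_embeddings=True)

    scores = q_vecs @ p_vecs.T  # shape: (num_queries, num_passages)
    print(scores)
```

Calling `demo()` runs the full pipeline. For symmetric tasks such as sentence-to-sentence similarity, passing `use_instruction=False` leaves the inputs unchanged.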

Core Capabilities

Strong overall text embedding quality, with an MTEB average of 64.23
Retrieval: MTEB category average of 54.29 (a ranking score, not an accuracy)
Clustering and pair classification: MTEB category averages of 46.08 and 87.12
Supports efficient similarity search and document retrieval
Semantic textual similarity (STS): MTEB category average of 83.11
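Similarity search over embeddings usually reduces to ranking stored vectors by cosine similarity to a query vector. Below is a minimal pure-Python sketch of that ranking step. The function names are hypothetical, and the tiny 2-dimensional vectors in the usage note stand in for the model's 1024-dimensional embeddings. A production system would typically use a vector index instead of a linear scan.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          corpus: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k corpus vectors most similar to query."""
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:k]
```

For example, `top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2)` ranks the first corpus vector highest, followed by the third.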

Frequently Asked Questions

Q: What makes this model unique?

Version 1.5 addresses the similarity-distribution limitation of earlier releases and improves retrieval without requiring an explicit instruction. It ranks near the top of the MTEB benchmark, with a 64.23 average, while keeping inference efficient.

Q: What are the recommended use cases?

The model excels in document retrieval, semantic search, clustering, and similarity comparison tasks. It's particularly well-suited for building search systems, document classification, and semantic analysis applications.
