bge-base-en-v1.5

Maintained By
BAAI

BGE Base English v1.5

Parameter Count: 109M
Model Type: Feature Extraction / Embedding
License: MIT
Primary Language: English

What is bge-base-en-v1.5?

BGE Base English v1.5 is a powerful embedding model developed by BAAI that achieves impressive performance on the MTEB benchmark. It's designed specifically for text embeddings and semantic search, featuring improvements in similarity distribution and enhanced retrieval capabilities compared to previous versions. With 109M parameters, it offers an excellent balance between model size and performance.

Implementation Details

The model uses a BERT-based architecture optimized for generating text embeddings. It supports multiple deployment options including FlagEmbedding, Sentence-Transformers, Langchain, and Hugging Face Transformers. A key feature is its ability to handle both short queries and long passages effectively, with optional query instructions for enhanced retrieval performance.

  • Normalized embeddings for cosine similarity computation
  • Support for both CPU and GPU inference
  • Maximum sequence length of 512 tokens
  • Optimized version 1.5 with improved similarity distribution

Core Capabilities

  • Strong performance on MTEB benchmark with 63.55 average score
  • Excellent retrieval capabilities (53.25 on retrieval tasks)
  • Robust clustering performance (45.77 score)
  • High accuracy on pair classification tasks (86.55)

Frequently Asked Questions

Q: What makes this model unique?

The model offers a superior balance between size and performance, with Version 1.5 specifically addressing similarity distribution issues and improving retrieval performance without requiring explicit instructions.

Q: What are the recommended use cases?

It's ideal for semantic search, document retrieval, text similarity comparison, and clustering applications. The model performs particularly well in short query to long passage retrieval scenarios.
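For the short-query-to-long-passage scenario, the BGE documentation recommends prefixing the query (but not the passages) with a retrieval instruction. A sketch using plain Hugging Face Transformers with CLS pooling (the instruction string is the one published in the model card; the query and passages are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-base-en-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-base-en-v1.5")
model.eval()

# Instruction prefix for short-query-to-long-passage retrieval (queries only).
instruction = "Represent this sentence for searching relevant passages: "
query = instruction + "what is a transformer model?"
passages = [
    "The transformer is a deep learning architecture built on self-attention.",
    "A power transformer transfers electrical energy between circuits.",
]

with torch.no_grad():
    inputs = tokenizer(
        [query] + passages,
        padding=True,
        truncation=True,
        max_length=512,  # model's maximum sequence length
        return_tensors="pt",
    )
    outputs = model(**inputs)
    # BGE uses the [CLS] token embedding, L2-normalized.
    emb = torch.nn.functional.normalize(outputs.last_hidden_state[:, 0], dim=-1)

# Dot products of normalized vectors = cosine similarity scores.
scores = emb[0] @ emb[1:].T
print(scores)
```

The highest-scoring passage is the retrieval result; with v1.5 the instruction prefix is optional and mainly helps when queries are much shorter than the passages.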
