all-mpnet-base-v2

Maintained By
sentence-transformers

all-mpnet-base-v2

PropertyValue
Parameter Count109M
LicenseApache 2.0
Vector Dimension768
Training Data1B+ sentence pairs
Authorsentence-transformers

What is all-mpnet-base-v2?

all-mpnet-base-v2 is a state-of-the-art sentence embedding model that maps sentences and paragraphs to dense 768-dimensional vector representations. Built on Microsoft's MPNet architecture and fine-tuned on over 1 billion sentence pairs, this model excels at capturing semantic meaning for downstream tasks like clustering and semantic search.

Implementation Details

The model builds upon the microsoft/mpnet-base foundation and employs a contrastive learning approach during fine-tuning. It processes text sequences up to 384 tokens and was trained using AdamW optimizer with a 2e-5 learning rate across 100k steps with a batch size of 1024 on TPU v3-8 hardware.

  • Trained on 21 diverse datasets including Reddit comments, research papers, and question-answer pairs
  • Implements efficient mean pooling for generating sentence embeddings
  • Supports both PyTorch and ONNX runtime environments
  • Achieves state-of-the-art performance on sentence similarity tasks

Core Capabilities

  • Semantic text similarity computation
  • Information retrieval and clustering
  • Zero-shot text classification
  • Cross-lingual alignment
  • Document similarity analysis

Frequently Asked Questions

Q: What makes this model unique?

The model's uniqueness stems from its massive training dataset of over 1 billion sentence pairs and its optimization for sentence-level semantic understanding. The combination of MPNet architecture with contrastive learning produces highly effective sentence embeddings.

Q: What are the recommended use cases?

The model excels at tasks requiring semantic understanding, including document similarity comparison, semantic search systems, clustering textual data, and building recommendation systems. It's particularly effective for applications needing accurate sentence-level embeddings.

The first platform built for prompt engineering