SimLM MS MARCO Reranker
| Property | Value |
|---|---|
| License | MIT |
| Paper | SimLM: Pre-training with Representation Bottleneck for Dense Passage Retrieval |
| Language | English |
| Framework | PyTorch |
What is simlm-msmarco-reranker?
SimLM is a pre-training method for dense passage retrieval that uses a bottleneck architecture to compress passage information into a dense vector. simlm-msmarco-reranker, developed by Microsoft researchers, is the reranking model from that work: it scores query-passage pairs and reaches a dev MRR@10 of 43.8 on the MS-MARCO passage ranking task. The SimLM work also reports outperforming more complex multi-vector approaches.
Implementation Details
SimLM pre-training uses a replaced language modeling objective inspired by ELECTRA, which improves sample efficiency and reduces the mismatch between the pre-training and fine-tuning input distributions. The reranker is implemented with the Transformers library. It takes a query-passage pair, with an optional passage title, and outputs a relevance score. It was trained with a listwise loss.
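The listwise objective can be illustrated as a softmax cross-entropy over the candidates of a single query. This is a minimal sketch under assumptions: the text above does not give the exact loss, so the form used here (one positive passage per list, no temperature) and the names `listwise_loss`, `scores`, and `positive_index` are illustrative only.

```python
import math
from typing import Sequence


def listwise_loss(scores: Sequence[float], positive_index: int) -> float:
    """Softmax cross-entropy over one candidate list (assumed form of the loss).

    scores: relevance scores for one query's candidates (one positive, rest negatives).
    positive_index: position of the relevant passage in `scores`.
    """
    # Subtract the max score before exponentiating for numerical stability.
    max_score = max(scores)
    log_normalizer = max_score + math.log(
        sum(math.exp(s - max_score) for s in scores)
    )
    # Negative log-probability of the positive passage under the softmax.
    return log_normalizer - scores[positive_index]


# With identical scores the loss equals log(number of candidates).
uniform_loss = listwise_loss([0.0, 0.0, 0.0, 0.0], positive_index=0)
# Raising the positive's score above the negatives lowers the loss.
separated_loss = listwise_loss([5.0, 0.0, 0.0, 0.0], positive_index=0)
print(uniform_loss, separated_loss)
```

Because the loss normalizes over the whole list, the model is pushed to rank the positive passage above every negative for the same query rather than to classify each pair independently.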
- Maximum sequence length of 192 tokens
- Supports batch processing for efficient inference
- Uses a sequence classification architecture
- Builds on an ELECTRA-based architecture for better efficiency
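The points above can be sketched as a small batched scoring routine built on the Transformers sequence classification API. Several details are assumptions not stated in this document: the Hub id `intfloat/simlm-msmarco-reranker`, the `title: passage` input format with `-` as a placeholder title, and a classification head that emits one logit per pair. The helper names `format_passage`, `batched`, and `score_pairs` are hypothetical.

```python
from typing import List, Sequence, Tuple

MODEL_NAME = "intfloat/simlm-msmarco-reranker"  # assumed Hub id, not given in this document
MAX_LENGTH = 192  # maximum sequence length listed above


def format_passage(passage: str, title: str = "-") -> str:
    """Join the optional title and the passage as "title: passage" (assumed convention)."""
    return f"{title}: {passage}"


def batched(items: Sequence, batch_size: int) -> List[Sequence]:
    """Split items into consecutive chunks of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def score_pairs(pairs: Sequence[Tuple[str, str, str]], batch_size: int = 16) -> List[float]:
    """Score (query, title, passage) triples; a higher score means more relevant."""
    # Imported lazily so the helpers above work without torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
    model.eval()

    scores: List[float] = []
    with torch.no_grad():
        for batch in batched(pairs, batch_size):
            inputs = tokenizer(
                [query for query, _, _ in batch],
                text_pair=[format_passage(passage, title) for _, title, passage in batch],
                max_length=MAX_LENGTH,
                padding=True,
                truncation=True,
                return_tensors="pt",
            )
            logits = model(**inputs).logits
            # Assumes the first logit column holds the relevance score.
            scores.extend(logits[:, 0].tolist())
    return scores
```

Calling `score_pairs([("how long is super bowl game", "-", "The Super Bowl is typically four hours long.")])` would download the checkpoint on first use and return one score per triple. Truncating at 192 tokens means long passages are cut off, so very long documents may need to be split into passages beforehand.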
Core Capabilities
- Dense passage retrieval with state-of-the-art performance
- Effective document reranking for search applications
- High recall rates (98.6% R@1k on MS-MARCO dev set)
- Strong performance on TREC DL tasks (74.6 nDCG@10 on TREC DL 2019)
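For readers unfamiliar with the metrics quoted above, here is a minimal per-query sketch of MRR@10, Recall@k, and nDCG@10. It is illustrative rather than the official evaluation code: the function names are hypothetical, and the nDCG variant (linear gains, log2 discount) is one common choice that may differ from the tooling used to produce the reported numbers. Reported figures are averages over all queries, scaled to percentages.

```python
import math
from typing import Dict, Sequence, Set


def mrr_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int = 1000) -> float:
    """Fraction of the relevant passages that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked_ids: Sequence[str], gains: Dict[str, float], k: int = 10) -> float:
    """nDCG with linear gains and a log2 rank discount (one common variant)."""
    dcg = sum(
        gains.get(pid, 0.0) / math.log2(rank + 1)
        for rank, pid in enumerate(ranked_ids[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


ranking = ["p3", "p1", "p2"]
print(mrr_at_k(ranking, {"p1"}))               # first relevant hit is at rank 2
print(recall_at_k(ranking, {"p1", "p9"}, k=3))  # one of two relevant passages found
print(ndcg_at_k(ranking, {"p1": 1.0, "p2": 2.0}))
```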
Frequently Asked Questions
Q: What makes this model unique?
SimLM combines a bottleneck architecture with a self-supervised pre-training approach that requires neither labeled data nor queries. Its authors report better results than multi-vector approaches such as ColBERTv2, while being more efficient.
Q: What are the recommended use cases?
The model suits information retrieval systems, search engine reranking, and document retrieval applications that need high-quality passage ranking. It's particularly effective for scenarios requiring strong performance without extensive labeled data.
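The search reranking use case usually follows a two-stage pattern: a fast first-stage retriever proposes candidates, and a reranker re-scores and reorders them. The sketch below shows only that pattern. `rerank` and `overlap_score` are hypothetical names, and `overlap_score` is a toy word-overlap scorer standing in for the model's relevance score so the example runs without any downloads.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every first-stage candidate against the query and keep the best top_k."""
    scored = [(passage, score_fn(query, passage)) for passage in candidates]
    # Highest score first; ties keep their first-stage order because the sort is stable.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in scorer: fraction of query words that occur in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words) if query_words else 0.0


candidates = [
    "cats sleep a lot",
    "the super bowl game is long",
    "bowl of soup",
]
for passage, score in rerank("super bowl game", candidates, overlap_score, top_k=2):
    print(f"{score:.2f}  {passage}")
```

In a real system `score_fn` would wrap the reranker, and `top_k` and the number of first-stage candidates trade ranking quality against latency.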