simlm-msmarco-reranker

Maintained By
intfloat

SimLM MS MARCO Reranker

Property     Value
License      MIT
Paper        View Paper
Language     English
Framework    PyTorch

What is simlm-msmarco-reranker?

SimLM is a pre-training method for dense passage retrieval that uses a bottleneck architecture to compress passage information into dense vectors. Developed by Microsoft researchers, it reaches a dev MRR@10 of 43.8 on the MS MARCO passage ranking task and outperforms more complex multi-vector approaches. This checkpoint is the reranker variant fine-tuned on MS MARCO.

Implementation Details

The model is pre-trained with a replaced language modeling objective inspired by ELECTRA, which improves sample efficiency and reduces the mismatch between pre-training and fine-tuning inputs. It is implemented with the Transformers library and scores query-passage pairs, with an optional passage title, to produce a relevance score for each pair. The reranker is trained with a listwise loss.
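
One common form of a listwise loss is the softmax cross-entropy over a candidate list that holds one relevant passage and several negatives. The sketch below illustrates that idea only and is not the authors' training code. The function name `listwise_loss` and the convention that the positive passage sits at index 0 are assumptions made for this example.

```python
import math


def listwise_loss(scores, positive_index=0):
    """Softmax cross-entropy over one candidate list.

    scores: relevance scores for every candidate passage of a single query.
    positive_index: position of the relevant passage (assumed convention).
    """
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_score = max(scores)
    log_normalizer = max_score + math.log(
        sum(math.exp(s - max_score) for s in scores)
    )
    # Negative log-probability of the positive passage under the softmax.
    return log_normalizer - scores[positive_index]
```

The loss shrinks as the positive passage's score rises relative to the negatives, which pushes relevant passages toward the top of the list.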

  • Maximum sequence length of 192 tokens
  • Supports batch processing for efficient inference
  • Implements sequence classification architecture
  • Uses ELECTRA-based architecture for better efficiency
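
The points above can be sketched as a small scoring helper built on the Transformers sequence classification API. This is a minimal sketch under stated assumptions, not a reference implementation. It assumes the checkpoint is published on the Hugging Face Hub as `intfloat/simlm-msmarco-reranker`, that title and passage are joined as `"title: passage"` with `"-"` standing in for a missing title, and that the first logit is the relevance score. The helper names `format_passage`, `score_pairs`, and `rerank` are hypothetical.

```python
def format_passage(passage, title="-"):
    # Assumed input convention: "title: passage", with "-" when no title exists.
    return f"{title}: {passage}"


def score_pairs(query, passages, titles=None,
                model_name="intfloat/simlm-msmarco-reranker", max_length=192):
    # Heavy dependencies are imported lazily so the pure helpers stay importable.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    titles = titles or ["-"] * len(passages)
    texts = [format_passage(p, t) for p, t in zip(passages, titles)]
    # Score all passages for the query in one batch, truncated to max_length tokens.
    batch = tokenizer([query] * len(texts), text_pair=texts,
                      max_length=max_length, padding=True,
                      truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    # Assumption: the first logit of each row is the relevance score.
    return logits[:, 0].tolist()


def rerank(passages, scores):
    # Sort passages by descending relevance score.
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)


# Example (downloads the checkpoint on first use):
# passages = ["The Super Bowl is typically four hours long.", "Paris is in France."]
# scores = score_pairs("how long is super bowl game", passages)
# print(rerank(passages, scores))
```

Loading the tokenizer and model inside `score_pairs` keeps the sketch short. A real service would load them once and reuse them across requests.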

Core Capabilities

  • Dense passage retrieval with state-of-the-art performance
  • Effective document reranking for search applications
  • High recall (98.6% R@1k on the MS MARCO dev set)
  • Strong performance on TREC DL tasks (74.6 nDCG@10 on TREC DL 2019)
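
For readers unfamiliar with the metrics quoted above, the following sketch shows how MRR@10 is conventionally computed from ranked relevance labels. The function name `mrr_at_k` and the input layout (one list of 0/1 labels per query, in ranked order) are assumptions for illustration. Benchmark reports usually quote the value multiplied by 100, so a reported 43.8 corresponds to 0.438 here.

```python
def mrr_at_k(ranked_labels, k=10):
    """Mean reciprocal rank, truncated at rank k.

    ranked_labels: one list per query holding 0/1 relevance labels,
    ordered from the top-ranked passage downward.
    """
    if not ranked_labels:
        return 0.0
    total = 0.0
    for labels in ranked_labels:
        # Only the first relevant passage within the top k contributes.
        for rank, is_relevant in enumerate(labels[:k], start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_labels)
```

A query whose first relevant passage falls below rank 10 contributes zero, so the metric rewards placing a relevant passage near the top of the list.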

Frequently Asked Questions

Q: What makes this model unique?

SimLM stands out for its bottleneck architecture and its self-supervised pre-training, which requires neither labeled data nor queries. It outperforms multi-vector approaches such as ColBERTv2 while being more efficient.

Q: What are the recommended use cases?

The model is ideal for information retrieval systems, search engine reranking, and document retrieval applications where high-quality passage ranking is required. It's particularly effective for scenarios requiring strong performance without extensive labeled data.
