UltraRM-13b

Maintained by: openbmb


  • Base Model: LLaMA2-13B
  • License: MIT
  • Paper: UltraFeedback Paper
  • Framework: PyTorch, Transformers

What is UltraRM-13b?

UltraRM-13b is a state-of-the-art reward model developed by OpenBMB, built on the LLaMA2-13B architecture. It's trained on the UltraFeedback dataset along with a mixture of other high-quality feedback datasets, including Anthropic HH-RLHF, Stanford SHP, and Summarization feedback data. The model has demonstrated exceptional performance, achieving a 92.30% win rate against text-davinci-003 on the AlpacaEval benchmark.

Implementation Details

The model implements a regression head on top of the LLaMA architecture to provide reward scores for text completions. It's designed to evaluate the quality of AI-generated responses and can be easily integrated into reinforcement learning pipelines.

  • Built on LLaMA2-13B architecture
  • Trained on UltraFeedback and multiple high-quality feedback datasets
  • Implements custom reward modeling architecture
  • Provides scalar reward scores for text evaluation
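The regression-head design described above can be sketched in a few lines of PyTorch. This is an illustrative toy, not the released UltraRM-13b code: the class name, the single bias-free linear layer, and the random tensors standing in for a LLaMA backbone's hidden states are all assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class RewardModelHead(nn.Module):
    """Minimal sketch of a reward head: the backbone's final hidden
    state is projected to a single scalar reward by a linear layer."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # One linear layer, no bias, mapping a hidden state to a scalar.
        self.regression_head = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, hidden_states: torch.Tensor,
                attention_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        # Pick the hidden state of the last non-padding token per sequence.
        last_idx = attention_mask.sum(dim=1) - 1          # (batch,)
        batch_idx = torch.arange(hidden_states.size(0))
        last_hidden = hidden_states[batch_idx, last_idx]  # (batch, hidden)
        # Project to one scalar reward per sequence.
        return self.regression_head(last_hidden).squeeze(-1)

# Toy usage: random tensors stand in for a real LLaMA backbone's outputs.
head = RewardModelHead(hidden_size=64)
hidden = torch.randn(2, 10, 64)
mask = torch.ones(2, 10, dtype=torch.long)
rewards = head(hidden, mask)
print(rewards.shape)  # torch.Size([2]) -- one scalar reward per input
```

In an RL pipeline these scalars would be fed to the policy-optimization step; the point here is only that the head reduces a full sequence representation to one number per completion.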

Core Capabilities

  • State-of-the-art performance in preference evaluation
  • Effective text quality assessment
  • Compatible with standard transformers pipeline
  • Supports both direct reward computation and comparative evaluation
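Comparative evaluation with scalar rewards is typically done by treating the score difference between two responses as a preference logit (the standard Bradley-Terry formulation used in reward modeling). The helper below is a generic sketch of that idea, not an API exposed by the model:

```python
import math

def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over B,
    given scalar rewards such as those produced by a reward model."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

# Equal rewards mean a 50/50 preference; a 2-point gap means ~88%.
print(preference_probability(1.0, 1.0))  # 0.5
print(round(preference_probability(2.0, 0.0), 3))  # 0.881
```

This is why raw reward values matter less than their differences: the model is trained to rank responses, so scores are meaningful relative to one another.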

Frequently Asked Questions

Q: What makes this model unique?

UltraRM-13b stands out for its exceptional performance in reward modeling, achieved through training on a diverse set of high-quality feedback datasets. It sets new state-of-the-art benchmarks for open-source reward models and demonstrates superior capabilities in evaluating text quality.

Q: What are the recommended use cases?

The model is primarily designed for evaluating the quality of language model outputs, making it ideal for: reinforcement learning from human feedback (RLHF), quality assessment of generated text, and model comparison studies. It's particularly useful in research and development of better language models.
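A common pattern for the quality-assessment use case is best-of-n sampling: generate several candidate responses and keep the one the reward model scores highest. The sketch below uses a hypothetical stand-in scorer (`toy_reward`, which just counts unique words); in practice that function would run the prompt/response pair through UltraRM-13b.

```python
def best_of_n(candidates, reward_fn):
    """Best-of-n selection: score each candidate completion with a
    reward function and keep the highest-scoring one."""
    return max(candidates, key=reward_fn)

# Hypothetical placeholder scorer -- a real pipeline would call the
# reward model here instead of counting unique words.
def toy_reward(text: str) -> float:
    return float(len(set(text.split())))

candidates = ["the cat sat", "the the the", "a curious cat sat quietly"]
print(best_of_n(candidates, toy_reward))  # "a curious cat sat quietly"
```

The same loop scales to model comparison studies: score each system's outputs on a shared prompt set and compare the reward distributions.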
