roberta-base

Maintained by: FacebookAI

RoBERTa Base Model

  • Parameter Count: 125M
  • License: MIT
  • Author: FacebookAI
  • Paper: RoBERTa: A Robustly Optimized BERT Pretraining Approach (arXiv:1907.11692)
  • Training Data: BookCorpus, Wikipedia, CC-News, OpenWebText, Stories

What is roberta-base?

RoBERTa-base is a transformer-based language model developed by FacebookAI as an optimized version of BERT. It was trained on a massive 160GB text corpus using masked language modeling, making it particularly effective for various NLP tasks. The model employs a byte-level BPE tokenizer with a 50,000 token vocabulary and can process sequences up to 512 tokens long.
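
A minimal way to try the pretrained masked-language-modeling head, assuming the Hugging Face transformers library (with a backend such as PyTorch) is installed:

```python
from transformers import pipeline

# Masked-token prediction with the pretrained roberta-base checkpoint.
# Note that RoBERTa's mask token is <mask>, not BERT's [MASK].
unmasker = pipeline("fill-mask", model="roberta-base")
print(unmasker("The goal of life is <mask>."))
```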

Implementation Details

The model was trained on 1024 V100 GPUs for 500K steps with a batch size of 8K and a sequence length of 512. Unlike BERT, it uses dynamic masking: a new masking pattern is generated each time a sequence is fed to the model. In each pass, 15% of the input tokens are selected for prediction; of these, 80% are replaced with the <mask> token, 10% with a random token, and 10% are left unchanged.
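
As an illustration only (not FacebookAI's fairseq training code), the 80/10/10 corruption scheme can be sketched in plain Python like this; the function name and signature are assumptions for this example:

```python
import random

def dynamic_mask(token_ids, mask_token_id, vocab_size, mask_prob=0.15):
    """Illustrative 80/10/10 dynamic masking, regenerated for every pass over a sequence."""
    input_ids = list(token_ids)
    labels = [-100] * len(token_ids)        # -100 = position ignored by the loss (common convention)
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:     # select ~15% of positions for prediction
            labels[i] = tok                 # the model must recover the original token here
            r = random.random()
            if r < 0.8:                     # 80%: replace with the mask token
                input_ids[i] = mask_token_id
            elif r < 0.9:                   # 10%: replace with a random vocabulary token
                input_ids[i] = random.randrange(vocab_size)
            # remaining 10%: leave the token unchanged
    return input_ids, labels
```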

  • Architecture: Bidirectional transformer encoder
  • Training Optimization: Adam optimizer with learning rate 6e-4
  • Preprocessing: Byte-Pair Encoding with 50K vocabulary
  • Case-sensitive: treats "english" and "English" as different inputs (see the tokenizer sketch below)
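
A quick way to see the byte-level BPE vocabulary and case sensitivity in practice, assuming the transformers library is installed (the printed token strings are indicative, not guaranteed output):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")

# Byte-level BPE encodes leading spaces into the subwords (the Ġ prefix),
# and casing is preserved, so upper- and lower-case forms tokenize differently.
print(tok.tokenize("Hello world"))    # e.g. ['Hello', 'Ġworld']
print(tok.tokenize("HELLO world"))    # different subword pieces for the upper-case form
print(tok.vocab_size)                 # 50265 (~50K entries including special tokens)
```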

Core Capabilities

  • Masked Language Modeling
  • Feature Extraction for Downstream Tasks
  • Sequence Classification
  • Token Classification
  • Question Answering
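
As a brief illustration of the feature-extraction and sequence-classification capabilities listed above, the following sketch uses the transformers and PyTorch APIs; the example sentence and num_labels=2 are arbitrary choices, and the classification head is randomly initialised until fine-tuned:

```python
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# Feature extraction: contextual embeddings from the pretrained encoder.
encoder = AutoModel.from_pretrained("roberta-base")
inputs = tokenizer("RoBERTa produces contextual embeddings.", return_tensors="pt")
with torch.no_grad():
    features = encoder(**inputs).last_hidden_state   # shape: (batch, seq_len, 768)

# Sequence classification: adds a new head on top of the encoder; the head is
# randomly initialised and must be fine-tuned on labelled data before use.
classifier = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
```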

Frequently Asked Questions

Q: What makes this model unique?

RoBERTa's key distinction is its robust optimization of the BERT recipe: it drops BERT's next-sentence-prediction objective, uses dynamic masking, and trains with larger batches on roughly ten times more data (160GB vs. BERT's 16GB). When fine-tuned, it achieves strong results on the GLUE benchmark, including 87.6 on MNLI and 92.8 on QNLI.

Q: What are the recommended use cases?

The model is best suited for tasks that require understanding of complete sentences, including text classification, named entity recognition, and question answering. It's not recommended for text generation tasks, where models like GPT-2 would be more appropriate.
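
For example, a token-classification (NER-style) head can be attached to the pretrained encoder as sketched below; num_labels=9 is an assumption borrowed from CoNLL-2003-style BIO tagging, and the head must be fine-tuned on labelled data before its predictions are meaningful:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# add_prefix_space=True is required by the RoBERTa tokenizer for pre-split words.
tokenizer = AutoTokenizer.from_pretrained("roberta-base", add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained("roberta-base", num_labels=9)

# The head is randomly initialised, so fine-tune on NER data before relying on outputs.
words = ["RoBERTa", "was", "released", "by", "Facebook", "AI", "."]
inputs = tokenizer(words, is_split_into_words=True, return_tensors="pt")
logits = model(**inputs).logits   # shape: (1, seq_len, 9)
```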
