RoBERTa Base Model
| Property | Value |
|---|---|
| Parameter Count | 125M |
| License | MIT |
| Author | FacebookAI |
| Paper | RoBERTa: A Robustly Optimized BERT Pretraining Approach (arXiv:1907.11692) |
| Training Data | BookCorpus, Wikipedia, CC-News, OpenWebText, Stories |
What is roberta-base?
RoBERTa-base is a transformer-based language model developed by FacebookAI as an optimized version of BERT. It was pretrained on roughly 160GB of English text using the masked language modeling objective, which teaches it bidirectional sentence representations that transfer well to downstream NLP tasks. The model uses a byte-level BPE tokenizer with a 50,000-token vocabulary and can process sequences of up to 512 tokens.
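As a quick illustration of the masked language modeling objective, the model can be queried through the Hugging Face transformers fill-mask pipeline. The snippet below is a minimal sketch and the example sentence is only illustrative; note that RoBERTa expects the <mask> token rather than BERT's [MASK].

```python
from transformers import pipeline

# Load roberta-base behind the fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="roberta-base")

# The pipeline returns the top candidate tokens for the masked position.
for prediction in fill_mask("The goal of life is <mask>."):
    print(f"{prediction['token_str']!r}  score={prediction['score']:.3f}")
```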
Implementation Details
The model was trained using 1024 V100 GPUs for 500K steps with a batch size of 8K sequences. It implements dynamic masking during pretraining, where 15% of tokens are selected for prediction: 80% of those are replaced with the <mask> token, 10% are replaced with a random token, and 10% are left unchanged. Because the masking pattern is re-sampled each time a sequence is fed to the model, the model sees different masks across epochs, unlike BERT's static masking (a sketch of this scheme follows the list below).
- Architecture: Bidirectional transformer encoder
- Training Optimization: Adam optimizer with learning rate 6e-4
- Preprocessing: Byte-Pair Encoding with 50K vocabulary
- Case-sensitive: Distinguishes uppercase from lowercase text, so "english" and "English" are tokenized differently
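The dynamic-masking scheme described above can be sketched in a few lines. This is an illustrative reimplementation, not the original fairseq training code; the function name is hypothetical and the token ids assume the roberta-base vocabulary.

```python
import random

MASK_TOKEN_ID = 50264  # id of <mask> in the roberta-base vocabulary

def dynamically_mask(token_ids, vocab_size=50265, mask_prob=0.15):
    """Illustrative 80/10/10 masking. A fresh mask is drawn every time the
    sequence is seen, unlike BERT's static masking fixed at preprocessing time."""
    inputs, labels = list(token_ids), []
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:
            labels.append(tok)            # predict the original token at this position
            roll = random.random()
            if roll < 0.8:
                inputs[i] = MASK_TOKEN_ID                 # 80%: replace with <mask>
            elif roll < 0.9:
                inputs[i] = random.randrange(vocab_size)  # 10%: replace with a random token
            # remaining 10%: keep the original token unchanged
        else:
            labels.append(-100)           # position ignored by the loss
    return inputs, labels

# Illustrative call on a short sequence (ids are placeholders).
masked_inputs, labels = dynamically_mask([0, 31414, 232, 2])
```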
Core Capabilities
- Masked Language Modeling
- Feature Extraction for Downstream Tasks
- Sequence Classification
- Token Classification
- Question Answering
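For feature extraction, the encoder's hidden states can be read out directly with the standard transformers API. The sketch below assumes PyTorch; the input sentence and variable names are only illustrative.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

# Tokenize with the byte-level BPE tokenizer; sequences are capped at 512 tokens.
inputs = tokenizer("RoBERTa produces contextual embeddings.",
                   return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per input token.
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)  # torch.Size([1, seq_len, 768])
```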
Frequently Asked Questions
Q: What makes this model unique?
RoBERTa's key distinction lies in its robust optimization of the BERT architecture, utilizing dynamic masking and larger training data. It achieves state-of-the-art performance on GLUE benchmarks, with notably high scores on MNLI (87.6) and QNLI (92.8).
Q: What are the recommended use cases?
The model is best suited for tasks that require understanding of complete sentences, including text classification, named entity recognition, and question answering. It's not recommended for text generation tasks, where models like GPT-2 would be more appropriate.
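For the classification-style use cases above, a task-specific head is added on top of the pretrained encoder and fine-tuned. The sketch below only demonstrates the setup and input/output shapes with placeholder labels; the freshly added head is randomly initialized, so its predictions are meaningless until fine-tuning.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder label set; a real task defines its own labels and fine-tunes the head.
labels = ["negative", "positive"]
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(labels))  # adds an untrained classification head

inputs = tokenizer("A well written, thoroughly enjoyable film.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: [1, num_labels]

# Essentially random before fine-tuning; shown only to illustrate the interface.
print(labels[logits.argmax(dim=-1).item()])
```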