RoBERTa-Large

Maintained By: FacebookAI

  • Parameter Count: 355M
  • License: MIT
  • Paper: RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)
  • Training Data: BookCorpus, Wikipedia, CC-News, OpenWebText, Stories
  • Developer: Facebook AI

What is roberta-large?

RoBERTa-large is a transformer-based language model developed by Facebook AI as a robustly optimized version of BERT. With 355M parameters, it was trained on a 160GB text corpus using the masked language modeling (MLM) objective. The model performs strongly across a wide range of natural language processing tasks, particularly on the GLUE benchmark.
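
As a minimal sketch of the masked language modeling objective in practice, assuming the Hugging Face transformers library and PyTorch are installed (the card itself does not include usage code):

```python
# Fill-mask inference with roberta-large via the transformers pipeline API.
# Note that RoBERTa's mask token is "<mask>", not BERT's "[MASK]".
from transformers import pipeline

unmasker = pipeline("fill-mask", model="roberta-large")
predictions = unmasker("The goal of life is <mask>.")
for p in predictions:
    print(p["token_str"], round(p["score"], 3))  # top candidate tokens and their scores
```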

Implementation Details

The model uses byte-level BPE tokenization with a 50,000-token vocabulary. Training was conducted on 1024 V100 GPUs for 500K steps, with a batch size of 8K sequences and a maximum sequence length of 512 tokens. Optimization used Adam with carefully tuned hyperparameters and a learning rate schedule; a sketch of the dynamic-masking setup follows the list below.

  • Dynamic masking strategy with 15% token masking
  • Trained on multiple large-scale datasets including BookCorpus and Wikipedia
  • Achieves state-of-the-art results on GLUE tasks (90.2 MNLI, 94.7 QNLI, 96.4 SST-2)
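
To make the dynamic-masking point concrete, here is a sketch of a comparable setup using the Hugging Face transformers data collator with PyTorch; it mirrors the 15% masking rate but is not Facebook AI's original fairseq pretraining pipeline:

```python
# Dynamic masking: mask positions are re-sampled each time a batch is collated,
# so the same sentence sees different masks across training epochs.
from transformers import RobertaTokenizerFast, DataCollatorForLanguageModeling

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-large")
print(tokenizer.vocab_size)  # byte-level BPE vocabulary (~50K tokens)

collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,  # 15% of tokens selected for masking
)

encoding = tokenizer("RoBERTa was trained with dynamic masking.", return_tensors="pt")
batch = collator([{"input_ids": encoding["input_ids"][0]}])
print(batch["input_ids"])  # masked positions change from call to call
print(batch["labels"])     # -100 everywhere except the masked positions
```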

Core Capabilities

  • Masked language modeling with bidirectional context understanding
  • Feature extraction for downstream tasks (see the sketch after this list)
  • Sequence classification and token classification
  • Question answering tasks
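
Of the capabilities above, feature extraction is the most common starting point; the following is a brief sketch, again assuming the Hugging Face transformers library and PyTorch:

```python
# Extract contextual token embeddings from roberta-large.
import torch
from transformers import RobertaTokenizerFast, RobertaModel

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-large")
model = RobertaModel.from_pretrained("roberta-large")
model.eval()

inputs = tokenizer("RoBERTa produces contextual embeddings.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 1024-dimensional vector per input token (roberta-large hidden size is 1024).
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 1024])
```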

Frequently Asked Questions

Q: What makes this model unique?

RoBERTa's uniqueness lies in its robust optimization approach, dynamic masking strategy, and training on a significantly larger dataset compared to BERT. It achieves superior performance through careful hyperparameter tuning and extended training.

Q: What are the recommended use cases?

The model excels in tasks requiring whole-sentence understanding, including sequence classification, token classification, and question answering. However, it's not recommended for text generation tasks, where models like GPT-2 would be more appropriate.
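
For the classification use cases mentioned above, the usual pattern is to fine-tune the pretrained encoder with a task-specific head. A minimal sketch, assuming the Hugging Face transformers library and PyTorch; the classification head here is randomly initialized and must be fine-tuned on labeled data before its outputs are meaningful:

```python
# Attach a sequence-classification head to roberta-large for fine-tuning.
import torch
from transformers import RobertaTokenizerFast, RobertaForSequenceClassification

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-large")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-large",
    num_labels=2,  # e.g. binary sentiment, as in SST-2
)

inputs = tokenizer("A remarkably effective pretraining recipe.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, num_labels); head is untrained here
print(logits.softmax(dim=-1))
```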
