RoBERTa-Large
| Property | Value |
|---|---|
| Parameter Count | 355M |
| License | MIT |
| Paper | RoBERTa: A Robustly Optimized BERT Pretraining Approach (arXiv:1907.11692) |
| Training Data | BookCorpus, Wikipedia, CC-News, OpenWebText, Stories |
| Developer | Facebook AI |
What is roberta-large?
RoBERTa-large is a transformer-based language model from Facebook AI that revisits BERT's pretraining recipe with more data, longer training, and better-tuned hyperparameters. Its 355M parameters were pretrained on roughly 160GB of English text using the masked language modeling (MLM) objective. The model performs strongly across a wide range of natural language processing tasks and is particularly strong on the GLUE benchmark.
Implementation Details
The model uses byte-level BPE tokenization with a 50,000-token vocabulary. Pretraining ran on 1,024 V100 GPUs for 500K steps with a batch size of 8K sequences and a maximum sequence length of 512 tokens, using the Adam optimizer with a warmup-and-decay learning rate schedule.
- Dynamic masking strategy with 15% token masking
- Trained on multiple large-scale datasets including BookCorpus and Wikipedia
- Achieved state-of-the-art results on GLUE tasks at release (90.2 MNLI, 94.7 QNLI, 96.4 SST-2)
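To make the MLM setup described above concrete, here is a minimal sketch using the Hugging Face `transformers` library (assumed installed; this example is not part of the original card). Note that RoBERTa's tokenizer expects `<mask>` rather than BERT's `[MASK]`.

```python
# Minimal masked-language-modeling sketch; assumes the `transformers`
# package is installed. Pretrained weights download on first use.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="roberta-large")

# RoBERTa expects the literal `<mask>` token in the input text.
for prediction in unmasker("The goal of life is <mask>."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
```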
Core Capabilities
- Masked language modeling with bidirectional context understanding
- Feature extraction for downstream tasks (see the sketch after this list)
- Sequence classification and token classification
- Question answering tasks
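The sketch below illustrates the feature-extraction capability: it pools roberta-large's final hidden states into a fixed-size sentence vector. It assumes the `transformers` and `torch` packages are available; mean pooling is one reasonable choice here, not a prescribed one.

```python
# Feature-extraction sketch: turn a sentence into a 1024-dim embedding.
# Assumes `transformers` and `torch` are installed.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModel.from_pretrained("roberta-large")
model.eval()

inputs = tokenizer("RoBERTa produces contextual embeddings.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch, sequence_length, 1024) for roberta-large.
sentence_embedding = outputs.last_hidden_state.mean(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 1024])
```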
Frequently Asked Questions
Q: What makes this model unique?
RoBERTa's distinguishing features are its robustly optimized training procedure, its dynamic masking strategy (masked positions are re-sampled each time a sequence is seen, rather than fixed once during preprocessing as in BERT), and a training corpus significantly larger than BERT's. It reaches higher accuracy through careful hyperparameter tuning and longer training.
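As a rough illustration of dynamic masking (a stand-in for RoBERTa's original Fairseq pretraining pipeline, not the code used to train this model), the `transformers` MLM data collator re-samples masked positions on every call, so the same sentence yields different training examples across epochs:

```python
# Dynamic-masking sketch using transformers' MLM data collator; this is an
# illustrative substitute for the original pretraining pipeline.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15  # 15% of tokens masked
)

example = tokenizer("Dynamic masking re-samples masked positions each epoch.")

# Each call draws a fresh random mask, so the two decoded batches differ.
for _ in range(2):
    batch = collator([example])
    print(tokenizer.decode(batch["input_ids"][0]))
```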
Q: What are the recommended use cases?
The model excels in tasks requiring whole-sentence understanding, including sequence classification, token classification, and question answering. However, it's not recommended for text generation tasks, where models like GPT-2 would be more appropriate.
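For the sequence-classification use case, a hedged sketch of loading roberta-large with a fresh classification head is shown below; `num_labels=2` and the example sentence are placeholders, and real use requires fine-tuning on a labeled dataset such as SST-2.

```python
# Sequence-classification sketch; the classification head is randomly
# initialized and must be fine-tuned before its predictions mean anything.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-large", num_labels=2  # placeholder label count
)

inputs = tokenizer("This movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # untrained-head probabilities, illustrative only
```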