RoBERTa Base Model
| Property | Value |
|---|---|
| Parameter Count | 125M |
| License | MIT |
| Author | FacebookAI |
| Paper | RoBERTa: A Robustly Optimized BERT Pretraining Approach (arXiv:1907.11692) |
| Training Data | BookCorpus, Wikipedia, CC-News, OpenWebText, Stories |
What is roberta-base?
RoBERTa-base is a transformer-based language model developed by FacebookAI as an optimized version of BERT. It was pretrained on roughly 160GB of English text using the masked language modeling objective, which teaches it bidirectional sentence representations that transfer well to downstream NLP tasks. The model uses a byte-level BPE tokenizer with a 50,000-token vocabulary and can process sequences of up to 512 tokens.
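As a quick illustration of the masked language modeling objective, the model can be queried through the Hugging Face transformers fill-mask pipeline. The snippet below is a minimal sketch and the example sentence is only illustrative; note that RoBERTa expects the <mask> token rather than BERT's [MASK].

```python
from transformers import pipeline

# Load roberta-base behind the fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="roberta-base")

# The pipeline returns the top candidate tokens for the masked position.
for prediction in fill_mask("The goal of life is <mask>."):
    print(f"{prediction['token_str']!r}  score={prediction['score']:.3f}")
```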
Implementation Details
The model was trained using 1024 V100 GPUs for 500K steps with a batch size of 8K sequences. It implements dynamic masking during pretraining, where 15% of tokens are selected for prediction: 80% of those are replaced with the <mask> token, 10% are replaced with a random token, and 10% are left unchanged. Because the masking pattern is re-sampled each time a sequence is fed to the model, the model sees different masks across epochs, unlike BERT's static masking (a sketch of this scheme follows the list below).
- Architecture: Bidirectional transformer encoder
- Training Optimization: Adam optimizer with learning rate 6e-4
- Preprocessing: Byte-Pair Encoding with 50K vocabulary
- Case-sensitive: Distinguishes uppercase from lowercase text, so "english" and "English" are tokenized differently
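The dynamic-masking scheme described above can be sketched in a few lines. This is an illustrative reimplementation, not the original fairseq training code; the function name is hypothetical and the token ids assume the roberta-base vocabulary.

```python
import random

MASK_TOKEN_ID = 50264  # id of <mask> in the roberta-base vocabulary

def dynamically_mask(token_ids, vocab_size=50265, mask_prob=0.15):
    """Illustrative 80/10/10 masking. A fresh mask is drawn every time the
    sequence is seen, unlike BERT's static masking fixed at preprocessing time."""
    inputs, labels = list(token_ids), []
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:
            labels.append(tok)            # predict the original token at this position
            roll = random.random()
            if roll < 0.8:
                inputs[i] = MASK_TOKEN_ID                 # 80%: replace with <mask>
            elif roll < 0.9:
                inputs[i] = random.randrange(vocab_size)  # 10%: replace with a random token
            # remaining 10%: keep the original token unchanged
        else:
            labels.append(-100)           # position ignored by the loss
    return inputs, labels

# Illustrative call on a short sequence (ids are placeholders).
masked_inputs, labels = dynamically_mask([0, 31414, 232, 2])
```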
Core Capabilities
- Masked Language Modeling
- Feature Extraction for Downstream Tasks
- Sequence Classification
- Token Classification
- Question Answering
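For feature extraction, the encoder's hidden states can be read out directly with the standard transformers API. The sketch below assumes PyTorch; the input sentence and variable names are only illustrative.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

# Tokenize with the byte-level BPE tokenizer; sequences are capped at 512 tokens.
inputs = tokenizer("RoBERTa produces contextual embeddings.",
                   return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per input token.
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)  # torch.Size([1, seq_len, 768])
```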
Frequently Asked Questions
Q: What makes this model unique?
RoBERTa's key distinction lies in its robust optimization of the BERT architecture, utilizing dynamic masking and larger training data. It achieves state-of-the-art performance on GLUE benchmarks, with notably high scores on MNLI (87.6) and QNLI (92.8).
Q: What are the recommended use cases?
The model is best suited for tasks that require understanding of complete sentences, including text classification, named entity recognition, and question answering. It's not recommended for text generation tasks, where models like GPT-2 would be more appropriate.
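For the classification-style use cases above, a task-specific head is added on top of the pretrained encoder and fine-tuned. The sketch below only demonstrates the setup and input/output shapes with placeholder labels; the freshly added head is randomly initialized, so its predictions are meaningless until fine-tuning.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder label set; a real task defines its own labels and fine-tunes the head.
labels = ["negative", "positive"]
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(labels))  # adds an untrained classification head

inputs = tokenizer("A well written, thoroughly enjoyable film.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: [1, num_labels]

# Essentially random before fine-tuning; shown only to illustrate the interface.
print(labels[logits.argmax(dim=-1).item()])
```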