DistilRoBERTa Base (distilroberta-base)

Maintained by: distilbert

  • Parameter Count: 82.8M parameters
  • License: Apache 2.0
  • Paper: View Paper
  • Training Data: OpenWebTextCorpus
  • Model Type: Transformer-based language model

What is distilroberta-base?

DistilRoBERTa-base is a compressed version of the RoBERTa-base model, created through knowledge distillation. It has 6 layers, a hidden size of 768, and 12 attention heads, for roughly 82M parameters, a significant reduction from RoBERTa-base's 125M. This compression makes the model about twice as fast while retaining strong performance on NLP tasks.

Implementation Details

The model was trained on OpenWebTextCorpus, following the same distillation process as DistilBERT. It is case-sensitive and ships its weights in the F32 (32-bit floating point) tensor type; the key configuration values are listed below, and the sketch after the list shows one way to verify them.

  • Architecture: 6-layer transformer with 768 hidden dimensions and 12 attention heads
  • Training Process: Knowledge distillation from RoBERTa-base
  • Performance: Achieves strong results on GLUE benchmarks (e.g., 84.0 on MNLI, 92.5 on SST-2)
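As a quick check on the figures above, here is a minimal sketch, assuming the Hugging Face `transformers` library and a PyTorch backend are installed; the variable names are illustrative, not part of the model card.

```python
# Minimal sketch: inspect the distilroberta-base configuration and
# count its parameters with the Hugging Face transformers library.
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("distilroberta-base")
print(config.num_hidden_layers)    # 6 transformer layers
print(config.hidden_size)          # 768 hidden dimensions
print(config.num_attention_heads)  # 12 attention heads

model = AutoModel.from_pretrained("distilroberta-base")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # roughly 82M
```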

Core Capabilities

  • Masked Language Modeling
  • Sequence Classification
  • Token Classification
  • Question Answering
  • Fine-tuning for downstream tasks
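To illustrate the masked language modeling capability listed above, the following is a hedged sketch using the standard `fill-mask` pipeline from `transformers`; the example sentence is arbitrary, and the exact predictions will vary.

```python
# Sketch: masked language modeling with the fill-mask pipeline.
# RoBERTa-family models use <mask> as the mask token.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="distilroberta-base")
for pred in unmasker("The goal of life is <mask>."):
    print(f"{pred['token_str'].strip():>12}  score={pred['score']:.3f}")
```

For the classification, token classification, and question answering capabilities, the same checkpoint is typically loaded through the corresponding AutoModelFor* class and fine-tuned on task-specific data.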

Frequently Asked Questions

Q: What makes this model unique?

DistilRoBERTa-base stands out for its balance of performance and efficiency, offering roughly twice the speed of RoBERTa-base while retaining about 95% of its performance. It's particularly valuable for production environments where computational resources are constrained.

Q: What are the recommended use cases?

The model excels in tasks that require whole-sentence understanding, including sequence classification, token classification, and question answering. It's not recommended for text generation tasks (where models like GPT-2 would be more appropriate) or for creating factual content.
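As a concrete starting point for the sequence-classification use case, the sketch below loads the checkpoint with a freshly initialized classification head. The binary label count (`num_labels=2`) and the input sentence are assumptions for illustration; actual fine-tuning on task data would follow via the usual Trainer or a custom training loop.

```python
# Sketch: prepare distilroberta-base for sequence-classification fine-tuning.
# num_labels=2 and the example sentence are illustrative assumptions.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base", num_labels=2
)

inputs = tokenizer("This movie was surprisingly good.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2]); the head is random until fine-tuned
```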
