DeBERTa V3 Large

  • Parameters: 304M (backbone) + 131M (embedding)
  • License: MIT
  • Author: Microsoft
  • Paper: DeBERTaV3 Paper

What is deberta-v3-large?

DeBERTa-v3-large is Microsoft's advanced language model that builds on the DeBERTa architecture with ELECTRA-style pre-training and gradient-disentangled embedding sharing. With 24 layers and a hidden size of 1024, it delivers significant improvements over its predecessors across a range of NLU tasks.
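
The layer count and hidden size above can be checked directly from the published configuration. A minimal sketch, assuming the transformers library is installed:

```python
from transformers import AutoConfig

# Read the architecture details from the published config for this checkpoint.
config = AutoConfig.from_pretrained("microsoft/deberta-v3-large")
print(config.num_hidden_layers)  # 24 transformer layers
print(config.hidden_size)        # 1024
print(config.vocab_size)         # ~128K-token vocabulary
```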

Implementation Details

The model pairs a 304M-parameter backbone with a 128K-token vocabulary that contributes a further 131M parameters in the embedding layer. It was trained on 160GB of data, similar to DeBERTa V2. Key architectural features include:

  • Enhanced mask decoder implementation
  • Disentangled attention mechanism
  • ELECTRA-style pre-training approach
  • Gradient-disentangled embedding sharing
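
A minimal loading sketch with the transformers library (assumes torch, transformers, and sentencepiece are installed). The base checkpoint exposes only the 304M-parameter backbone, with no task head:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and the backbone; the checkpoint is normally
# fine-tuned on a downstream task before use.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
model = AutoModel.from_pretrained("microsoft/deberta-v3-large")

inputs = tokenizer("DeBERTa V3 uses ELECTRA-style pre-training.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (1, sequence_length, 1024)
```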

Core Capabilities

  • Achieves 91.5/89.0 F1/EM on SQuAD 2.0
  • 91.8/91.9 accuracy on MNLI-m/mm
  • Efficient fine-tuning on downstream tasks (see the fine-tuning sketch after this list)
  • Rich contextual representations from ELECTRA-style pre-training (replaced token detection rather than conventional masked language modeling)
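
Fine-tuning follows the standard transformers recipe. The sketch below targets MNLI as an example; the hyperparameters are illustrative assumptions, not the settings used to produce the scores above:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

model_name = "microsoft/deberta-v3-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# MNLI: premise/hypothesis pairs with 3 labels (entailment, neutral, contradiction).
mnli = load_dataset("glue", "mnli")

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=256)

encoded = mnli.map(tokenize, batched=True)

# Illustrative hyperparameters; large DeBERTa checkpoints typically need small learning rates.
args = TrainingArguments(
    output_dir="deberta-v3-large-mnli",
    learning_rate=6e-6,
    per_device_train_batch_size=8,
    num_train_epochs=2,
    warmup_ratio=0.06,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation_matched"],
    tokenizer=tokenizer,
)
trainer.train()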

Frequently Asked Questions

Q: What makes this model unique?

DeBERTa-v3-large combines disentangled attention with ELECTRA-style pre-training, significantly outperforming previous models like RoBERTa and XLNet on key NLU benchmarks.

Q: What are the recommended use cases?

The model excels in natural language understanding tasks, particularly in question answering (SQuAD) and natural language inference (MNLI). It's well-suited for complex NLP tasks requiring deep semantic understanding.
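
For question answering, the base checkpoint first needs a QA head fine-tuned on SQuAD-style data. A minimal inference sketch; the model ID below is a placeholder for any such fine-tuned checkpoint, not a published artifact:

```python
from transformers import pipeline

# "your-org/deberta-v3-large-squad2" is a hypothetical placeholder for a
# DeBERTa-v3-large checkpoint fine-tuned on SQuAD 2.0; the base checkpoint
# ships without a question-answering head.
qa = pipeline("question-answering", model="your-org/deberta-v3-large-squad2")

result = qa(
    question="What pre-training objective does DeBERTa V3 use?",
    context=(
        "DeBERTa V3 replaces masked language modeling with ELECTRA-style "
        "replaced token detection and gradient-disentangled embedding sharing."
    ),
)
print(result["answer"], result["score"])
```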
