BERT Base Uncased
| Property | Value |
|---|---|
| Parameter Count | 110M |
| License | Apache 2.0 |
| Paper | View Paper |
| Training Data | BookCorpus + English Wikipedia |
| Architecture | Transformer encoder |
What is bert-base-uncased?
BERT base uncased is a foundational transformer model that revolutionized natural language processing. Developed by Google, this 110M-parameter model is pre-trained on BookCorpus and English Wikipedia with all text lowercased, so "english" and "English" are treated identically. It learns deep bidirectional representations through two pre-training objectives: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).
Implementation Details
During pre-training, 15% of input tokens are selected for prediction; of these, 80% are replaced with the [MASK] token, 10% with a random token, and 10% are left unchanged. The model processes sequences of up to 512 tokens and uses WordPiece tokenization with a vocabulary of roughly 30,000 tokens.
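A minimal Python sketch of this 80/10/10 masking rule, using a hypothetical `mask_tokens` helper rather than BERT's original pre-processing code:

```python
import random

def mask_tokens(tokens, vocab, select_prob=0.15):
    """Illustrative MLM masking: select ~15% of tokens as prediction targets,
    then apply the 80/10/10 replacement rule to the selected positions."""
    corrupted = list(tokens)
    labels = [None] * len(tokens)            # None = position is not predicted
    for i, tok in enumerate(tokens):
        if random.random() < select_prob:    # ~15% of positions become targets
            labels[i] = tok                  # the model must recover the original token
            r = random.random()
            if r < 0.8:                      # 80%: replace with [MASK]
                corrupted[i] = "[MASK]"
            elif r < 0.9:                    # 10%: replace with a random vocabulary token
                corrupted[i] = random.choice(vocab)
            # remaining 10%: keep the original token unchanged
    return corrupted, labels

vocab = ["the", "cat", "sat", "on", "mat", "dog"]
print(mask_tokens(["the", "cat", "sat", "on", "the", "mat"], vocab))
```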
- Trained on 4 cloud TPUs in Pod configuration
- Trained for 1 million steps with a batch size of 256
- Optimized with Adam at a learning rate of 1e-4
- Learning rate warmup followed by linear decay (sketched below)
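The warmup-then-linear-decay schedule can be reproduced with the Hugging Face transformers utilities. The sketch below uses PyTorch's AdamW as a stand-in for the paper's Adam setup and assumes the 10,000 warmup steps reported in the original BERT paper:

```python
import torch
from transformers import BertForMaskedLM, get_linear_schedule_with_warmup

model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Peak learning rate of 1e-4, as noted above; AdamW approximates the
# Adam-with-weight-decay configuration used for the original pre-training.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Linear warmup to the peak LR, then linear decay to zero over 1M steps.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10_000,       # warmup length reported in the BERT paper
    num_training_steps=1_000_000,  # total pre-training steps
)

# Inside a training loop, step both after each batch:
# optimizer.step(); scheduler.step(); optimizer.zero_grad()
```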
Core Capabilities
- Masked language modeling for bidirectional context understanding (see the usage example after this list)
- Next sentence prediction for document-level comprehension
- Feature extraction for downstream tasks
- High performance on GLUE benchmark tasks
- Efficient fine-tuning capabilities
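As a quick illustration of the masked-LM and feature-extraction capabilities above, here is a short example using the Hugging Face transformers API (assuming transformers and PyTorch are installed); the input sentences are placeholders:

```python
from transformers import pipeline, BertTokenizer, BertModel

# Masked language modeling: predict a hidden token using context from both directions.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
print(unmasker("The capital of France is [MASK]."))

# Feature extraction: use the encoder's hidden states as token/sentence features.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, BERT!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, 768)
```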
Frequently Asked Questions
Q: What makes this model unique?
Its bidirectional training sets it apart: unlike traditional left-to-right language models, it conditions on context from both directions simultaneously. The masked language modeling objective is what makes these deep bidirectional representations possible.
Q: What are the recommended use cases?
The model excels in tasks that require whole-sentence understanding, including sequence classification, token classification, and question answering. It's particularly effective when fine-tuned for specific downstream tasks but isn't recommended for text generation tasks.
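For example, here is a minimal fine-tuning sketch for sequence classification, assuming a hypothetical two-label task with placeholder texts and labels (a real setup would add a dataset, batching, and multiple epochs):

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

# Placeholder data for a hypothetical two-class task.
texts = ["a delightful read", "a complete waste of time"]
labels = torch.tensor([1, 0])

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # a typical fine-tuning learning rate

# One illustrative training step: forward pass (loss is returned when labels are given),
# backward pass, and optimizer update.
model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```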