bert-base-uncased

Maintained by google-bert

Parameter Count: 110M
License: Apache 2.0
Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
Training Data: BookCorpus + English Wikipedia
Architecture: Transformer-based

What is bert-base-uncased?

BERT base uncased is a foundational transformer model that revolutionized natural language processing. Developed by Google, this 110M parameter model was pre-trained on BookCorpus and English Wikipedia. It is uncased: input is lowercased before tokenization, so "english" and "English" are treated identically. Training is bidirectional and combines two pre-training objectives: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).
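
As a quick illustration of the uncased behavior, here is a short snippet using the Hugging Face transformers library (assumed to be installed) showing that differently cased inputs produce identical tokens:

```python
from transformers import AutoTokenizer

# Load the bert-base-uncased tokenizer (WordPiece, ~30,000-token vocabulary).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# The uncased tokenizer lowercases input before tokenizing,
# so "English" and "english" map to the same token.
print(tokenizer.tokenize("English"))  # ['english']
print(tokenizer.tokenize("english"))  # ['english']
```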

Implementation Details

The model employs a pre-training scheme in which 15% of input tokens are selected for prediction; of those, 80% are replaced by the [MASK] token, 10% by random tokens, and 10% left unchanged. It processes sequences up to 512 tokens long and uses WordPiece tokenization with a 30,000-token vocabulary.
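
To make the 80/10/10 rule concrete, below is a minimal PyTorch sketch of the MLM corruption step. This is illustrative, not the original training code; a production version would also exclude special tokens such as [CLS] and [SEP] from masking:

```python
import torch

def mask_tokens(input_ids, tokenizer, mlm_probability=0.15):
    """BERT-style MLM corruption: select 15% of tokens as prediction
    targets, then replace 80% with [MASK], 10% with random tokens,
    and leave 10% unchanged."""
    labels = input_ids.clone()

    # Select 15% of positions as prediction targets.
    probability_matrix = torch.full(labels.shape, mlm_probability)
    masked_indices = torch.bernoulli(probability_matrix).bool()
    labels[~masked_indices] = -100  # ignore unselected positions in the loss

    # 80% of targets -> [MASK]
    replace_mask = (
        torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked_indices
    )
    input_ids[replace_mask] = tokenizer.mask_token_id

    # 10% of targets -> a random vocabulary token
    # (half of the remaining 20% of targets).
    random_mask = (
        torch.bernoulli(torch.full(labels.shape, 0.5)).bool()
        & masked_indices
        & ~replace_mask
    )
    random_tokens = torch.randint(len(tokenizer), labels.shape)
    input_ids[random_mask] = random_tokens[random_mask]

    # The remaining 10% of targets keep their original token.
    return input_ids, labels
```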

  • Training utilized 4 cloud TPUs in Pod configuration (16 TPU chips total)
  • Trained for one million steps with a batch size of 256
  • Uses the Adam optimizer with a learning rate of 1e-4
  • Implements learning rate warmup and linear decay (see the sketch below)
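
A hedged sketch of that optimizer setup using the transformers scheduler utilities; the 10,000-step warmup and 0.01 weight decay are the figures reported in the BERT paper, and AdamW stands in here for Adam with L2 weight decay:

```python
import torch
from transformers import AutoModelForMaskedLM, get_linear_schedule_with_warmup

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Adam at 1e-4 as listed above, with the paper's 0.01 weight decay.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

# Warmup for the first 10,000 steps, then linear decay over 1M total steps.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10_000, num_training_steps=1_000_000
)
```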

Core Capabilities

  • Masked language modeling for bidirectional context understanding (demonstrated below)
  • Next sentence prediction for document-level comprehension
  • Feature extraction for downstream tasks
  • High performance on GLUE benchmark tasks
  • Efficient fine-tuning capabilities
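
The masked language modeling capability can be tried directly with the transformers fill-mask pipeline:

```python
from transformers import pipeline

# The fill-mask pipeline uses the model's MLM head to predict [MASK].
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prediction in unmasker("Hello, I'm a [MASK] model."):
    print(prediction["token_str"], round(prediction["score"], 3))
```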

Frequently Asked Questions

Q: What makes this model unique?

This model's bidirectional training sets it apart: unlike traditional left-to-right language models, it conditions on context from both directions simultaneously. Masked language modeling is what makes this possible, since predicting masked tokens rather than the next token lets every position attend to its full left and right context.

Q: What are the recommended use cases?

The model excels at tasks that require whole-sentence understanding, including sequence classification, token classification, and question answering. It is particularly effective when fine-tuned for a specific downstream task, but it is not recommended for text generation, where left-to-right models such as GPT-2 are the better fit.
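
As a sketch of the fine-tuning entry point for one such task (sequence classification), with the label count as a placeholder for whatever your dataset requires:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load BERT with a fresh classification head; num_labels is task-specific
# (2 is a placeholder for a binary task such as sentiment).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

inputs = tokenizer("This movie was great!", return_tensors="pt")
logits = model(**inputs).logits  # head is untrained: fine-tune before use
```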
