bert-base-cased

Maintained By: google-bert

BERT Base Cased

Parameter Count: 109M parameters
License: Apache 2.0
Paper: Original paper (Devlin et al., 2018)
Training Data: BookCorpus + Wikipedia
Framework Support: PyTorch, TensorFlow, JAX

What is bert-base-cased?

BERT-base-cased is a case-sensitive transformer model pretrained on English text with masked language modeling and next sentence prediction objectives. Developed by Google, it is a foundational model in natural language processing: rather than reading text left to right only, it draws on context from both directions of a sequence.
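
As a quick illustration of the masked language modeling objective in use, here is a minimal sketch. It assumes the Hugging Face transformers library, which is not named on this card but is the standard way to load the published checkpoint; the example sentence is illustrative.

```python
# Minimal sketch: query the pretrained MLM head via the fill-mask pipeline.
# Assumes the Hugging Face transformers library is installed.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# BERT fills in [MASK] using context from both the left and the right.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

The pipeline returns the highest-scoring vocabulary tokens for the masked position together with their scores.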

Implementation Details

The model was trained on 4 cloud TPUs for one million steps with a batch size of 256. It uses WordPiece tokenization with a 30,000-token vocabulary, and the special tokens [CLS] and [SEP] mark the classification position and sentence boundaries (see the tokenizer sketch after the list below). During pretraining, 15% of the tokens are masked: of these, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged, which forces the model to learn bidirectional representations.

  • Masked Language Modeling (MLM) with 15% token masking
  • Next Sentence Prediction (NSP) for understanding sentence relationships
  • 512 token maximum sequence length
  • Adam optimizer with learning rate warmup
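
The sketch below (again assuming the transformers tokenizer; the sentence and the printed split are illustrative) shows WordPiece sub-word splitting and the automatically added [CLS] and [SEP] tokens.

```python
# Sketch of WordPiece tokenization with special tokens, assuming the
# Hugging Face transformers library.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

encoded = tokenizer(
    "Tokenization handles uncommon words gracefully.",
    truncation=True,
    max_length=512,  # matches the model's 512-token sequence limit
)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# Expected shape of the output (exact sub-word splits may differ):
# ['[CLS]', 'Token', '##ization', ..., '.', '[SEP]']
```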

Core Capabilities

  • Bidirectional context understanding
  • Case-sensitive text processing
  • Token classification and sequence classification
  • Question answering tasks
  • Feature extraction for downstream tasks (see the sketch after this list)
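
A hedged sketch of the feature-extraction capability follows, assuming PyTorch and the transformers library; the 768-dimensional output is the hidden size of the BERT base architecture.

```python
# Sketch: extract the final-layer [CLS] vector as a fixed-size sentence feature.
# Assumes PyTorch and the Hugging Face transformers library.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")
model.eval()

inputs = tokenizer("BERT embeddings feed downstream classifiers.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

cls_features = outputs.last_hidden_state[:, 0, :]  # [CLS] position
print(cls_features.shape)  # torch.Size([1, 768]) for the base architecture
```

These vectors can be passed to any downstream classifier, though fine-tuning the whole model usually performs better than using frozen features.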

Frequently Asked Questions

Q: What makes this model unique?

This model's case-sensitivity and bidirectional architecture make it particularly useful for tasks where capitalization matters. It can capture nuanced contextual information from both directions in text, unlike traditional left-to-right language models.
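
A small sketch of the case sensitivity, using the transformers tokenizer as above (the example strings are illustrative):

```python
# Sketch: the cased tokenizer encodes "Apple" and "apple" differently,
# so capitalization information reaches the model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

ids_upper = tokenizer("Apple", add_special_tokens=False)["input_ids"]
ids_lower = tokenizer("apple", add_special_tokens=False)["input_ids"]
print(ids_upper, ids_lower)  # different ids: the two casings are distinct inputs
```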

Q: What are the recommended use cases?

The model excels at tasks requiring whole-sentence understanding, such as sequence classification, token classification, and question answering. It's not recommended for text generation tasks, where models like GPT-2 would be more appropriate.
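
For these use cases, a typical starting point is to attach a task-specific head and fine-tune. The sketch below assumes a two-label sequence-classification task; the label count is an illustrative choice, not part of this card.

```python
# Sketch: set up bert-base-cased for sequence classification. The classification
# head is randomly initialised and must be fine-tuned on labelled data.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased",
    num_labels=2,  # illustrative; set to the number of classes in your task
)
# Fine-tune with the transformers Trainer API or a standard PyTorch training loop.
```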
