OPT-125M

  • Maintained by: facebook
  • Author: Meta AI (Facebook)
  • License: Other (Research Only)
  • Paper: Open Pre-trained Transformer Language Models
  • Downloads: 6.9M+

What is opt-125m?

OPT-125M is the smallest variant in Meta AI's Open Pre-trained Transformer (OPT) series, designed to democratize access to large language models. This 125-million parameter model serves as an open-source alternative to GPT-3-style models, enabling researchers to study and improve upon state-of-the-art language model architectures.

Implementation Details

The model uses a decoder-only transformer architecture trained on a diverse corpus of roughly 180B tokens (about 800GB of data). It reuses GPT-2's byte-level BPE tokenizer with a 50,272-entry vocabulary and processes sequences of up to 2,048 tokens.

  • Trained on multiple datasets including BookCorpus, CC-Stories, and filtered components of The Pile
  • Uses causal language modeling (CLM) objective for pre-training
  • Supports both deterministic and top-k sampling for text generation
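
For the sampling options above, the snippet below is a minimal sketch using the Hugging Face transformers library and the facebook/opt-125m checkpoint; the prompt text and generation settings are illustrative choices, not recommendations from the OPT authors.

```python
# Minimal sketch: greedy vs. top-k generation with facebook/opt-125m (assumes transformers + torch installed)
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")  # GPT-2-style byte-level BPE tokenizer
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

prompt = "Open pre-trained language models are"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt")

# Deterministic (greedy) decoding: always pick the most likely next token.
greedy_ids = model.generate(**inputs, max_new_tokens=30, do_sample=False)

# Top-k sampling: sample from the 50 most likely tokens at each step.
sampled_ids = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50)

print(tokenizer.decode(greedy_ids[0], skip_special_tokens=True))
print(tokenizer.decode(sampled_ids[0], skip_special_tokens=True))
```

Greedy decoding always picks the highest-probability next token, while top-k sampling draws from the k most likely candidates at each step, trading determinism for diversity.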

Core Capabilities

  • Text generation and completion tasks
  • Zero-shot and few-shot learning applications (see the sketch after this list)
  • Research experimentation and model behavior analysis
  • Fine-tuning for downstream tasks
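
As a sketch of the few-shot use case noted above, the example below packs labeled demonstrations into the prompt and asks the model to complete the final label; the sentiment task, prompt wording, and decoding settings are hypothetical, and predictions from a 125M-parameter model will be noisy.

```python
# Minimal few-shot prompting sketch with facebook/opt-125m (task and prompt are illustrative)
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# In-context examples followed by the query the model should complete.
prompt = (
    "Review: The movie was fantastic. Sentiment: positive\n"
    "Review: I wasted two hours of my life. Sentiment: negative\n"
    "Review: A charming and well-acted film. Sentiment:"
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=3, do_sample=False)

# Decode only the tokens generated after the prompt.
completion = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```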

Frequently Asked Questions

Q: What makes this model unique?

OPT-125M stands out for its open-source nature and research accessibility: unlike many large language models that are only available through APIs, its weights are fully accessible. It's specifically designed to enable responsible AI research and community-driven improvements in addressing challenges like bias and toxicity.

Q: What are the recommended use cases?

The model is best suited for research purposes, text generation tasks, and as a foundation for fine-tuning on specific downstream applications. It's particularly valuable for studying model behavior, bias, and developing improved training methodologies.
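
As a starting point for the fine-tuning use case, the sketch below runs a short causal language modeling fine-tune with transformers' Trainer and the datasets library; the wikitext-2 dataset, data slice, and hyperparameters are placeholders rather than values from the OPT paper.

```python
# Minimal causal-LM fine-tuning sketch for facebook/opt-125m
# (dataset and hyperparameters are placeholders, not recommendations from the OPT authors)
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Any text dataset works; a small slice of wikitext-2 is used purely as an example.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# mlm=False keeps the causal (next-token prediction) objective used in pre-training.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="opt-125m-finetuned",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=5e-5,
    logging_steps=50,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()
```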
