OPT-1.3B
| Property | Value |
| --- | --- |
| Developer | Meta AI (Facebook) |
| License | Custom (Other) |
| Paper | OPT: Open Pre-trained Transformer Language Models |
| Training Data | 180B tokens (800GB) |
What is OPT-1.3B?
OPT-1.3B is part of Meta AI's Open Pre-trained Transformer (OPT) series, designed to democratize access to large language models. This 1.3 billion parameter model implements a decoder-only architecture similar to GPT-3, trained on a diverse dataset including BookCorpus, CC-Stories, and filtered content from The Pile.
Implementation Details
The model uses a GPT-2-style byte-level BPE tokenizer with a vocabulary of 50,272 tokens and a maximum sequence length of 2048 tokens, and it was trained with a causal language modeling objective on NVIDIA A100 GPUs.
- Decoder-only transformer architecture
- Pre-trained on 800GB of filtered text data
- Supports both deterministic (greedy) and top-k sampling generation, as illustrated in the sketch after this list
- Trained with mixed-precision (FP16) and other efficiency-oriented practices
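As a rough illustration of the points above, the following sketch loads the model with the Hugging Face Transformers library and compares deterministic (greedy) decoding with top-k sampling. The Hub ID `facebook/opt-1.3b`, the prompt text, and the generation settings are assumptions for illustration, not values specified by this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face Hub ID for OPT-1.3B.
model_name = "facebook/opt-1.3b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Open pre-trained language models make it possible to"
inputs = tokenizer(prompt, return_tensors="pt")

# Deterministic (greedy) decoding: always pick the most likely next token.
greedy_ids = model.generate(**inputs, max_new_tokens=40, do_sample=False)

# Top-k sampling: sample the next token from the 50 most likely candidates.
sampled_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_k=50)

print(tokenizer.decode(greedy_ids[0], skip_special_tokens=True))
print(tokenizer.decode(sampled_ids[0], skip_special_tokens=True))
```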
Core Capabilities
- Text generation and completion
- Zero-shot and few-shot learning (see the prompt sketch after this list)
- Language understanding and processing
- Custom prompt-based tasks
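A minimal sketch of the few-shot pattern referenced above: the prompt packs a handful of labeled examples in front of a new input, and the model is asked to complete the final label. The task, wording, and labels are illustrative assumptions, as are the Hub ID and generation settings.

```python
from transformers import pipeline

# Assumed Hub ID; generation settings are illustrative.
generator = pipeline("text-generation", model="facebook/opt-1.3b")

# Hypothetical few-shot prompt: two labeled examples followed by an
# unlabeled input for the model to complete.
few_shot_prompt = (
    "Review: The battery lasts all day.\nSentiment: positive\n\n"
    "Review: The screen cracked within a week.\nSentiment: negative\n\n"
    "Review: Setup was quick and painless.\nSentiment:"
)

# Greedy decoding with a short continuation usually returns just the label.
result = generator(few_shot_prompt, max_new_tokens=3, do_sample=False)
print(result[0]["generated_text"])
```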
Frequently Asked Questions
Q: What makes this model unique?
OPT-1.3B stands out for its openly released weights and Meta AI's stated commitment to responsible AI research. It aims to match the capabilities of similarly sized GPT-3-class models while remaining fully accessible to researchers studying bias, toxicity, and robustness in language models.
Q: What are the recommended use cases?
The model is best suited for text generation, research, and fine-tuning for specific downstream applications. It can be used directly with the Transformers text-generation pipeline (see the sketch above) or fine-tuned with a causal language modeling objective, as sketched below.
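As a rough sketch of the fine-tuning route, the following uses the Hugging Face `Trainer` with a causal language modeling collator. The Hub ID, the `wikitext` sample corpus, and all hyperparameters are placeholder assumptions, not recommendations from this card; fine-tuning a 1.3B-parameter model this way requires a GPU with substantial memory.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "facebook/opt-1.3b"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Illustrative corpus; swap in your own text dataset.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)  # drop empty lines

# mlm=False gives the causal LM objective: labels are the inputs shifted by one.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="opt-1.3b-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    fp16=True,
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```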