GPT BigCode SantaCoder
| Property | Value |
|---|---|
| Parameter Count | 1.12B |
| Model Type | Text Generation (Code) |
| Architecture | GPT-2 with multi-query attention |
| License | CodeML Open RAIL-M v0.1 |
| Training Data | 236 billion tokens from GitHub |
What is gpt_bigcode-santacoder?
SantaCoder is a specialized code generation model trained on permissively-licensed GitHub code. Built using the GPT-2 architecture with multi-query attention, it excels at generating and completing code in Python, Java, and JavaScript. The model was trained for 600K steps using 96 Tesla V100 GPUs over 6.2 days.
Implementation Details
The model was trained with a Fill-in-the-Middle objective and uses float16 precision for efficient inference. It is designed to work with transformers >=4.28.1, which provides the native GPTBigCode model class used by this checkpoint.
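As a concrete starting point, here is a minimal loading-and-completion sketch. It assumes the `bigcode/santacoder` checkpoint id on the Hugging Face Hub and a recent transformers release (>=4.28.1) with the native GPTBigCode class; adjust names and dtypes to your environment.

```python
# Minimal sketch: load SantaCoder in float16 and complete a prompt.
# Assumes the bigcode/santacoder Hub checkpoint and transformers >= 4.28.1.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/santacoder"  # assumed Hub id for this model
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32  # fp16 as reported on the card

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=dtype).to(device)

# Source-code-style prompt: a function signature, not a natural-language instruction.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```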
- Trained on 236 billion tokens of source code
- Implements multi-query attention mechanism
- Uses PyTorch and Megatron-LM for training
- Supports both completion and infilling tasks (see the infilling sketch below)
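Infilling is driven by sentinel tokens placed around the gap. The sketch below reuses the model and tokenizer from the loading example above; the `<fim-prefix>`/`<fim-suffix>`/`<fim-middle>` token names follow the SantaCoder model card and should be verified against `tokenizer.additional_special_tokens`.

```python
# Fill-in-the-Middle sketch: the model generates the code between prefix and suffix.
# Reuses `tokenizer`, `model`, and `device` from the loading sketch above.
prefix = "def print_one_two_three():\n    print('one')\n    "
suffix = "\n    print('three')"
fim_prompt = f"<fim-prefix>{prefix}<fim-suffix>{suffix}<fim-middle>"

inputs = tokenizer(fim_prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=32)
# The infilled span appears after the <fim-middle> token in the decoded output.
print(tokenizer.decode(outputs[0]))
```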
Core Capabilities
- Code generation in Python (pass@100: 0.49), JavaScript (pass@100: 0.47), and Java (pass@100: 0.41)
- Strong performance on code-to-text tasks (BLEU: 18.13)
- Single-line infilling exact match: Python (0.44), Java (0.62), JavaScript (0.60)
- Context-aware code completion and generation
Frequently Asked Questions
Q: What makes this model unique?
SantaCoder stands out for its specialized training on permissively-licensed code and its ability to handle multiple programming languages effectively. The Fill-in-the-Middle objective allows it to both complete and infill code snippets.
Q: What are the recommended use cases?
The model works best with source code-style prompts rather than natural language instructions. It's ideal for code completion, documentation generation, and code infilling tasks. Users should phrase requests as code comments or provide function signatures for optimal results.
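As an illustration (the prompt text here is invented, not from the model card), a comment plus a function signature tends to steer the model better than an English instruction:

```python
# Prompting sketch: comment + signature instead of a natural-language request.
# Reuses `tokenizer`, `model`, and `device` from the loading sketch above.
prompt = (
    "# Parse a CSV line into a list of fields, handling quoted commas\n"
    "def parse_csv_line(line: str) -> list:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=96)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```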