GPT BigCode SantaCoder
| Property | Value |
|---|---|
| Parameter Count | 1.12B |
| Model Type | Text Generation (Code) |
| Architecture | GPT-2 with multi-query attention |
| License | CodeML Open RAIL-M v0.1 |
| Training Data | 236 billion tokens from GitHub |
What is gpt_bigcode-santacoder?
SantaCoder is a specialized code generation model trained on permissively-licensed GitHub code. Built using the GPT-2 architecture with multi-query attention, it excels at generating and completing code in Python, Java, and JavaScript. The model was trained for 600K steps using 96 Tesla V100 GPUs over 6.2 days.
Implementation Details
The model was trained with a Fill-in-the-Middle objective and uses float16 precision for efficient inference. It is designed to work with transformers >=4.28.1, which provides the native GPTBigCode model class used by this checkpoint.
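As a concrete starting point, here is a minimal loading-and-completion sketch. It assumes the `bigcode/santacoder` checkpoint id on the Hugging Face Hub and a recent transformers release (>=4.28.1) with the native GPTBigCode class; adjust names and dtypes to your environment.

```python
# Minimal sketch: load SantaCoder in float16 and complete a prompt.
# Assumes the bigcode/santacoder Hub checkpoint and transformers >= 4.28.1.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/santacoder"  # assumed Hub id for this model
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32  # fp16 as reported on the card

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=dtype).to(device)

# Source-code-style prompt: a function signature, not a natural-language instruction.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```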
- Trained on 236 billion tokens of source code
- Implements multi-query attention mechanism
- Uses PyTorch and Megatron-LM for training
- Supports both completion and infilling tasks (see the infilling sketch below)
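Infilling is driven by sentinel tokens placed around the gap. The sketch below reuses the model and tokenizer from the loading example above; the `<fim-prefix>`/`<fim-suffix>`/`<fim-middle>` token names follow the SantaCoder model card and should be verified against `tokenizer.additional_special_tokens`.

```python
# Fill-in-the-Middle sketch: the model generates the code between prefix and suffix.
# Reuses `tokenizer`, `model`, and `device` from the loading sketch above.
prefix = "def print_one_two_three():\n    print('one')\n    "
suffix = "\n    print('three')"
fim_prompt = f"<fim-prefix>{prefix}<fim-suffix>{suffix}<fim-middle>"

inputs = tokenizer(fim_prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=32)
# The infilled span appears after the <fim-middle> token in the decoded output.
print(tokenizer.decode(outputs[0]))
```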
Core Capabilities
- Code generation in Python (pass@100: 0.49), JavaScript (pass@100: 0.47), and Java (pass@100: 0.41)
- Strong performance on code-to-text tasks (BLEU: 18.13)
- Single-line infilling exact match: Python (0.44), Java (0.62), JavaScript (0.60)
- Context-aware code completion and generation
Frequently Asked Questions
Q: What makes this model unique?
SantaCoder stands out for its specialized training on permissively-licensed code and its ability to handle multiple programming languages effectively. The Fill-in-the-Middle objective allows it to both complete and infill code snippets.
Q: What are the recommended use cases?
The model works best with source code-style prompts rather than natural language instructions. It's ideal for code completion, documentation generation, and code infilling tasks. Users should phrase requests as code comments or provide function signatures for optimal results.
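As an illustration (the prompt text here is invented, not from the model card), a comment plus a function signature tends to steer the model better than an English instruction:

```python
# Prompting sketch: comment + signature instead of a natural-language request.
# Reuses `tokenizer`, `model`, and `device` from the loading sketch above.
prompt = (
    "# Parse a CSV line into a list of fields, handling quoted commas\n"
    "def parse_csv_line(line: str) -> list:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=96)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```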