gpt_bigcode-santacoder

Maintained by: bigcode

GPT BigCode SantaCoder

Property          Value
Parameter Count   1.12B
Model Type        Text Generation (Code)
Architecture      GPT-2 with multi-query attention
License           CodeML OpenRAIL-M v0.1
Training Data     236 billion tokens from GitHub

What is gpt_bigcode-santacoder?

SantaCoder is a specialized code generation model trained on permissively licensed GitHub code (the Python, Java, and JavaScript subset of The Stack v1.1). Built on the GPT-2 architecture with multi-query attention, it excels at generating and completing code in Python, Java, and JavaScript. The model was trained for 600K steps on 96 Tesla V100 GPUs over 6.2 days.

Implementation Details

The model was trained with a Fill-in-the-Middle (FIM) objective in addition to standard left-to-right generation, and its weights are published in float16 for efficient inference. It is designed to work with transformers >= 4.28.1, which supports the GPTBigCode architecture natively; the architecture's multi-query attention shrinks the key-value cache and speeds up decoding.
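As a minimal sketch (the prompt and generation settings are illustrative assumptions, not recommended defaults), loading the natively supported checkpoint and generating a completion looks like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Requires transformers >= 4.28.1, which ships GPTBigCode natively.
checkpoint = "bigcode/gpt_bigcode-santacoder"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    # The published weights are float16; fall back to float32 on CPU.
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# A code-shaped prompt: a function signature, not a natural-language request.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))
```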

  • Trained on 236 billion tokens of source code
  • Implements multi-query attention mechanism
  • Uses PyTorch and Megatron-LM for training
  • Supports both completion and infilling tasks (see the infilling sketch below)
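For infilling, the tokenizer defines FIM sentinel tokens; the prefix/suffix/middle prompt format below follows the SantaCoder model card, reusing the tokenizer and model from the sketch above:

```python
# Fill-in-the-Middle: the model generates the span between prefix and suffix.
input_text = (
    "<fim-prefix>def print_hello_world():\n    <fim-suffix>\n"
    "    print('Hello world!')<fim-middle>"
)
inputs = tokenizer(input_text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))
```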

Core Capabilities

  • Code generation in Python (pass@100: 0.49), JavaScript (pass@100: 0.47), and Java (pass@100: 0.41); see the note on pass@k after this list
  • Strong performance on code-to-text tasks (BLEU: 18.13)
  • Single-line infilling exact match: Python (0.44), Java (0.62), JavaScript (0.60)
  • Context-aware code completion and generation
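For readers unfamiliar with the metric, pass@k estimates the probability that at least one of k sampled completions passes a problem's unit tests. Below is a sketch of the standard unbiased estimator from Chen et al. (2021); it is background for interpreting the numbers above, not code from the SantaCoder release:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: completions sampled per problem
    c: completions that pass the unit tests
    k: evaluation budget (e.g. 100 for pass@100)
    """
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    # 1 - C(n - c, k) / C(n, k), computed as a stable running product
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))
```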

Frequently Asked Questions

Q: What makes this model unique?

SantaCoder stands out for its training exclusively on permissively licensed code and for its effective handling of multiple programming languages. The Fill-in-the-Middle objective allows it both to complete code left to right and to infill gaps within existing snippets.

Q: What are the recommended use cases?

The model works best with source code-style prompts rather than natural language instructions. It's ideal for code completion, documentation generation, and code infilling tasks. Users should phrase requests as code comments or provide function signatures for optimal results.
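As an illustration (the function name and docstring are invented for this example), contrast a code-shaped prompt with a natural-language one, reusing the tokenizer and model loaded earlier:

```python
# Works well: signature plus docstring, phrased as source code.
good_prompt = (
    "def is_palindrome(s: str) -> bool:\n"
    '    """Return True if s reads the same forwards and backwards."""\n'
)

# Tends to work poorly: a natural-language instruction with no code context.
bad_prompt = "Please write a Python function that checks for palindromes."

inputs = tokenizer(good_prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=48, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))
```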
