mamba-7b-rw

Maintained by: TRI-ML

Mamba-7B

  • Parameter Count: 7.15B
  • License: Apache 2.0
  • Training Data: RefinedWeb (1.2T tokens)
  • Architecture: Mamba SSM
  • Paper: Linearizing Large Language Models

What is mamba-7b-rw?

Mamba-7B is a 7-billion-parameter language model developed by Toyota Research Institute that implements the Mamba architecture, replacing transformer self-attention with a state-space model (SSM) so that sequence processing scales linearly with sequence length. Trained on 1.2T tokens of RefinedWeb data, it is one of the largest publicly released language models built entirely on this linear-time approach.
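To make the "linear-time" claim concrete, here is a deliberately simplified toy state-space recurrence. It is illustrative only: the real Mamba layer uses input-dependent (selective) parameters and a fused scan kernel, neither of which is shown here. The point is that each step updates a fixed-size hidden state, so cost grows linearly with sequence length rather than quadratically as in self-attention.

```python
import numpy as np

# Toy linear SSM: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.
# Fixed (non-selective) parameters, pure NumPy -- not the actual Mamba kernel.

def ssm_scan(x, A, B, C):
    """Process a (seq_len, d_in) sequence step by step with a constant-size state."""
    d_state = A.shape[0]
    h = np.zeros(d_state)
    outputs = []
    for x_t in x:                # single pass over the sequence: O(seq_len)
        h = A @ h + B @ x_t      # update the fixed-size hidden state
        outputs.append(C @ h)    # read the output from the state
    return np.stack(outputs)

rng = np.random.default_rng(0)
seq_len, d_in, d_state, d_out = 2048, 16, 32, 16
x = rng.normal(size=(seq_len, d_in))
A = 0.9 * np.eye(d_state)                 # stable toy transition matrix
B = rng.normal(size=(d_state, d_in)) * 0.1
C = rng.normal(size=(d_out, d_state)) * 0.1
y = ssm_scan(x, A, B, C)
print(y.shape)  # (2048, 16)
```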

Implementation Details

The model has a hidden size of 4096 across 64 layers, a vocabulary size of 50,432, and a maximum sequence length of 2,048 tokens. It was trained in bfloat16 precision on 128 H100 GPUs using the AdamW optimizer with a tuned learning rate schedule.

  • Training utilized AWS SageMaker infrastructure
  • Implements the EleutherAI/gpt-neox-20b tokenizer
  • Uses the OpenLM library for efficient training and inference (see the loading sketch below)
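The following is a minimal loading sketch, not an official snippet from the model card. The Hub repository id "tri-ml/mamba-7b-rw" and the need for trust_remote_code are assumptions; depending on how the checkpoint is published, extra dependencies (e.g. the open_lm package) may be required.

```python
# Assumed repo id and loading path -- consult the official model card for the
# exact repository name and any additional dependencies.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tri-ml/mamba-7b-rw"  # assumption: Hub id derived from the page title
tokenizer = AutoTokenizer.from_pretrained(model_id)  # GPT-NeoX-20B tokenizer per the notes above
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the bfloat16 training precision
    device_map="auto",
    trust_remote_code=True,       # needed if the repo ships custom OpenLM/Mamba code
)
```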

Core Capabilities

  • Achieves 77.9% accuracy on HellaSwag benchmark
  • Strong performance on PIQA (81.0%) and Winogrande (71.8%)
  • Competitive results on ARC-Easy (77.5%) and ARC-Challenge (46.7%); a reproduction sketch follows this list
  • Efficient text generation with linear-time complexity
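These benchmarks are standard tasks in EleutherAI's lm-evaluation-harness, so a run along the following lines should approximate them. This is a sketch under assumptions: the harness, the task names, and the Hub id are not stated in the card above, and the few-shot settings behind the reported numbers are unknown, so scores may not match exactly.

```python
# Assumed: lm-evaluation-harness (lm_eval >= 0.4) is installed and the model
# loads through the Hugging Face backend. Few-shot settings are left at defaults.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tri-ml/mamba-7b-rw,dtype=bfloat16,trust_remote_code=True",
    tasks=["hellaswag", "piqa", "winogrande", "arc_easy", "arc_challenge"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```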

Frequently Asked Questions

Q: What makes this model unique?

This model is unique in being one of the largest publicly available Mamba architecture implementations, offering linear-time sequence processing without traditional attention mechanisms while maintaining competitive performance.

Q: What are the recommended use cases?

The model is well-suited for general text generation tasks, particularly those that benefit from efficient, linear-time sequence processing. It performs especially well on common-sense reasoning and natural language understanding benchmarks. A basic generation example is shown below.
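Continuing the loading sketch from the Implementation Details section, a basic completion-style generation call might look like the following. The prompt and sampling parameters are illustrative defaults, not recommendations from the model card; note that this is a base (non-instruction-tuned) model, so completion-style prompts work best.

```python
# Assumes `model` and `tokenizer` from the earlier loading sketch.
prompt = "The Mamba architecture differs from a transformer in that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=128,   # illustrative sampling settings, not official defaults
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```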
