Llama-3-Open-Ko-8B

Maintained by: beomi


  • Parameter Count: 8.03B
  • Context Length: 8k tokens
  • Training Tokens: 17.7B+
  • License: Llama 3 Community License
  • Author: Junbum Lee (Beomi)

What is Llama-3-Open-Ko-8B?

Llama-3-Open-Ko-8B is a Korean language model produced by continued pre-training of Meta's Llama-3 8B, optimized for Korean language understanding and generation. It was trained on 60GB+ of deduplicated, publicly available texts using a TPUv5e-256 pod, making it one of the more extensively trained open Korean language models.

Implementation Details

The model uses the Llama-3 tokenizer and architecture, incorporating Grouped-Query Attention (GQA) for more efficient inference. It was trained on 17.7B+ tokens, a count slightly higher than the same corpus yields under earlier Korean-specific tokenizers; a minimal loading sketch follows the list below.

  • Architecture: Optimized transformer with 8.03B parameters
  • Training Infrastructure: Google TPUv5e-256
  • Context Window: 8k (8,192) tokens
  • Weight Precision: BF16
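
As a minimal usage sketch, the snippet below loads the model in BF16 with Hugging Face transformers. The model ID beomi/Llama-3-Open-Ko-8B and the use of device_map="auto" (which requires the accelerate package) are assumptions, not prescriptions from the model card.

```python
# Minimal loading sketch. Assumptions: the Hugging Face model ID
# "beomi/Llama-3-Open-Ko-8B" and an installed `accelerate` package
# for device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "beomi/Llama-3-Open-Ko-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",           # place layers on available GPU/CPU devices
)
```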

Core Capabilities

  • Advanced Korean language understanding and generation
  • Strong performance on Korean benchmarks (KMMLU, KoBEST)
  • Efficient processing with GQA implementation
  • 8k token context window for handling longer sequences
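
Continuing from the loading sketch above, a short generation example is shown below; the Korean prompt and the sampling settings are illustrative assumptions, not recommended values.

```python
# Generation sketch, continuing from the loading example above.
# The prompt and sampling settings are illustrative only.
prompt = "대한민국의 수도는"  # "The capital of South Korea is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```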

Frequently Asked Questions

Q: What makes this model unique?

This model represents a significant advancement in Korean language AI, combining Meta's Llama-3 architecture with extensive Korean-specific training. It's trained on publicly available resources, making it more accessible for research and commercial applications.

Q: What are the recommended use cases?

The model is well-suited for commercial and research applications in Korean language processing, including text generation, comprehension, and analysis. It can be further fine-tuned for specific tasks while maintaining compliance with the Llama 3 Community License.
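
For the fine-tuning route mentioned above, one possible starting point is LoRA adapters via the peft library; the library choice, hyperparameters, and target module names below are illustrative assumptions rather than recommendations from the author.

```python
# One possible fine-tuning setup: LoRA adapters via the `peft` library
# (an assumption; the model card does not prescribe a method).
# Rank, alpha, dropout, and target modules are illustrative only.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)  # `model` from the loading sketch
peft_model.print_trainable_parameters()  # only adapter weights remain trainable
```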
