Llama-3-Open-Ko-8B
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Context Length | 8K tokens |
| Training Tokens | 17.7B+ |
| License | Llama 3 Community License |
| Author | Junbum Lee (Beomi) |
What is Llama-3-Open-Ko-8B?
Llama-3-Open-Ko-8B is a continued-pretrained language model based on Meta's Llama-3-8B, adapted for Korean language understanding and generation. It was trained on more than 60GB of deduplicated, publicly available text using Google TPUv5e-256 hardware, making it one of the more extensively trained openly available Korean language models.
Implementation Details
The model uses the Llama-3 tokenizer and architecture, including Grouped-Query Attention (GQA) for improved inference efficiency. Continued pretraining covered more than 17.7B tokens, exceeding the token counts processed by earlier Korean-adapted tokenizers. A minimal loading sketch follows the list below.
- Architecture: Llama-3 transformer with 8.03B parameters and GQA
- Training Infrastructure: Google TPUv5e-256
- Context Window: 8,192 tokens (8K)
- Weight Precision: BF16
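The sketch below shows one way to load the model with the Hugging Face transformers library in BF16. The repository ID `beomi/Llama-3-Open-Ko-8B` and the device settings are assumptions for illustration, not details taken from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "beomi/Llama-3-Open-Ko-8B"  # assumed Hugging Face Hub repo ID

# Load the tokenizer and weights; the card lists BF16 as the stored precision.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires `accelerate`; omit for a single fixed device
)
```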
Core Capabilities
- Advanced Korean language understanding and generation
- Strong performance on Korean benchmarks (KMMLU, KoBEST)
- Efficient inference through Grouped-Query Attention (GQA)
- 8k token context window for handling longer sequences
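As a quick illustration of Korean text generation, the sketch below continues from the loading example above. Since this is a base (non-instruct) model, it is used here for free-form completion; the prompt and sampling parameters are illustrative only.

```python
prompt = "대한민국의 수도는"  # "The capital of South Korea is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,   # stays well within the 8K context window
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```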
Frequently Asked Questions
Q: What makes this model unique?
This model represents a significant advancement in Korean language AI, combining Meta's Llama-3 architecture with extensive Korean-specific training. It's trained on publicly available resources, making it more accessible for research and commercial applications.
Q: What are the recommended use cases?
The model is well-suited for commercial and research applications in Korean language processing, including text generation, comprehension, and analysis. It can be further fine-tuned for specific tasks while maintaining compliance with the Llama 3 Community License.
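For task-specific adaptation, one common (unofficial) option is parameter-efficient fine-tuning with LoRA via the peft library. The configuration below is an illustrative sketch with placeholder hyperparameters, not a recipe from this card, and it assumes the base model has already been loaded as in the earlier sketch.

```python
from peft import LoraConfig, get_peft_model

# Wrap the loaded base model with LoRA adapters; the target module names
# follow the standard Llama attention projection layers.
lora_config = LoraConfig(
    r=16,                # placeholder rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```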