# DeepSeek-V2-Lite-Chat
| Property | Value |
|---|---|
| Parameter Count | 15.7B (2.4B active per token) |
| Model Type | Mixture-of-Experts (MoE) |
| Context Length | 32K tokens |
| License | DeepSeek License |
| Paper | arXiv:2405.04434 |
## What is DeepSeek-V2-Lite-Chat?
DeepSeek-V2-Lite-Chat is the chat-tuned variant of DeepSeek-V2-Lite, a Mixture-of-Experts language model built on Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture. By activating only a fraction of its parameters per token and compressing the key-value cache, it achieves strong performance while remaining deployable on a single 40GB GPU.
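For reference, a minimal inference sketch with Hugging Face Transformers, assuming the repo id `deepseek-ai/DeepSeek-V2-Lite-Chat` and the custom modeling code it ships (hence `trust_remote_code=True`); the prompt and generation settings are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2-Lite-Chat"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # BF16 weights, per the model card
    trust_remote_code=True,       # DeepSeek-V2 ships custom modeling code
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain mixture-of-experts in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```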
## Implementation Details
The model has 27 layers, a hidden dimension of 2048, and 16 attention heads. MLA compresses the key-value cache into a 512-dimensional latent per token, and the DeepSeekMoE feed-forward layers combine 2 shared experts with 64 routed experts, 6 of which are activated per token. The model was trained on 5.7T tokens and can process sequences of up to 32K tokens.
- Multi-head Latent Attention (MLA) for efficient key-value cache compression (see the first sketch below)
- DeepSeekMoE architecture for economical training and inference
- BF16 weights for efficient inference
- Expert routing that activates 6 of the 64 routed experts per token, alongside 2 always-active shared experts (see the second sketch below)
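As a rough illustration of the MLA idea, here is a minimal sketch of latent key-value compression using the dimensions above (hidden 2048, 16 heads, 512-dim KV latent). The per-head dimension is an assumption, and causal masking and the real model's decoupled RoPE path are omitted, so treat this as a sketch rather than the actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLASketch(nn.Module):
    """Latent KV compression only: the cache stores a 512-dim latent per
    token instead of full per-head K/V. Masking, RoPE, and the per-head
    dim (assumed 128 here) are simplified relative to the real MLA."""

    def __init__(self, dim=2048, n_heads=16, kv_latent=512, head_dim=128):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, head_dim
        self.q_proj = nn.Linear(dim, n_heads * head_dim, bias=False)
        self.kv_down = nn.Linear(dim, kv_latent, bias=False)  # compress: this is what gets cached
        self.k_up = nn.Linear(kv_latent, n_heads * head_dim, bias=False)
        self.v_up = nn.Linear(kv_latent, n_heads * head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * head_dim, dim, bias=False)

    def forward(self, x, latent_cache=None):
        b, t, _ = x.shape
        c = self.kv_down(x)                          # (b, t, 512)
        if latent_cache is not None:
            c = torch.cat([latent_cache, c], dim=1)  # grow the compact cache
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_up(c).view(b, c.size(1), self.n_heads, self.head_dim).transpose(1, 2)
        v = self.v_up(c).view(b, c.size(1), self.n_heads, self.head_dim).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)  # masking omitted for brevity
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out), c                     # return latent as the new KV cache
```

Caching the 512-dim latent instead of 16 heads' worth of keys and values is what shrinks the KV cache and enables long contexts on modest hardware.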
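In the same spirit, a sketch of the shared-plus-routed expert layout (2 shared experts, top-6 of 64 routed); the expert MLP width and the softmax-then-top-k gate are simplifying assumptions for illustration, not the production implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepSeekMoESketch(nn.Module):
    """2 shared experts run on every token; a learned gate picks 6 of the
    64 routed experts per token. Expert width (1408) and the gating
    details are assumptions."""

    def __init__(self, dim=2048, n_routed=64, n_shared=2, top_k=6, expert_dim=1408):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, n_routed, bias=False)

        def expert():
            return nn.Sequential(
                nn.Linear(dim, expert_dim, bias=False),
                nn.SiLU(),
                nn.Linear(expert_dim, dim, bias=False),
            )

        self.routed = nn.ModuleList(expert() for _ in range(n_routed))
        self.shared = nn.ModuleList(expert() for _ in range(n_shared))

    def forward(self, x):  # x: (n_tokens, dim)
        weights = F.softmax(self.gate(x), dim=-1)
        top_w, top_i = weights.topk(self.top_k, dim=-1)  # 6 experts per token
        out = sum(e(x) for e in self.shared)             # shared experts: all tokens
        for t in range(x.size(0)):                       # naive per-token dispatch
            for w, i in zip(top_w[t], top_i[t]):
                out[t] = out[t] + w * self.routed[int(i)](x[t])
        return out
```

Because only the 6 selected routed experts (plus the 2 shared ones) run for each token, compute per token scales with the 2.4B active parameters rather than the 15.7B total.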
## Core Capabilities
- Strong performance on MMLU (55.7%), BBH (48.1%), and GSM8K (72.0%)
- Strong Chinese-language capabilities with high scores on C-Eval (60.1%) and CMMLU (62.5%)
- Solid code generation with 57.3% on HumanEval
- Mathematical reasoning demonstrated by 27.9% on the MATH benchmark
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's distinctive feature is its efficient MoE architecture, which activates only 2.4B of its 15.7B total parameters per token, making it deployable on a single GPU while remaining competitive with larger dense models.
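A back-of-envelope check of the single-GPU claim, assuming 2 bytes per parameter in BF16 and ignoring the KV cache and activations:

```python
# Back-of-envelope: BF16 stores 2 bytes per parameter; KV cache and
# activations are excluded, so this is only a lower bound on memory use.
total_params = 15.7e9
print(f"{total_params * 2 / 1e9:.1f} GB")  # ~31.4 GB of weights -> fits in 40GB
```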
**Q: What are the recommended use cases?**
DeepSeek-V2-Lite-Chat excels in various applications including multilingual tasks, code generation, mathematical reasoning, and general language understanding. It's particularly suitable for scenarios requiring efficient deployment while maintaining high-quality outputs.