Laser-Dolphin-Mixtral-2x7b-dpo
| Property | Value |
|---|---|
| Parameter Count | 12.9B |
| Model Type | Mixture of Experts (MoE) |
| License | Apache 2.0 |
| Paper | Layer-Selective Rank Reduction Paper |
What is laser-dolphin-mixtral-2x7b-dpo?
Laser-Dolphin-Mixtral is a medium-sized Mixture of Experts (MoE) language model that combines the Mixtral architecture with targeted optimization techniques. Built on dolphin-2.6-mistral-7b-dpo-laser, it applies layer-selective rank reduction (LASER), an approach informed by random matrix theory, to improve performance across a range of tasks.
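As a minimal sketch, the model can be loaded like any other Hugging Face causal language model; the repository ID (macadeliccc/laser-dolphin-mixtral-2x7b-dpo), prompt, and generation settings below are assumptions for illustration.

```python
# Minimal sketch: load and run the model with Hugging Face Transformers.
# The repository ID and generation settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/laser-dolphin-mixtral-2x7b-dpo"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 tensors
    device_map="auto",
)

prompt = "Explain mixture-of-experts routing in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```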
Implementation Details
The model uses a Mixtral-style MoE architecture with BF16 tensors and is optimized with DPO (Direct Preference Optimization) and LASER. It averages 67.16% on the Open LLM Leaderboard, with results including:
- 85.80% on HellaSwag (10-shot)
- 63.17% on MMLU (5-shot)
- 48.29% on GSM8K (5-shot)
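Because the MoE layout and tensor type described above are exposed in the model configuration, they can be checked without downloading the full weights. A small sketch, assuming the same repository ID as above and standard Mixtral config field names:

```python
# Sketch: inspect the MoE configuration without downloading the weights.
# The repository ID is an assumption; field names follow the Mixtral config schema.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("macadeliccc/laser-dolphin-mixtral-2x7b-dpo")
print(config.model_type)           # expected "mixtral"
print(config.num_local_experts)    # expected 2 for a 2x7b merge
print(config.num_experts_per_tok)  # experts activated per token
print(config.torch_dtype)          # expected torch.bfloat16
```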
Core Capabilities
- Strong reasoning and comprehension, demonstrated by a 65.96% score on the AI2 Reasoning Challenge
- Excellent commonsense understanding, reflected in the HellaSwag result
- Multiple quantization options available for different hardware configurations
- Supports several deployment formats, including GGUF, AWQ, and ExLlamav2 quantizations (see the loading sketch after this list)
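For resource-constrained setups, a GGUF quantization can be run with llama-cpp-python. This is a sketch only; the GGUF filename, quantization level, and runtime settings are assumptions:

```python
# Sketch: run a GGUF quantization with llama-cpp-python.
# The GGUF filename and runtime settings are assumptions for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="laser-dolphin-mixtral-2x7b-dpo.Q4_K_M.gguf",  # assumed local file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Solve: 17 * 23 = ?"},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```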
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its combination of layer-selective rank reduction (LASER) and DPO fine-tuning, which together give it an average benchmark score roughly 5-6 points higher than comparable single 7B models.
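To make the idea concrete, layer-selective rank reduction replaces selected weight matrices with low-rank approximations obtained by truncating their singular value decomposition. The sketch below illustrates the general technique on a random matrix; it is not the exact procedure or rank schedule used for this model:

```python
# Conceptual sketch of layer-selective rank reduction (LASER), not the exact
# procedure used for this model: replace one weight matrix with a low-rank
# approximation by keeping only its top singular values.
import torch

def rank_reduce(weight: torch.Tensor, keep_fraction: float = 0.1) -> torch.Tensor:
    """Return a low-rank approximation of `weight`, keeping a fraction of its rank."""
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    k = max(1, int(keep_fraction * S.numel()))  # number of singular values to keep
    return (U[:, :k] * S[:k]) @ Vh[:k, :]

# Example: reduce a hypothetical projection matrix to ~10% of its rank.
W = torch.randn(512, 2048)
W_low_rank = rank_reduce(W, keep_fraction=0.1)
print(W.shape, W_low_rank.shape)  # shapes match; effective rank is reduced
```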
Q: What are the recommended use cases?
The model is well-suited to reasoning, commonsense understanding, and mathematical problem-solving. It can be deployed in configurations ranging from full precision to heavily quantized versions for resource-constrained environments.
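As one example of a resource-constrained configuration, the model can be loaded in 4-bit with bitsandbytes; the repository ID and quantization settings below are assumptions for illustration:

```python
# Sketch: load a 4-bit quantized version for resource-constrained environments.
# The repository ID and quantization settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

model_id = "macadeliccc/laser-dolphin-mixtral-2x7b-dpo"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
```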