# EXAONE-Deep-32B-AWQ
| Property | Value |
|---|---|
| Parameters | 30.95B |
| Context Length | 32,768 tokens |
| License | EXAONE AI Model License Agreement 1.1 - NC |
| Quantization | AWQ 4-bit group-wise weight-only (W4A16g128) |
| Architecture | 64 layers, 40 Q-heads, 8 KV-heads (GQA) |
## What is EXAONE-Deep-32B-AWQ?
EXAONE-Deep-32B-AWQ is a reasoning-focused language model developed by LG AI Research, optimized for tasks such as mathematics and coding. This quantized version preserves the capabilities of the original EXAONE-Deep-32B while substantially reducing its memory footprint through 4-bit AWQ weight quantization.
## Implementation Details
The model has 30.95B parameters across 64 transformer layers and uses Grouped-Query Attention (GQA) with 40 query heads and 8 key-value heads. It supports a context length of 32,768 tokens and a vocabulary of 102,400 tokens.
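The GQA numbers translate directly into KV-cache savings at long context. A rough estimate, assuming a head dimension of 128 (a typical value, not stated in the card) and bf16 cache entries:

```python
# KV-cache size estimate from the architecture figures above.
# head_dim = 128 is an assumption, not stated in the model card.
layers = 64
kv_heads = 8        # GQA key-value heads
q_heads = 40        # query heads (what full MHA would cache)
head_dim = 128      # assumed
bytes_per_elem = 2  # bf16

# K and V tensors per token, across all layers
per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
full_context = per_token * 32_768  # bytes at the maximum context length

print(f"per token:      {per_token / 1024:.0f} KiB")   # 256 KiB
print(f"32K context:    {full_context / 2**30:.0f} GiB")  # 8 GiB
print(f"MHA equivalent: {full_context * q_heads // kv_heads / 2**30:.0f} GiB")  # 40 GiB
```

Under these assumptions, caching only 8 KV heads instead of 40 cuts the full-context cache from roughly 40 GiB to 8 GiB.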
- AWQ quantization with 4-bit precision
- Group-wise weight-only quantization (W4A16g128)
- Optimized for bfloat16 inference
- Requires transformers>=4.43.1 and autoawq>=0.2.8
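As a back-of-the-envelope check on why W4A16g128 shrinks the model, the weight footprint can be estimated from the parameter count. The per-group overhead below assumes one fp16 scale and one 4-bit zero point per group of 128 weights, which is typical for AWQ but not spelled out in the card:

```python
# Approximate weight memory for 4-bit group-wise quantization (g128).
params = 30.95e9
group_size = 128

weight_bits = 4
# per group of 128 weights: one fp16 scale (16 bits) + one 4-bit zero point
# -- assumed layout, typical for AWQ
overhead_bits = (16 + 4) / group_size   # ~0.156 extra bits per weight
effective_bits = weight_bits + overhead_bits

gb = params * effective_bits / 8 / 1e9
print(f"~{gb:.1f} GB of weights")  # ~16.1 GB, vs ~61.9 GB in bf16
```

That is roughly a 4x reduction over bf16 weights, which is what makes single-GPU deployment of a 32B model feasible.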
## Core Capabilities
- Advanced reasoning in mathematics and coding tasks
- Long-context processing up to 32K tokens
- Competitive performance against leading open-weight models
- Specialized in step-by-step problem solving
- Support for multiple inference frameworks including TensorRT-LLM, vLLM, and SGLang
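As one illustration of the framework support above, a vLLM deployment might look like the following. This is a sketch only: the model id is assumed to be the Hugging Face repo name, and exact flag names and AWQ behavior depend on your vLLM version.

```shell
# Hypothetical serving command; check your vLLM version's docs for exact flags.
vllm serve LGAI-EXAONE/EXAONE-Deep-32B-AWQ \
    --quantization awq \
    --max-model-len 32768 \
    --dtype bfloat16
```

This exposes an OpenAI-compatible endpoint, so existing client code can be pointed at the local server without changes.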
## Frequently Asked Questions
**Q: What makes this model unique?**
EXAONE-Deep-32B-AWQ stands out for its exceptional reasoning capabilities and efficient quantization, making it particularly effective for mathematical and coding tasks while maintaining high performance with reduced computational requirements.
**Q: What are the recommended use cases?**
The model excels in scenarios requiring detailed reasoning, such as solving complex mathematical problems, coding challenges, and tasks that benefit from step-by-step analysis. It's particularly well-suited for applications needing both high accuracy and efficient resource utilization.