# EXAONE-Deep-32B-AWQ
| Property | Value |
|---|---|
| Parameters | 30.95B |
| Context Length | 32,768 tokens |
| License | EXAONE AI Model License Agreement 1.1 - NC |
| Quantization | AWQ 4-bit group-wise weight-only (W4A16g128) |
| Architecture | 64 layers, 40 Q-heads, 8 KV-heads (GQA) |
## What is EXAONE-Deep-32B-AWQ?
EXAONE-Deep-32B-AWQ is a reasoning-focused language model developed by LG AI Research, optimized for tasks such as mathematics and coding. This quantized version preserves the capabilities of the original EXAONE-Deep-32B while substantially reducing its memory footprint through 4-bit AWQ weight quantization.
## Implementation Details
The model has 30.95B parameters across 64 transformer layers and uses Grouped-Query Attention (GQA) with 40 query heads and 8 key-value heads. It supports a context length of 32,768 tokens and a vocabulary of 102,400 tokens.
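The GQA numbers translate directly into KV-cache savings at long context. A rough estimate, assuming a head dimension of 128 (a typical value, not stated in the card) and bf16 cache entries:

```python
# KV-cache size estimate from the architecture figures above.
# head_dim = 128 is an assumption, not stated in the model card.
layers = 64
kv_heads = 8        # GQA key-value heads
q_heads = 40        # query heads (what full MHA would cache)
head_dim = 128      # assumed
bytes_per_elem = 2  # bf16

# K and V tensors per token, across all layers
per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
full_context = per_token * 32_768  # bytes at the maximum context length

print(f"per token:      {per_token / 1024:.0f} KiB")   # 256 KiB
print(f"32K context:    {full_context / 2**30:.0f} GiB")  # 8 GiB
print(f"MHA equivalent: {full_context * q_heads // kv_heads / 2**30:.0f} GiB")  # 40 GiB
```

Under these assumptions, caching only 8 KV heads instead of 40 cuts the full-context cache from roughly 40 GiB to 8 GiB.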
- AWQ quantization with 4-bit precision
- Group-wise weight-only quantization (W4A16g128)
- Optimized for bfloat16 inference
- Requires transformers>=4.43.1 and autoawq>=0.2.8
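As a back-of-the-envelope check on why W4A16g128 shrinks the model, the weight footprint can be estimated from the parameter count. The per-group overhead below assumes one fp16 scale and one 4-bit zero point per group of 128 weights, which is typical for AWQ but not spelled out in the card:

```python
# Approximate weight memory for 4-bit group-wise quantization (g128).
params = 30.95e9
group_size = 128

weight_bits = 4
# per group of 128 weights: one fp16 scale (16 bits) + one 4-bit zero point
# -- assumed layout, typical for AWQ
overhead_bits = (16 + 4) / group_size   # ~0.156 extra bits per weight
effective_bits = weight_bits + overhead_bits

gb = params * effective_bits / 8 / 1e9
print(f"~{gb:.1f} GB of weights")  # ~16.1 GB, vs ~61.9 GB in bf16
```

That is roughly a 4x reduction over bf16 weights, which is what makes single-GPU deployment of a 32B model feasible.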
## Core Capabilities
- Advanced reasoning in mathematics and coding tasks
- Long-context processing up to 32K tokens
- Competitive performance against leading open-weight models
- Specialized in step-by-step problem solving
- Support for multiple inference frameworks including TensorRT-LLM, vLLM, and SGLang
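As one illustration of the framework support above, a vLLM deployment might look like the following. This is a sketch only: the model id is assumed to be the Hugging Face repo name, and exact flag names and AWQ behavior depend on your vLLM version.

```shell
# Hypothetical serving command; check your vLLM version's docs for exact flags.
vllm serve LGAI-EXAONE/EXAONE-Deep-32B-AWQ \
    --quantization awq \
    --max-model-len 32768 \
    --dtype bfloat16
```

This exposes an OpenAI-compatible endpoint, so existing client code can be pointed at the local server without changes.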
## Frequently Asked Questions
**Q: What makes this model unique?**
EXAONE-Deep-32B-AWQ stands out for its exceptional reasoning capabilities and efficient quantization, making it particularly effective for mathematical and coding tasks while maintaining high performance with reduced computational requirements.
**Q: What are the recommended use cases?**
The model excels in scenarios requiring detailed reasoning, such as solving complex mathematical problems, coding challenges, and tasks that benefit from step-by-step analysis. It's particularly well-suited for applications needing both high accuracy and efficient resource utilization.