distilrubert-small-cased-conversational
| Property | Value |
|---|---|
| Parameter Count | 107M parameters |
| Model Type | Distilled Transformer |
| Architecture | 2-layer, 768-hidden, 12-heads |
| Paper | Knowledge Distillation of Russian Language Models |
| Author | DeepPavlov |
What is distilrubert-small-cased-conversational?
This is a compressed version of the Conversational RuBERT model, designed for Russian-language processing. It was trained on OpenSubtitles, social-media content from Dirty and Pikabu, and the Taiga corpus. Distillation cuts CPU latency roughly 4x relative to the teacher model while largely preserving downstream quality.
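The model can be loaded with the Hugging Face Transformers library. Below is a minimal sketch for extracting sentence embeddings, assuming the checkpoint is published on the Hub as DeepPavlov/distilrubert-small-cased-conversational:

```python
# Minimal sketch: load the model for feature extraction with Hugging Face Transformers.
# Assumes the checkpoint is available on the Hub as
# "DeepPavlov/distilrubert-small-cased-conversational".
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "DeepPavlov/distilrubert-small-cased-conversational"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

text = "Привет, как дела?"  # "Hi, how are you?"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Last hidden states have shape (batch, seq_len, 768); mean-pool for a sentence embedding.
sentence_embedding = outputs.last_hidden_state.mean(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```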
Implementation Details
The model is distilled with a combination of losses: KL divergence against the teacher's output distribution, the standard MLM loss, a cosine embedding loss, and an MSE loss. Training took about 80 hours on 8 NVIDIA Tesla P100-SXM2.0 16 GB GPUs. At inference, the student processes 71.35 samples/second on GPU versus the teacher's 36.49 samples/second. A simplified sketch of how these loss terms can be combined follows the list below.
- Model Size: 409 MB (significantly smaller than the 679 MB teacher model)
- CPU Latency: 0.1656 seconds
- GPU Latency: 0.015 seconds
- Optimized for conversational tasks
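For illustration, the four loss terms named above can be combined along the following lines; the loss weights, temperature, and layer alignment below are assumptions for the sketch, not the values used to train this model:

```python
# Illustrative sketch of a combined distillation objective using the four loss
# terms named above (KL, MLM, cosine embedding, MSE). Loss weights, temperature,
# and the teacher-to-student alignment are assumptions, not the settings used
# to train distilrubert-small-cased-conversational.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      mlm_labels, temperature=2.0,
                      w_kl=1.0, w_mlm=1.0, w_cos=1.0, w_mse=1.0):
    # KL divergence between softened student and teacher token distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Standard masked-language-modeling cross entropy on the student.
    mlm = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        mlm_labels.view(-1),
        ignore_index=-100,
    )

    # Cosine embedding loss pulling student hidden states toward the teacher's.
    target = torch.ones(
        student_hidden.size(0) * student_hidden.size(1),
        device=student_hidden.device,
    )
    cos = F.cosine_embedding_loss(
        student_hidden.view(-1, student_hidden.size(-1)),
        teacher_hidden.view(-1, teacher_hidden.size(-1)),
        target,
    )

    # MSE between aligned student and teacher representations.
    mse = F.mse_loss(student_hidden, teacher_hidden)

    return w_kl * kl + w_mlm * mlm + w_cos * cos + w_mse * mse
```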
Core Capabilities
- Russian language understanding and generation
- Conversational AI applications
- Classification tasks
- Named Entity Recognition (NER)
- Question Answering
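A minimal fine-tuning sketch for one of these tasks (binary text classification) is shown below; the example texts, label count, and hyperparameters are placeholders:

```python
# Minimal sketch: fine-tune the distilled model for text classification.
# Label count, example texts, and hyperparameters are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "DeepPavlov/distilrubert-small-cased-conversational"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["Отличный сервис!", "Очень долго ждал ответа."]  # placeholder examples
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # one illustrative training step
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```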
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its efficient architecture that maintains strong performance while significantly reducing computational requirements. It's specifically optimized for Russian language conversational tasks, making it ideal for production deployments where resource efficiency is crucial.
Q: What are the recommended use cases?
The model is particularly well-suited for conversational AI applications, chatbots, and social media analysis in Russian language contexts. It's optimized for scenarios where quick inference times are required while maintaining good performance on tasks like classification, NER, and question answering.
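To check whether the latency figures above hold on your own hardware, a rough timing sketch such as the following can be used; the input text, warm-up strategy, and repetition count are arbitrary choices and will not reproduce the exact numbers reported above:

```python
# Rough CPU-latency check for the distilled model; the input, warm-up, and
# repetition count are arbitrary and will not reproduce the reported figures.
import time
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "DeepPavlov/distilrubert-small-cased-conversational"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

text = "Пример запроса для замера времени ответа модели."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    model(**inputs)  # warm-up run

n_runs = 20
start = time.perf_counter()
with torch.no_grad():
    for _ in range(n_runs):
        model(**inputs)
print(f"avg latency: {(time.perf_counter() - start) / n_runs:.4f} s")
```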