distilrubert-small-cased-conversational

Maintained by: DeepPavlov

  • Parameter Count: 107M
  • Model Type: Distilled Transformer
  • Architecture: 2-layer, 768-hidden, 12-heads
  • Paper: Knowledge Distillation of Russian Language Models
  • Author: DeepPavlov

What is distilrubert-small-cased-conversational?

This is a compressed version of the Conversational RuBERT model, designed for Russian-language processing. It was trained on a diverse conversational dataset including OpenSubtitles, social media content from Dirty and Pikabu, and the Taiga corpus. The distilled model cuts CPU latency by roughly 4x relative to its teacher while largely preserving the teacher's quality.
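
To try the model, here is a minimal sketch using the Hugging Face Transformers library; the checkpoint name below is assumed from the model title and may need adjusting:

```python
# Minimal sketch: load the distilled model and embed a Russian sentence.
# The checkpoint name is assumed to match the model title on the Hub.
import torch
from transformers import AutoTokenizer, AutoModel

name = "DeepPavlov/distilrubert-small-cased-conversational"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
model.eval()

inputs = tokenizer("Привет! Как дела?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# (batch, seq_len, 768), matching the 768-hidden architecture above
print(outputs.last_hidden_state.shape)
```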

Implementation Details

Distillation combines several objectives: KL loss between teacher and student output distributions, a standard masked language modeling (MLM) loss, cosine embedding loss, and MSE loss on hidden states. Training ran for about 80 hours on 8 NVIDIA Tesla P100-SXM2.0 16 GB GPUs. At inference time, the model processes 71.35 samples/second on GPU, versus 36.49 samples/second for the teacher.
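
The card does not spell out the loss weights or which layers are aligned, so the following PyTorch sketch only illustrates how the four named losses could be combined; the equal weighting and the hidden-state pairing are assumptions, not the published recipe:

```python
# Hypothetical sketch of the combined distillation objective.
# Equal loss weights and the choice of hidden states to align are
# assumptions; the student's 768-hidden size matches the teacher's,
# so hidden states can be compared directly.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      student_hidden, teacher_hidden, temperature=2.0):
    # KL loss: match the teacher's softened output distribution.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # MLM loss: ordinary masked-language-modeling cross-entropy.
    mlm = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,  # positions that were not masked
    )

    # Cosine embedding loss: align the directions of hidden states.
    flat_s = student_hidden.reshape(-1, student_hidden.size(-1))
    flat_t = teacher_hidden.reshape(-1, teacher_hidden.size(-1))
    target = torch.ones(flat_s.size(0), device=flat_s.device)
    cos = F.cosine_embedding_loss(flat_s, flat_t, target)

    # MSE loss: match hidden-state values as well.
    mse = F.mse_loss(student_hidden, teacher_hidden)

    return kl + mlm + cos + mse
```

In practice each term is typically weighted, and the mapping between the 12-layer teacher's hidden states and the 2-layer student's must be chosen explicitly.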

  • Model Size: 409 MB (significantly smaller than the 679 MB teacher model)
  • CPU Latency: 0.1656 seconds
  • GPU Latency: 0.015 seconds
  • Optimized for conversational tasks
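
The batch size and sequence length behind these latency figures are not stated, so the sketch below (with assumed settings) only shows how one might take a comparable measurement on CPU:

```python
# Rough latency measurement sketch; the batch size and sequence
# length are assumptions, so absolute numbers will differ.
import time
import torch
from transformers import AutoTokenizer, AutoModel

name = "DeepPavlov/distilrubert-small-cased-conversational"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

batch = tokenizer(["Пример предложения."] * 16,
                  padding=True, return_tensors="pt")

with torch.no_grad():
    for _ in range(5):   # warm-up runs
        model(**batch)
    runs = 20
    start = time.perf_counter()
    for _ in range(runs):
        model(**batch)
    elapsed = (time.perf_counter() - start) / runs

print(f"mean CPU latency per batch: {elapsed:.4f} s")
```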

Core Capabilities

  • Russian language understanding and generation
  • Conversational AI applications
  • Classification tasks (see the fine-tuning sketch after this list)
  • Named Entity Recognition (NER)
  • Question Answering
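
For the task-oriented capabilities, a hypothetical starting point is to attach a task head on top of the encoder and fine-tune; the label count and example text below are assumptions for illustration:

```python
# Hypothetical sketch: attaching a classification head for fine-tuning.
# num_labels and the example sentence are illustrative assumptions;
# the head is newly initialized and must be trained before use.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "DeepPavlov/distilrubert-small-cased-conversational"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=2)  # e.g. binary sentiment

inputs = tokenizer("Отличный сервис, спасибо!", return_tensors="pt")
logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # untrained head: fine-tune before relying on this
```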

Frequently Asked Questions

Q: What makes this model unique?

The model packs Conversational RuBERT's capabilities into a 2-layer student that is markedly smaller (409 MB vs. 679 MB) and roughly 4x faster on CPU. It is specifically optimized for Russian-language conversational tasks, making it well suited for production deployments where resource efficiency is crucial.

Q: What are the recommended use cases?

The model is particularly well suited for conversational AI applications, chatbots, and social media analysis in Russian-language contexts. It is optimized for scenarios that demand fast inference while maintaining good performance on tasks like classification, NER, and question answering.
