wav2vec2-large-xlsr-53-russian
Property | Value |
---|---|
License | Apache 2.0 |
Downloads | 3.6M+ |
Test WER | 13.3% (9.57% with LM) |
Test CER | 2.88% (2.24% with LM) |
What is wav2vec2-large-xlsr-53-russian?
This is a speech recognition model for Russian, fine-tuned from Facebook's wav2vec2-large-xlsr-53 on the Common Voice 6.1 and CSS10 datasets. It expects audio sampled at 16 kHz and reaches a test WER of 13.3% (9.57% with a language model).
Implementation Details
The model uses the wav2vec2 architecture and was fine-tuned on GPU resources provided by OVHcloud. It expects audio sampled at 16 kHz and can be used with or without a language model; adding the language model improves accuracy.
- Base Architecture: wav2vec2-large-xlsr-53
- Training Data: Common Voice 6.1 and CSS10
- Sampling Rate: 16kHz
- Metrics: WER: 13.3% (9.57% with LM), CER: 2.88% (2.24% with LM)
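
The WER and CER figures above are edit-distance ratios: the number of word-level (or character-level) insertions, deletions, and substitutions needed to turn the model output into the reference, divided by the reference length. The following is a minimal sketch of that calculation, not the evaluation script behind the reported numbers. It assumes whitespace tokenization, counts spaces as characters for CER, and applies no text normalization; the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For the reference `привет как дела` and the hypothesis `привет как дело`, one of three words and one of fifteen characters differ, so `wer` gives about 0.333 and `cer` about 0.067.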
Core Capabilities
- Direct audio transcription without language model
- Enhanced accuracy with language model integration
- Batch processing of audio files
- Support for both .mp3 and .wav formats
- Integration with HuggingSound library for easy implementation
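
A minimal sketch of batch transcription with HuggingSound follows. It makes several assumptions: the Hub ID `jonatasgrosman/wav2vec2-large-xlsr-53-russian` is not stated in this section, `select_audio_files` and `transcribe_files` are illustrative helpers rather than library functions, and the `batch_size` argument and the `"transcription"` result key follow the HuggingSound README as best recalled, so check them against the installed version.

```python
from pathlib import Path

# Assumed Hub ID; this section names the model but not its repository path.
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-russian"
SUPPORTED_EXTENSIONS = {".mp3", ".wav"}


def select_audio_files(paths):
    """Keep only paths ending in .mp3 or .wav (case-insensitive)."""
    return [str(p) for p in paths if Path(p).suffix.lower() in SUPPORTED_EXTENSIONS]


def transcribe_files(paths, batch_size=4):
    """Transcribe the supported files in `paths` and return {path: text}."""
    # Imported lazily so the filter above works without HuggingSound installed.
    from huggingsound import SpeechRecognitionModel

    audio_paths = select_audio_files(paths)
    model = SpeechRecognitionModel(MODEL_ID)
    # Assumed signature: transcribe(paths, batch_size=...) returns a list of dicts.
    results = model.transcribe(audio_paths, batch_size=batch_size)
    return {path: result["transcription"] for path, result in zip(audio_paths, results)}
```

Calling `transcribe_files(["interview.mp3", "memo.wav", "notes.txt"])` would skip `notes.txt` and send the two audio files to the model. The first call downloads the model weights.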
Frequently Asked Questions
Q: What makes this model unique?
This model is fine-tuned specifically for Russian speech recognition. It reaches a test WER of 13.3% without a language model and 9.57% with one, and supports both decoding modes.
Q: What are the recommended use cases?
The model suits Russian speech transcription tasks such as audio content analysis, subtitle generation, and voice command systems. Where accuracy matters most, pair it with the language model, which lowers the test WER from 13.3% to 9.57%.
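
As a rough sketch of language-model decoding, the code below uses LM-boosted decoding when a KenLM model and unigram list are available and falls back to plain decoding otherwise. Treat the details as assumptions: the Hub ID is not given in this section, the file names `lm.binary` and `unigrams.txt` are placeholders, `find_lm_files` is an illustrative helper, and the `KenshoLMDecoder(model.token_set, lm_path=..., unigrams_path=...)` call follows the HuggingSound README as best recalled (it additionally requires pyctcdecode).

```python
from pathlib import Path

# Assumed Hub ID; this section names the model but not its repository path.
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-russian"


def find_lm_files(lm_dir):
    """Return (lm_path, unigrams_path) if both files exist in lm_dir, else None.

    The file names are placeholders for a KenLM model and its unigram list.
    """
    lm_path = Path(lm_dir) / "lm.binary"
    unigrams_path = Path(lm_dir) / "unigrams.txt"
    if lm_path.is_file() and unigrams_path.is_file():
        return str(lm_path), str(unigrams_path)
    return None


def transcribe(audio_paths, lm_dir=None):
    """Transcribe with LM-boosted decoding when LM files are found."""
    # Imported lazily so find_lm_files works without HuggingSound installed.
    from huggingsound import SpeechRecognitionModel

    model = SpeechRecognitionModel(MODEL_ID)
    lm_files = find_lm_files(lm_dir) if lm_dir else None
    if lm_files is None:
        # No language model available: plain decoding.
        return model.transcribe(audio_paths)

    from huggingsound import KenshoLMDecoder  # assumed export; needs pyctcdecode

    lm_path, unigrams_path = lm_files
    decoder = KenshoLMDecoder(model.token_set, lm_path=lm_path, unigrams_path=unigrams_path)
    return model.transcribe(audio_paths, decoder=decoder)
```

Falling back silently keeps a pipeline running when the LM files are missing, at the cost of the higher error rate; a stricter variant could raise an error instead.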