wav2vec2-large-xlsr-53-russian

Maintained By
jonatasgrosman

Property    Value
License     Apache 2.0
Downloads   3.6M+
Test WER    13.3% (9.57% with LM)
Test CER    2.88% (2.24% with LM)

What is wav2vec2-large-xlsr-53-russian?

This is a speech recognition model for Russian, fine-tuned from Facebook's wav2vec2-large-xlsr-53. It was trained on the Common Voice 6.1 and CSS10 datasets and expects audio sampled at 16 kHz. On its test set it reaches a word error rate (WER) of 13.3% without a language model.
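Because the model expects 16 kHz input, it can help to check a file's sampling rate before transcription. The sketch below is illustrative only: it uses Python's standard `wave` module, so it covers uncompressed PCM WAV files (MP3 input would need a separate decoding library), and the helper names `wav_sample_rate` and `needs_resampling` are hypothetical, not part of the model or any library.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # sampling rate stated for this model


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, expected_hz: int = EXPECTED_RATE_HZ) -> bool:
    """Return True when the file's rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_hz
```

A file for which `needs_resampling` returns `True` should be resampled to 16 kHz with an audio library before it is passed to the model.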

Implementation Details

The model uses the wav2vec2 architecture and was fine-tuned on GPU resources provided by OVHcloud. It expects audio sampled at 16 kHz and can be used with or without a language model (LM); adding the LM improves accuracy.

  • Base Architecture: wav2vec2-large-xlsr-53
  • Training Data: Common Voice 6.1 and CSS10
  • Sampling Rate: 16kHz
  • Test Metrics: WER 13.3% (9.57% with LM), CER 2.88% (2.24% with LM)
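For readers unfamiliar with the metrics above, WER and CER are usually computed as an edit distance divided by the reference length, counted over words and characters respectively. The following is a minimal sketch of that standard definition, not the evaluation script behind the reported numbers. The function names and the example sentences are illustrative assumptions, and real evaluations typically normalize text (casing, punctuation) first, which is omitted here.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: fewest substitutions, deletions, insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative sentences: one wrong word out of four, one wrong character out of twenty.
reference = "я люблю читать книги"
hypothesis = "я люблю читать книгу"
print(f"WER: {wer(reference, hypothesis):.2%}")  # WER: 25.00%
print(f"CER: {cer(reference, hypothesis):.2%}")  # CER: 5.00%
```

The example shows why CER is typically much lower than WER for the same output: a single wrong character counts as one full word error.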

Core Capabilities

  • Direct audio transcription without language model
  • Enhanced accuracy with language model integration
  • Batch processing of audio files
  • Support for both .mp3 and .wav formats
  • Integration with HuggingSound library for easy implementation
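The capabilities above can be sketched with the HuggingSound library as follows. This is a minimal example under stated assumptions: the Hub identifier `jonatasgrosman/wav2vec2-large-xlsr-53-russian` is inferred from the maintainer and model name, the wrapper `transcribe_files` is a hypothetical helper, and transcription here runs without a language model.

```python
from typing import Dict, List, Optional

# Assumed Hugging Face Hub identifier (maintainer name + model name).
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-russian"


def transcribe_files(audio_paths: List[str],
                     model: Optional[object] = None) -> Dict[str, str]:
    """Transcribe a batch of audio files and map each path to its text."""
    if model is None:
        # Imported lazily: the library is a heavy dependency, and building
        # the model downloads its weights on first use.
        from huggingsound import SpeechRecognitionModel
        model = SpeechRecognitionModel(MODEL_ID)
    # transcribe() returns one result dict per input file, in input order.
    results = model.transcribe(audio_paths)
    return {path: result["transcription"]
            for path, result in zip(audio_paths, results)}
```

A call such as `transcribe_files(["/path/to/file.mp3", "/path/to/another_file.wav"])` (placeholder paths) would return a dictionary of transcriptions keyed by file path. Passing an already loaded model through the `model` argument avoids reloading it for every batch.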

Frequently Asked Questions

Q: What makes this model unique?

The model is fine-tuned specifically for Russian speech recognition. It reaches a test WER of 13.3% and a test CER of 2.88% on its own, or 9.57% and 2.24% with a language model, and supports both modes of transcription.

Q: What are the recommended use cases?

The model suits Russian speech transcription tasks such as audio content analysis, subtitle generation, and voice command systems. Where accuracy matters most, use it with the language model, which lowers test WER from 13.3% to 9.57%.
