asr-wav2vec2-dvoice-darija
Property | Value |
---|---|
Model Type | Speech Recognition (ASR) |
Architecture | wav2vec 2.0 + CTC/Attention |
Performance | 18.28% WER (Test), 5.85% CER (Test) |
Source | Hugging Face |
What is asr-wav2vec2-dvoice-darija?
asr-wav2vec2-dvoice-darija is an automatic speech recognition (ASR) model for Darija, the Moroccan dialect of Arabic, developed as part of the DVoice initiative. It pairs Facebook's wav2vec 2.0 architecture with CTC/Attention mechanisms and is trained on the DVoice Darija dataset. The model addresses a gap in ASR coverage for low-resource African languages.
Implementation Details
The pipeline has two main components: a unigram tokenizer that converts words into subword units, and an acoustic model based on wav2vec 2.0. The acoustic model starts from the pretrained facebook/wav2vec2-large-xlsr-53 checkpoint, adds two DNN layers on top, and is fine-tuned on Darija speech data.
- Expects 16 kHz, single-channel audio input
- Normalizes input audio automatically (resampling and mono channel selection) when needed
- Uses a CTC greedy decoder at inference time
- Built using the SpeechBrain framework
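To illustrate what a CTC greedy decoder does, here is a minimal pure-Python sketch. It is not the SpeechBrain implementation: the function name `ctc_greedy_decode`, the choice of index 0 for the blank label, and the toy scores are all assumptions made for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, blank_id=0):
    """Greedy CTC decoding over per-frame label scores.

    frame_scores: list of frames, each a list of scores (one per label).
    blank_id: index of the CTC blank label (assumed to be 0 here).
    """
    # 1. Pick the highest-scoring label for every frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # 2. Merge consecutive repeats, then 3. drop blank labels.
    return [label for label, _ in groupby(best_path) if label != blank_id]


# Toy input: 3 labels (0 = blank), 6 frames. Values are invented.
scores = [
    [0.10, 0.80, 0.10],  # label 1
    [0.20, 0.70, 0.10],  # label 1 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank: separates the two occurrences of label 1
    [0.10, 0.80, 0.10],  # label 1
    [0.10, 0.20, 0.70],  # label 2
    [0.60, 0.20, 0.20],  # blank
]
decoded = ctc_greedy_decode(scores)
print(decoded)  # [1, 1, 2]
```

In the real pipeline the decoded indices would be subword units from the unigram tokenizer, which are then joined back into words.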
Core Capabilities
- Direct transcription of Darija speech to text
- Achieves 18.28% Word Error Rate (WER) and 5.85% Character Error Rate (CER) on the test set
- Supports GPU inference for faster processing
- Handles automatic audio preprocessing
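For readers unfamiliar with the WER and CER figures quoted for this model, the following sketch shows how such metrics are commonly computed from an edit distance. The helper names and the example strings are invented for illustration, and the exact scoring conventions used for the published numbers (for instance text normalization or whether spaces count as characters) are not stated here, so treat this as an approximation of the idea rather than the evaluation script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up reference/hypothesis pair, not taken from the DVoice test set.
reference = "salam labas 3lik"
hypothesis = "salam labas 3lih"
print(f"WER = {wer(reference, hypothesis):.2%}")
print(f"CER = {cer(reference, hypothesis):.2%}")
```

Corpus-level scores are usually obtained by summing edits and reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.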
Frequently Asked Questions
Q: What makes this model unique?
The model targets Darija specifically, a language that has traditionally been under-resourced in speech technology. It is part of the DVoice initiative, which aims to improve access to voice technology for African languages. Combining wav2vec 2.0 with CTC/Attention mechanisms lets it reach the reported 18.28% WER on the Darija test set.
Q: What are the recommended use cases?
The model is suited to transcribing Darija speech in applications such as voice assistants and transcription services, and more generally in any speech-to-text pipeline that needs to handle the Moroccan Arabic dialect.
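As a starting point for such applications, the sketch below shows one way the model might be loaded and used through SpeechBrain's `EncoderASR` interface. Several details are assumptions rather than facts stated here: the repository id `speechbrain/asr-wav2vec2-dvoice-darija`, the `savedir` path, the helper names, and the import path (`speechbrain.pretrained`, which newer SpeechBrain releases expose as `speechbrain.inference`). It also assumes `speechbrain` and `transformers` are installed.

```python
def load_darija_asr(device="cpu"):
    """Load the pretrained model (hypothetical helper; downloads on first use)."""
    # Imported lazily so this module can be loaded without SpeechBrain installed.
    from speechbrain.pretrained import EncoderASR

    return EncoderASR.from_hparams(
        source="speechbrain/asr-wav2vec2-dvoice-darija",  # assumed repo id
        savedir="pretrained_models/asr-wav2vec2-dvoice-darija",  # assumed path
        run_opts={"device": device},  # e.g. "cuda" for GPU inference
    )


def transcribe(asr_model, wav_path):
    """Transcribe one audio file; input audio is normalized by the library."""
    return asr_model.transcribe_file(wav_path)


if __name__ == "__main__":
    model = load_darija_asr(device="cpu")
    # "example_darija.wav" is a placeholder path, not a file shipped here.
    print(transcribe(model, "example_darija.wav"))
```

Loading the model once and reusing it across files avoids repeating the download and initialization cost on every call.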