wav2vec2-xls-r-300m
| Property | Value |
|---|---|
| Author | Facebook |
| License | Apache-2.0 |
| Paper | XLS-R Paper |
| Downloads | 3.7M+ |
| Languages Supported | 128 |
What is wav2vec2-xls-r-300m?
wav2vec2-xls-r-300m is Facebook's large-scale multilingual speech model and a significant advance in cross-lingual speech processing. With 300 million parameters, it is pre-trained on 436,000 hours of unlabeled speech across 128 languages, drawing on data from VoxPopuli, MLS, CommonVoice, BABEL, and VoxLingua107.
Implementation Details
This model is built on the wav2vec 2.0 architecture and requires speech input sampled at 16 kHz. It is designed to be fine-tuned on downstream tasks such as automatic speech recognition (ASR), speech translation, or audio classification; a minimal loading example follows the list below.
- Pre-trained on 436K hours of multilingual speech data
- Utilizes wav2vec 2.0 objective for training
- Supports 128 languages, including rare and low-resource languages
- Requires 16kHz audio input sampling rate
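To illustrate the 16 kHz input requirement, here is a minimal sketch that loads the model with Hugging Face Transformers and runs a dummy one-second waveform through the base encoder to obtain frame-level representations. It assumes the checkpoint is the facebook/wav2vec2-xls-r-300m entry on the Hugging Face Hub; the snippet is illustrative and not taken from the official model card.

```python
import numpy as np
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2Model

# Assumed Hub ID for this checkpoint
model_id = "facebook/wav2vec2-xls-r-300m"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2Model.from_pretrained(model_id)
model.eval()

# XLS-R expects mono audio sampled at 16 kHz; one second of silence as a stand-in
waveform = np.zeros(16000, dtype=np.float32)

inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Frame-level contextual representations: (batch, frames, hidden_size)
print(outputs.last_hidden_state.shape)
```

Because this is the pre-trained base model (no task head), the output is a sequence of hidden states rather than transcriptions; task-specific heads are added during fine-tuning.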
Core Capabilities
- Cross-lingual speech representation learning
- Automatic Speech Recognition (ASR)
- Speech Translation
- Language Identification (see the sketch after this list)
- Multilingual speech processing
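For the language-identification capability above, a common pattern is to place a classification head on top of the pre-trained encoder and fine-tune it on labeled audio. The sketch below only sets up such a model with Transformers under the assumed Hub ID facebook/wav2vec2-xls-r-300m; the label count is a placeholder, and the training loop is omitted.

```python
from transformers import Wav2Vec2ForSequenceClassification

model_id = "facebook/wav2vec2-xls-r-300m"  # assumed Hub ID
num_languages = 107  # placeholder: set to the number of languages in your labeled data

# Adds a randomly initialized classification head on top of the XLS-R encoder
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    model_id,
    num_labels=num_languages,
)

# Freezing the convolutional feature encoder is a common choice when
# fine-tuning wav2vec 2.0-style models on modest amounts of labeled data
model.freeze_feature_encoder()
```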
Frequently Asked Questions
Q: What makes this model unique?
The model's extensive language coverage (128 languages in pre-training) and massive pre-training dataset (436K hours) make it particularly powerful for cross-lingual speech tasks. It significantly improves on previous state-of-the-art results, reducing error rates by 20-33% on various benchmarks.
Q: What are the recommended use cases?
The model is best suited for fine-tuning on specific downstream tasks such as ASR, speech translation, and language identification. It's particularly valuable for applications requiring multilingual speech processing or working with low-resource languages.
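As an example of that fine-tuning workflow, the sketch below prepares the checkpoint for CTC-based ASR with Transformers, again assuming the facebook/wav2vec2-xls-r-300m Hub ID. The vocabulary size, pad token id, and dropout values are placeholders rather than values from the model card; a real pipeline derives the vocabulary from the target-language transcripts and trains with a Trainer or custom loop.

```python
from transformers import Wav2Vec2ForCTC

model_id = "facebook/wav2vec2-xls-r-300m"  # assumed Hub ID

# The CTC head (lm_head) is randomly initialized and learned during fine-tuning
model = Wav2Vec2ForCTC.from_pretrained(
    model_id,
    ctc_loss_reduction="mean",
    pad_token_id=0,      # placeholder: should match your tokenizer's pad token id
    vocab_size=32,       # placeholder: set to the size of your character vocabulary
    attention_dropout=0.1,
    hidden_dropout=0.1,
    mask_time_prob=0.05,
)

# Standard practice when fine-tuning XLS-R-style checkpoints on limited labeled speech
model.freeze_feature_encoder()
```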