# indonesian-roberta-base-posp-tagger
| Property | Value |
|---|---|
| Parameter Count | 124M |
| License | MIT |
| Framework | PyTorch, Transformers |
| Base Model | flax-community/indonesian-roberta-base |
## What is indonesian-roberta-base-posp-tagger?

This is a Part-of-Speech (POS) tagger built on the RoBERTa architecture and fine-tuned specifically for Indonesian. It achieves 96.25% precision, recall, and F1 on the POS tagging task of the IndoNLU benchmark.
## Implementation Details

The model is implemented with the Transformers library and PyTorch, fine-tuned from the indonesian-roberta-base model. Training ran for 10 epochs using the Adam optimizer with a learning rate of 2e-05 and a linear scheduler.
- Batch size: 16 for both training and evaluation
- Training optimization: Adam (β1=0.9, β2=0.999, ε=1e-08)
- Final validation loss: 0.1668
- Best performance achieved at epoch 10
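The linear scheduler mentioned above decays the learning rate from its initial value toward zero over the course of training. A minimal sketch of that schedule (the function `linear_lr` and the optional warmup parameter are our own illustration, mirroring the behavior of Transformers' `get_linear_schedule_with_warmup`):

```python
def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Learning rate at a given optimizer step under a linear schedule.

    Rises linearly from 0 to base_lr during warmup (none was reported for
    this model), then decays linearly from base_lr to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

With no warmup, the rate starts at 2e-05, halves at the midpoint of training, and reaches zero on the final step.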
## Core Capabilities
- High-accuracy POS tagging for Indonesian text
- Token classification with 96.25% precision and recall
- Optimized for Indonesian language understanding
- Suitable for integration into larger NLP pipelines
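A typical way to run a token-classification model like this one with the Transformers library is via the `pipeline` API. The repository id below is a placeholder based on the model name, not confirmed by this card; substitute the actual published id:

```python
# Placeholder id -- replace with the model's actual repository name.
MODEL_ID = "indonesian-roberta-base-posp-tagger"

def tag_indonesian(text, model_id=MODEL_ID):
    """POS-tag Indonesian text, aggregating subword pieces into words."""
    from transformers import pipeline  # imported lazily to keep the helper lightweight
    tagger = pipeline("token-classification", model=model_id,
                      aggregation_strategy="simple")
    return [(ent["word"], ent["entity_group"]) for ent in tagger(text)]

if __name__ == "__main__":
    print(tag_indonesian("Budi sedang membaca buku di perpustakaan."))
```

`aggregation_strategy="simple"` merges wordpiece predictions so the output is one tag per word rather than per subword token.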
## Frequently Asked Questions

**Q: What makes this model unique?**

It combines the RoBERTa architecture with fine-tuning targeted at Indonesian, reaching 96.25% precision, recall, and F1 on IndoNLU POS tagging.
**Q: What are the recommended use cases?**
The model is ideal for Indonesian text analysis tasks requiring part-of-speech tagging, including syntactic parsing, grammatical analysis, and text preprocessing for downstream NLP tasks.
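When feeding the tagger's output into a downstream pipeline, raw per-token predictions must often be merged back into word-level tags, since RoBERTa tokenizes into subword pieces. A small post-processing sketch under assumed conventions (the `##` continuation prefix and the tag strings here are illustrative; check the actual tokenizer and label set):

```python
def merge_wordpieces(predictions):
    """Merge subword predictions into word-level (word, tag) pairs.

    Each prediction is a dict with 'word' and 'entity' keys, as produced by
    a token-classification pipeline without aggregation. Pieces starting
    with '##' are glued onto the previous word; the first subword's tag is
    kept for the whole word.
    """
    words = []
    for p in predictions:
        piece = p["word"]
        if piece.startswith("##") and words:
            word, tag = words[-1]
            words[-1] = (word + piece[2:], tag)
        else:
            words.append((piece, p["entity"]))
    return words
```

For example, the pieces `"mem"` and `"##baca"` (both tagged as a verb) would be merged into the single word `"membaca"` with one tag.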