astroBERT
Property | Value |
---|---|
Parameter Count | 110M |
License | MIT |
Paper | arXiv:2112.00590 |
Downloads | 4.9M+ |
What is astroBERT?
astroBERT is a language model developed by NASA/ADS (the NASA Astrophysics Data System) specifically for the astrophysics domain. It is built on the BERT architecture and pretrained on astronomical and astrophysical literature so that it can process the field's technical language. The model is cased (it preserves capitalization) and was pretrained with two objectives: masked language modeling (MLM) and next sentence prediction (NSP).
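The two pretraining objectives named above can be illustrated with a small pure-Python sketch. It follows the recipe from the original BERT paper: about 15% of tokens are selected, and a selected token becomes [MASK] 80% of the time, a random token 10% of the time, and stays unchanged 10% of the time. NSP pairs take the true next sentence half the time. Whether astroBERT used exactly these rates is an assumption here, the function names are hypothetical, and real pipelines work on subword token IDs rather than whitespace-split words. Selection is per-token (Bernoulli), similar to Hugging Face's `DataCollatorForLanguageModeling`.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"


def mask_tokens(tokens: Sequence[str], vocab: Sequence[str],
                mask_prob: float = 0.15,
                seed: int = 0) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style MLM corruption (rates assumed, not confirmed for astroBERT).

    Each token is selected independently with probability mask_prob. A selected
    token becomes [MASK] 80% of the time, a random vocab token 10% of the time,
    and stays unchanged 10% of the time. labels[i] holds the original token at
    selected positions and None elsewhere (no loss is computed there).
    """
    rng = random.Random(seed)
    corrupted: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))
            else:
                corrupted.append(tok)
        else:
            labels.append(None)
            corrupted.append(tok)
    return corrupted, labels


def make_nsp_example(sent_a: str, true_next: str, random_pool: Sequence[str],
                     rng: random.Random) -> Tuple[str, str, bool]:
    """NSP pair: with probability 0.5 segment B is the real next sentence
    (is_next=True); otherwise it is drawn from random_pool (is_next=False)."""
    if rng.random() < 0.5:
        return sent_a, true_next, True
    return sent_a, rng.choice(random_pool), False


tokens = "the quasar shows a broad emission line in its spectrum".split()
corrupted, labels = mask_tokens(tokens, vocab=tokens, mask_prob=0.3, seed=1)
print(corrupted)
print(labels)
```

During pretraining the model sees `corrupted` and is scored only at positions where `labels` is not None, which is why unselected tokens carry no label.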
Implementation Details
The model has 110M parameters, the size of the standard BERT-base configuration, and is available in several specialized variants: the base model, an NER-DEAL model for named entity recognition, and a SciX Categorizer for scientific text classification. Weights are provided in PyTorch and Safetensors formats, and the stored tensors use the I64 and F32 data types.
- Base model with masked language modeling and next sentence prediction
- NER-DEAL variant for named entity recognition in astronomical texts
- SciX Categorizer for scientific text classification across 7 categories
- Supports text embedding generation for downstream tasks
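The 110M figure is consistent with BERT-base: 12 Transformer layers, hidden size 768, feed-forward size 3072, and 512 position embeddings. The sketch below counts weights and biases for that configuration, assuming astroBERT uses these standard dimensions. The vocabulary size is left as a parameter because astroBERT's own tokenizer vocabulary size is not stated here; 30,522 (the vocabulary of Google's `bert-base-uncased`) is used only as a reference point. The MLM and NSP pretraining heads are excluded, and `bert_param_count` is a hypothetical helper.

```python
def bert_param_count(vocab_size: int, hidden: int = 768, layers: int = 12,
                     intermediate: int = 3072, max_positions: int = 512,
                     type_vocab: int = 2, with_pooler: bool = True) -> int:
    """Count weights and biases of a BERT encoder, excluding pretraining heads."""
    h = hidden
    # Embeddings: word, position and token-type tables plus one LayerNorm (gamma, beta).
    embeddings = (vocab_size + max_positions + type_vocab) * h + 2 * h
    # Self-attention: Q, K, V and output projections, each an h x h matrix plus bias.
    attention = 4 * (h * h + h)
    # Feed-forward block: h -> intermediate -> h, with biases.
    ffn = (h * intermediate + intermediate) + (intermediate * h + h)
    # Two LayerNorms per layer (after attention and after the feed-forward block).
    norms = 2 * 2 * h
    per_layer = attention + ffn + norms
    # Pooler: one dense h x h layer plus bias applied to the [CLS] vector.
    pooler = (h * h + h) if with_pooler else 0
    return embeddings + layers * per_layer + pooler


total = bert_param_count(vocab_size=30522)
print(f"{total:,}")  # 109,482,240
```

Because the vocabulary size is the only free parameter, each additional 1,000 vocabulary entries adds 768,000 parameters (one 768-dimensional embedding row per entry), so any vocabulary in the usual 30k range lands near 110M.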
Core Capabilities
- Fill-mask prediction for astronomical concepts
- Named Entity Recognition for astronomical terms and concepts
- Scientific text categorization
- Text embedding generation
- Continuous light curve analysis support
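The source does not say how astroBERT's text embeddings are derived. A common approach for BERT-style encoders is to average the final-layer token vectors while skipping padding, then compare texts by cosine similarity. The sketch below shows that technique in pure Python under that assumption; the 3-dimensional vectors are toy stand-ins for real model output (BERT-base hidden states are 768-dimensional), and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1, skipping padding."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy hidden states standing in for real model output;
# the last position is padding (mask 0) and is ignored.
hidden = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [100.0, 100.0, 100.0]]
mask = [1, 1, 0]
embedding = mean_pool(hidden, mask)
print(embedding)  # [2.0, 3.0, 4.0]
```

Masking out padding matters: without it, the padded position would dominate the average in this example.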
Frequently Asked Questions
Q: What makes this model unique?
astroBERT is trained specifically on astrophysics literature, so it is specialized for astronomical terminology and concepts. It was among the first major language models designed specifically for the astronomy community.
Q: What are the recommended use cases?
The model is well suited to analyzing astronomy research text, processing astronomical literature at scale, recognizing named entities in astronomical texts, and classifying scientific text. It is particularly useful for institutions and researchers who work with large volumes of astronomical literature and data.
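For the named entity recognition use case, a token-classification model such as the NER-DEAL variant assigns one label per token, and those labels usually need to be grouped into entity spans. The sketch below assumes the common BIO scheme (B- begins an entity, I- continues it, O is outside any entity) and word-level tags, i.e. subword predictions have already been merged back to words. The entity type names in the example are hypothetical placeholders, not the variant's actual label set, and `bio_to_spans` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (entity_type, text) spans.

    'B-X' starts an entity of type X, 'I-X' continues an open X entity, and
    'O' is outside any entity. An 'I-X' that does not continue an open X
    entity starts a new one (a common lenient convention).
    """
    spans: List[Tuple[str, str]] = []
    cur_type: Optional[str] = None
    cur_tokens: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("I-") and cur_type == tag[2:]:
            cur_tokens.append(tok)  # continue the open entity
            continue
        if cur_type is not None:  # close any open entity
            spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = None, []
        if tag.startswith(("B-", "I-")):  # open a new entity
            cur_type, cur_tokens = tag[2:], [tok]
    if cur_type is not None:
        spans.append((cur_type, " ".join(cur_tokens)))
    return spans


tokens = ["We", "observed", "M87", "with", "the", "Event", "Horizon", "Telescope", "."]
tags = ["O", "O", "B-CelestialObject", "O", "O",
        "B-Telescope", "I-Telescope", "I-Telescope", "O"]
print(bio_to_spans(tokens, tags))
# [('CelestialObject', 'M87'), ('Telescope', 'Event Horizon Telescope')]
```

Treating a stray I- tag as the start of a new entity keeps spans from being silently dropped when a model's predictions are slightly inconsistent.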