RigoBERTa-Clinical

Maintained By
IIC

Property          Value
Developer         IIC
Model Type        Clinical Encoder Language Model
Language          Spanish
License           rigoclinical-nc (Non Commercial)
Base Model        RigoBERTa 2
Research Paper    ClinText-SP and RigoBERTa Clinical

What is RigoBERTa-Clinical?

RigoBERTa-Clinical is a Spanish clinical language model developed through domain-adaptive pretraining on ClinText-SP, the largest publicly available Spanish clinical corpus. It starts from the general-purpose RigoBERTa 2 model and continues pretraining on clinical text, adapting the encoder to medical terminology and clinical narratives in Spanish.

Implementation Details

The model was trained with Masked Language Modeling (MLM) on a curated dataset of 26 million tokens across 35,996 clinical samples. Training ran on an NVIDIA A100 GPU for 2,800 steps with a batch size of 32 and a learning rate of 2e-5.
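As a rough illustration of the MLM objective, the sketch below prepares one training example from a list of token ids. The 15% selection rate and the 80/10/10 replacement split follow the common BERT convention and are an assumption here; this page does not state the masking settings used for RigoBERTa-Clinical. `MASK_ID` and `VOCAB_SIZE` are hypothetical placeholder values, not the real tokenizer's.

```python
import random

MASK_ID = 4          # hypothetical id of the mask token (assumption)
VOCAB_SIZE = 50_000  # hypothetical vocabulary size (assumption)
IGNORE = -100        # label value conventionally skipped by the loss


def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    """Return (inputs, labels) for one MLM example.

    Uses the BERT-style 80/10/10 scheme, which is assumed here
    rather than confirmed for this model.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected; it contributes no loss
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID                    # 80%: replace with mask
        elif roll < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: leave the original token in place
    return inputs, labels
```

Only the selected positions carry a label, so the loss is computed on a small fraction of each sequence while the rest serves as context.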

  • Leverages the RigoBERTa 2 tokenizer for consistent text processing
  • Handles sequences of up to 512 tokens, using a 128-token stride to split longer texts
  • Uses subword tokenization to represent medical terminology
  • Trained on diverse clinical sources including medical journals and radiological reports
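The 512-token limit and 128-token stride can be pictured as a sliding window over the token sequence. The sketch below is a minimal version that assumes "stride" means the number of tokens shared between consecutive windows, as it does in Hugging Face tokenizers; the function name `sliding_windows` is illustrative and not part of any library, and the sketch ignores the room that special tokens would take in a real window.

```python
def sliding_windows(token_ids, max_len=512, overlap=128):
    """Split token_ids into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens (assumed meaning of "stride").
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    step = max_len - overlap  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the text
        start += step
    return windows


# A 1,000-token document yields three windows; the last one is shorter.
chunks = sliding_windows(list(range(1000)))
print([len(c) for c in chunks])  # [512, 512, 232]
```

Hugging Face tokenizers offer comparable behaviour through the `max_length`, `stride`, and `return_overflowing_tokens` arguments; whether the authors used that path is not stated on this page.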

Core Capabilities

  • Clinical text understanding in Spanish
  • Named Entity Recognition (NER) in medical contexts
  • Clinical note classification
  • Reported state-of-the-art performance on multiple Spanish clinical NLP benchmarks

Frequently Asked Questions

Q: What makes this model unique?

RigoBERTa-Clinical combines the general language understanding of RigoBERTa 2 with specialized clinical domain knowledge. On Spanish medical NLP tasks it is reported to outperform both general-purpose models and models trained only on clinical text.

Q: What are the recommended use cases?

The model is suited to Spanish-language healthcare NLP applications, including clinical document classification, medical entity recognition, and research. Note that it is licensed for non-commercial use only.
