OpenMed

Production-ready medical NLP toolkit powered by state-of-the-art transformers

Transform clinical text into structured insights with a single line of code. OpenMed delivers enterprise-grade entity extraction, assertion detection, and medical reasoning—no vendor lock-in, no compromise on accuracy.

License: Apache-2.0 | Python 3.10+ | arXiv:2508.01630

from openmed import analyze_text

result = analyze_text(
    "Patient started on imatinib for chronic myeloid leukemia.",
    model_name="disease_detection_superclinical"
)

for entity in result.entities:
    print(f"{entity.label:<12} {entity.text:<35} {entity.confidence:.2f}")
# DISEASE      chronic myeloid leukemia            0.98
# DRUG         imatinib                            0.95

✨ Why OpenMed?

  • Specialized Models: 12+ curated medical NER models outperforming proprietary solutions
  • HIPAA-Compliant PII Detection: Smart de-identification with all 18 Safe Harbor identifiers
  • One-Line Deployment: From prototype to production in minutes
  • Interactive TUI: Beautiful terminal interface for rapid experimentation
  • Batch Processing: Multi-file workflows with progress tracking
  • Production-Ready: Configuration profiles, profiling tools, and medical-aware tokenization
  • Zero Lock-In: Apache 2.0 licensed, runs on your infrastructure

Quick Start

Installation

# Install with Hugging Face support
pip install openmed[hf]

# Or try the interactive TUI
pip install openmed[tui]

Three Ways to Use OpenMed

1️⃣ Python API — One-liner for scripts and notebooks

from openmed import analyze_text

result = analyze_text(
    "Patient received 75mg clopidogrel for NSTEMI.",
    model_name="pharma_detection_superclinical"
)

2️⃣ Interactive TUI — Visual workbench for exploration

openmed  # Launch the TUI directly

3️⃣ CLI Automation — Batch processing for production

# Process a directory of clinical notes
openmed batch --input-dir ./notes --output results.json

# Use configuration profiles
openmed config profile-use prod

Interactive Terminal Interface

The OpenMed TUI provides a full-featured workbench that runs in any terminal:

  • Real-time entity extraction with Ctrl+Enter
  • Color-coded entity highlighting
  • Live configuration tuning (threshold, grouping, tokenization)
  • Confidence visualization with progress bars
  • Analysis history and export (JSON, CSV)
  • Hot-swappable models and profiles
  • File browser for batch analysis

# Launch with custom settings
openmed tui --model disease_detection_superclinical --confidence-threshold 0.7

📖 Full TUI Documentation


Key Features

Core Capabilities

  • Curated Model Registry: Metadata-rich catalog with 12+ specialized medical NER models
  • PII Detection & De-identification: HIPAA-compliant de-identification with smart entity merging
  • Medical-Aware Tokenization: Clean handling of clinical patterns (COVID-19, CAR-T, IL-6)
  • Advanced NER Processing: Confidence filtering, entity grouping, and span alignment
  • Multiple Output Formats: Dict, JSON, HTML, CSV for any downstream system
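To make the medical-aware tokenization bullet above concrete, here is a minimal sketch of the idea: hyphenated clinical terms such as COVID-19, CAR-T, and IL-6 stay whole instead of being split at the hyphen. The pattern and the function name `tokenize_clinical` are assumptions for illustration, not OpenMed's actual tokenizer.

```python
import re

# Hypothetical pattern, not OpenMed's implementation: alphanumeric runs joined
# by internal hyphens form one token; any other non-space character
# (punctuation) becomes its own token.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]")


def tokenize_clinical(text: str) -> list[str]:
    """Split text into tokens without breaking hyphenated clinical terms."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize_clinical("COVID-19 patient received CAR-T therapy; IL-6 was elevated.")
print(tokens)
# ['COVID-19', 'patient', 'received', 'CAR-T', 'therapy', ';', 'IL-6', 'was', 'elevated', '.']
```

A general-purpose tokenizer would typically split these terms at the hyphen, which can fragment the entity a downstream NER model sees.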

Production Tools (v0.4.0)

  • Batch Processing: Multi-text and multi-file workflows with progress tracking
  • Configuration Profiles: dev/prod/test/fast presets with flexible overrides
  • Performance Profiling: Built-in inference timing and bottleneck analysis
  • Interactive TUI: Rich terminal UI for rapid iteration
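The profile presets with overrides described above can be pictured with a small sketch. Everything here is hypothetical: the `Profile` fields, the preset values, and `load_profile` only illustrate the preset-plus-override pattern and are not OpenMed's configuration schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Profile:
    # Field names and values are illustrative assumptions.
    confidence_threshold: float
    group_entities: bool
    batch_size: int


PROFILES = {
    "dev": Profile(confidence_threshold=0.3, group_entities=False, batch_size=1),
    "prod": Profile(confidence_threshold=0.8, group_entities=True, batch_size=16),
    "test": Profile(confidence_threshold=0.5, group_entities=True, batch_size=1),
    "fast": Profile(confidence_threshold=0.5, group_entities=False, batch_size=32),
}


def load_profile(name: str, **overrides) -> Profile:
    """Return the named preset with any keyword overrides applied on top."""
    try:
        base = PROFILES[name]
    except KeyError:
        raise ValueError(f"unknown profile: {name!r}") from None
    # replace() copies the frozen preset, so the shared defaults never mutate.
    return replace(base, **overrides)


print(load_profile("prod", confidence_threshold=0.9))
```

Keeping the presets immutable and applying overrides to a copy means one caller's tweak cannot leak into another caller's settings.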

Documentation

Comprehensive guides are available at openmed.life/docs.

Models

OpenMed includes a curated registry of 12+ specialized medical NER models:

Model                           | Specialization          | Entity Types                           | Size
--------------------------------|-------------------------|----------------------------------------|-----
disease_detection_superclinical | Disease & Conditions    | DISEASE, CONDITION, DIAGNOSIS          | 434M
pharma_detection_superclinical  | Drugs & Medications     | DRUG, MEDICATION, TREATMENT            | 434M
pii_detection_superclinical     | PII & De-identification | NAME, DATE, SSN, PHONE, EMAIL, ADDRESS | 434M
anatomy_detection_electramed    | Anatomy & Body Parts    | ANATOMY, ORGAN, BODY_PART              | 109M
gene_detection_genecorpus       | Genes & Proteins        | GENE, PROTEIN                          | 109M

📖 Full Model Catalog


Advanced Usage

PII Detection & De-identification (v0.5.0)

from openmed import extract_pii, deidentify

text = "Patient: John Doe, DOB: 01/15/1970, SSN: 123-45-6789"

# Extract PII entities with smart merging (default)
result = extract_pii(
    text,
    model_name="pii_detection_superclinical",
    use_smart_merging=True  # Prevents entity fragmentation
)

# De-identify the same text with each supported method
masked = deidentify(text, method="mask")        # [NAME], [DATE]
removed = deidentify(text, method="remove")     # Complete removal
replaced = deidentify(text, method="replace")   # Synthetic data
hashed = deidentify(text, method="hash")        # Cryptographic hashing
shifted = deidentify(text, method="shift_dates", date_shift_days=180)
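For intuition about what these methods do to the text, the following standalone sketch applies mask, remove, hash, and date shifting to character spans. It does not call OpenMed: the spans are hard-coded, and `transform` and `deidentify_spans` are hypothetical helpers. The "replace" method (synthetic data) is left out.

```python
import hashlib
from datetime import datetime, timedelta

# Hypothetical (start, end, label) spans, as a PII model might return them.
sample = "Patient: John Doe, DOB: 01/15/1970"
spans = [(9, 17, "NAME"), (24, 34, "DATE")]


def transform(value: str, label: str, method: str, shift_days: int = 0) -> str:
    """Return the replacement for one detected value."""
    if method == "mask":
        return f"[{label}]"
    if method == "remove":
        return ""
    if method == "hash":
        # Unkeyed, truncated hash for illustration only; short identifiers
        # hashed this way can be recovered by brute force.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    if method == "shift_dates":
        if label != "DATE":
            return value
        shifted = datetime.strptime(value, "%m/%d/%Y") + timedelta(days=shift_days)
        return shifted.strftime("%m/%d/%Y")
    raise ValueError(f"unsupported method: {method!r}")


def deidentify_spans(text: str, spans, method: str, shift_days: int = 0) -> str:
    """Rewrite each span, working right to left so earlier offsets stay valid."""
    out = text
    for start, end, label in sorted(spans, reverse=True):
        out = out[:start] + transform(out[start:end], label, method, shift_days) + out[end:]
    return out


print(deidentify_spans(sample, spans, "mask"))
# Patient: [NAME], DOB: [DATE]
print(deidentify_spans(sample, spans, "shift_dates", shift_days=180))
# Patient: John Doe, DOB: 07/14/1970
```

Processing spans from the end of the string backwards avoids recomputing offsets after each replacement changes the text length.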

Smart Entity Merging (new in v0.5.0): Merges entities that tokenization splits into fragments, such as a date detected as 01 and /15/1970 instead of 01/15/1970, so that de-identification covers the whole value rather than part of it.
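The merging idea can be sketched in a few lines. This is an assumption about the general technique (joining adjacent spans that share a label), not OpenMed's actual algorithm; `Span` and `merge_adjacent` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int
    end: int
    label: str


def merge_adjacent(spans: list[Span], max_gap: int = 0) -> list[Span]:
    """Merge same-label spans that touch or sit within max_gap characters."""
    merged: list[Span] = []
    for span in sorted(spans, key=lambda s: s.start):
        last = merged[-1] if merged else None
        if last is not None and last.label == span.label and span.start - last.end <= max_gap:
            # Extend the previous span instead of starting a new one.
            last.end = max(last.end, span.end)
        else:
            merged.append(Span(span.start, span.end, span.label))
    return merged


text = "DOB: 01/15/1970"
fragments = [Span(5, 7, "DATE"), Span(7, 15, "DATE")]  # "01" + "/15/1970"
merged = merge_adjacent(fragments)
print([text[s.start:s.end] for s in merged])
# ['01/15/1970']
```

Without this step, masking only the first fragment would leave /15/1970 in the output.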

HIPAA Compliance: Covers all 18 Safe Harbor identifiers with configurable confidence thresholds.

📓 Complete PII Notebook | 📖 Documentation

Batch Processing

# Process multiple files with progress tracking
openmed batch --input-dir ./clinical_notes --pattern "*.txt" --recursive

# Use profiles for different environments
openmed config profile-use prod
openmed batch --input-files note1.txt note2.txt --output results.json
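For readers who want the same workflow from Python, here is a rough sketch of what a batch run involves: find files by pattern, analyze each one, report progress, and write one JSON file. `collect_files`, `run_batch`, and the `analyze` callback are hypothetical and do not mirror the CLI's internals; the callback below is a stand-in for a real model call.

```python
import json
import tempfile
from pathlib import Path


def collect_files(input_dir, pattern="*.txt", recursive=False):
    """Return matching files in a stable, sorted order."""
    root = Path(input_dir)
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())


def run_batch(files, analyze, output_path):
    """Apply analyze() to each file, print progress, and save all results."""
    results = []
    for index, path in enumerate(files, start=1):
        text = path.read_text(encoding="utf-8")
        results.append({"file": str(path), "result": analyze(text)})
        print(f"[{index}/{len(files)}] {path.name}")
    Path(output_path).write_text(json.dumps(results, indent=2), encoding="utf-8")
    return results


# Demo on a throwaway directory; the lambda stands in for a model call.
with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes" / "ward_a"
    notes.mkdir(parents=True)
    (notes / "note1.txt").write_text("Patient started on imatinib.", encoding="utf-8")
    (notes / "ignore.md").write_text("not a note", encoding="utf-8")

    files = collect_files(Path(tmp) / "notes", recursive=True)
    output = Path(tmp) / "results.json"
    results = run_batch(files, analyze=lambda text: {"characters": len(text)}, output_path=output)
    saved = json.loads(output.read_text(encoding="utf-8"))
```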

Configuration Profiles

from openmed import analyze_text

text = "Patient started on imatinib for chronic myeloid leukemia."

# Apply a profile programmatically
result = analyze_text(
    text,
    model_name="disease_detection_superclinical",
    config_profile="prod"  # High confidence, grouped entities
)
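The comment above describes the prod profile as favoring high confidence. One plausible reading is a score cutoff applied to the returned entities, sketched below. The `Entity` class mirrors the `label`, `text`, and `confidence` attributes used in the opening example; `filter_by_confidence`, the 0.8 cutoff, and the sample scores are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    label: str
    text: str
    confidence: float


def filter_by_confidence(entities: list[Entity], threshold: float) -> list[Entity]:
    """Keep only entities scored at or above the threshold."""
    return [e for e in entities if e.confidence >= threshold]


entities = [
    Entity("DISEASE", "chronic myeloid leukemia", 0.98),
    Entity("DRUG", "imatinib", 0.95),
    Entity("DISEASE", "anemia", 0.41),  # made-up low-confidence prediction
]

confident = filter_by_confidence(entities, threshold=0.8)
print([e.text for e in confident])
# ['chronic myeloid leukemia', 'imatinib']
```

A higher threshold trades recall for precision, which is usually the safer default when results feed downstream systems without review.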

Performance Profiling

from openmed import analyze_text, profile_inference

text = "Patient started on imatinib for chronic myeloid leukemia."

# Time everything that runs inside the block
with profile_inference() as profiler:
    result = analyze_text(text, model_name="disease_detection_superclinical")

print(profiler.summary())  # Inference time, bottlenecks, recommendations
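As a rough picture of how a timing context manager of this kind can work, here is a standalone sketch built on `time.perf_counter`. The `Timings` class and its stage names are hypothetical and unrelated to how `profile_inference` is implemented.

```python
import time
from contextlib import contextmanager


class Timings:
    """Accumulate wall-clock time per named stage."""

    def __init__(self):
        self.records = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised.
            elapsed = time.perf_counter() - start
            self.records[name] = self.records.get(name, 0.0) + elapsed

    def summary(self):
        """List stages slowest first, followed by the total."""
        ordered = sorted(self.records.items(), key=lambda kv: kv[1], reverse=True)
        lines = [f"{name:<12} {secs * 1000:8.2f} ms" for name, secs in ordered]
        lines.append(f"{'total':<12} {sum(self.records.values()) * 1000:8.2f} ms")
        return "\n".join(lines)


timings = Timings()
with timings.stage("tokenize"):
    tokens = "Patient started on imatinib".split()
with timings.stage("inference"):
    time.sleep(0.01)  # stand-in for a model forward pass
print(timings.summary())
```

Sorting stages by elapsed time puts the likely bottleneck on the first line of the summary.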

📖 More Examples


Contributing

We welcome contributions of all kinds: bug reports, feature requests, and pull requests.


License

OpenMed is released under the Apache-2.0 License.


Citation

If you use OpenMed in your research, please cite:

@misc{panahi2025openmedneropensourcedomainadapted,
      title={OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets},
      author={Maziyar Panahi},
      year={2025},
      eprint={2508.01630},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.01630},
}

Star History

If you find OpenMed useful, consider giving it a star ⭐ to help others discover it!


Built with ❤️ by the OpenMed team

🌐 Website | 📚 Documentation | 🐦 X/Twitter | 💬 LinkedIn