- 🎯 Overview
- ✨ Key Features
- ⚡ Quick Start (CLI)
- 🔧 Installation (CLI)
- 🐳 EnteroMark Docker Usage
- 🔗 Integrated External Tools & Dependencies
- 🚀 Usage Guide (CLI)
- 📁 Output Structure
- 🔍 Analytical Modules
- 📈 Performance Benchmarks
- 🔬 Validation & Accuracy
- 🤖 AI Integration Guide
- ❓ Frequently Asked Questions
- 🐛 Troubleshooting
- 📚 Citation
- 🙏 Acknowledgements
- 👥 Authors & Contact
- 📄 License
- 📚 Third-Party Tool Citations
EnteroMark is an automated, locally-executable computational pipeline designed specifically for comprehensive Enterococcus faecium genomic surveillance. It addresses the critical bottleneck in VRE (Vancomycin-Resistant Enterococcus faecium) research by integrating five essential genotyping methods into a single, cohesive workflow.
- Fragmented Bioinformatics: Traditional VRE analysis requires multiple separate tools with conflicting dependencies
- Resource Barriers: Web-based services need constant internet and raise data privacy concerns
- Time Constraints: Generalist platforms take hours; outbreaks need answers in minutes
- Interpretation Challenges: Raw data without epidemiological context limits actionable insights
- Critical Gene Tracking: Vancomycin, linezolid, and high-level aminoglycoside resistance detection is often incomplete
EnteroMark delivers:
- ✅ Single-command installation via Conda
- ✅ 11-15 minute complete analysis (24 samples, 16 cores)
- ✅ 100% local execution with data privacy
- ✅ Intelligent resource management using Python's psutil library
- ✅ Interactive HTML reports with epidemiological context
- ✅ Comprehensive gene tracking for last-resort antibiotics
Perfect for: Clinical labs, outbreak investigations, research studies, and public health surveillance of vancomycin-resistant enterococci.
| Module | 🎯 Purpose | 📊 Key Outputs | ⚡ Speed |
|---|---|---|---|
| FASTA QC | Comprehensive quality control (N50/N75/N90, GC%, contig stats) | HTML, TSV, JSON reports with visual summaries | <30 sec |
| MLST Typing | Phylogenetic classification via 7 housekeeping genes | ST, allele profiles, epidemiological context | <1 min |
| AMR Profiling | Comprehensive resistance gene detection (AMRFinderPlus + ABRicate) | vanA/B/D/M, optrA, cfr, poxtA, aac, ant, aph | 3-4 min |
| ABRicate Screening | Multi-database virulence/plasmid detection (8 databases) | Plasmid replicons, virulence factors, clinical flags | 4-5 min |
| Visualization Suite | Publication-ready graphics using seaborn, plotly, matplotlib | Interactive HTML, PDF, PNG, SVG | 1-2 min |
| Gene-Centric Reporting | Cross-genome frequency analysis | Each gene shown with all genomes that carry it | Instant |
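The contiguity statistics listed for the FASTA QC module follow the usual Nx definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches x% of the assembly. The sketch below illustrates that definition only; it is not EnteroMark's own code, and the function name `nx` and the example contig lengths are assumptions made for illustration.

```python
def nx(contig_lengths, fraction):
    """Return the Nx value: the length of the contig at which the cumulative
    length (longest contig first) first reaches `fraction` of the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(contig_lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # only reachable through floating-point edge cases


# Illustrative contig lengths (total 1500 bp), not real data.
lengths = [100, 200, 300, 400, 500]
n50 = nx(lengths, 0.50)
n90 = nx(lengths, 0.90)
print(f"N50={n50} N90={n90}")
```

A higher N50 means that half of the assembly sits in longer contigs, which is why it is commonly used as a quick indicator of assembly contiguity.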
- Automated VRE Classification: Based on detection of any of vanA/B/D/M
- Critical Gene Flagging: Automatic highlighting of vancomycin, linezolid, high-level aminoglycoside resistance
- Risk Assessment: Categorizes genes as 'Critical Resistance', 'High-Risk Virulence', or 'Beta-Lactamase'
- Cross-Genome Pattern Discovery: Summarizes gene frequencies across entire sample sets
- Combination Tables: ST + Resistance, Vancomycin + pbp5, Linezolid + Vancomycin
- 8-10× faster than generalist pipelines for E. faecium-specific analyses
- Linear scaling with sample numbers (R² = 0.94)
- Dynamic resource allocation using Python psutil
- Low memory footprint: Runs on 4GB RAM, scales to HPC clusters
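To make "dynamic resource allocation" concrete, here is a minimal sketch of how a worker count could be derived from CPU and memory readings. It is an illustration under stated assumptions, not EnteroMark's actual logic: the function names, the 2 GB per-job budget, and the 4 GB fallback are hypothetical. The only psutil calls used are `psutil.cpu_count` and `psutil.virtual_memory`.

```python
import os


def plan_workers(cpu_count, available_gb, requested_threads, per_job_gb=2.0):
    """Pick a worker count bounded by CPUs, free memory and the user's request.

    per_job_gb is an assumed memory budget per concurrent job."""
    by_memory = int(available_gb // per_job_gb)
    return max(1, min(cpu_count, by_memory, requested_threads))


def detect_resources():
    """Return (logical CPU count, available memory in GB).

    Uses psutil when it is installed; otherwise falls back to os.cpu_count()
    and an assumed 4 GB of free memory (a placeholder, not a measurement)."""
    try:
        import psutil
        cpus = psutil.cpu_count(logical=True) or 1
        free_gb = psutil.virtual_memory().available / 1024**3
    except ImportError:
        cpus = os.cpu_count() or 1
        free_gb = 4.0
    return cpus, free_gb


cpus, free_gb = detect_resources()
workers = plan_workers(cpus, free_gb, requested_threads=16)
print(f"{cpus} CPUs, {free_gb:.1f} GB free -> {workers} worker(s)")
```

Taking the minimum of the three limits keeps a 16-thread request from oversubscribing a 2-core laptop, while the floor of one worker lets the run proceed on a memory-constrained machine.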
See a complete interactive report generated by EnteroMark:
The report includes AMR and virulence gene tables, filter buttons, combination tables, and FASTA QC metrics.
# Method 1: Conda (Recommended)
conda create -n enteromark -c conda-forge -c bioconda enteromark -y
conda activate enteromark
# Method 2: Mamba (Faster installation)
mamba create -n enteromark -c conda-forge -c bioconda enteromark -y
mamba activate enteromark
# Method 3: From source
git clone https://github.com/bbeckley-hub/EnteroMark.git
cd EnteroMark
conda env create -f environment.yml
conda activate enteromark
pip install -e .
# Single genome
enteromark -i genome.fna -o results/
# Batch processing (24 genomes)
enteromark -i "*.fna" -o batch_results --threads 16
# Complete in ~11-15 minutes! 🎉
usage: enteromark [-i INPUT] [-o OUTPUT] [-t THREADS] [--skip-qc] [--skip-mlst]
[--skip-abricate] [--skip-amr] [--skip-summary]
[--update-amr-db] [--verbose] [--version] [-h]
EnteroMark: Complete E. faecium Genomic Analysis Pipeline
Required Arguments:
-i INPUT, --input INPUT Input FASTA file(s) - glob patterns like "*.fna"
-o OUTPUT, --output OUTPUT Output directory for all results
Optional Arguments:
-t THREADS, --threads THREADS Number of threads (default: 2)
--skip-qc Skip FASTA QC analysis
--skip-mlst Skip MLST analysis
--skip-abricate Skip ABRicate analysis
--skip-amr Skip AMR analysis
--skip-summary Skip ultimate reporter generation
--update-amr-db Update AMRfinderPlus database and exit
--verbose Show full command output
--version Show version and exit
-h, --help Show this help message
Examples:
enteromark -i "*.fna" -o results
enteromark -i "*.fasta" -o results --threads 4 --verbose
enteromark -i genome.fna -o results --skip-qc --skip-abricate
Supported FASTA formats: .fna, .fasta, .fa, .fn
Critical Genes Tracked:
🔴 Vancomycin (vanA, vanB, vanD, vanM)
🟠 Linezolid (optrA, cfr, poxtA)
🟡 High-level Aminoglycosides (aac, ant, aph)
🟢 Efflux Pumps & Biocides
🔵 Adhesins & Biofilm
Output: Complete results in organized directories
⭐ Star us on GitHub if you find this tool useful!
Transforming fragmented genomic data into coherent biological narratives 🧬✨
| Resource | Minimum | Recommended | Production |
|---|---|---|---|
| CPU Cores | 2 | 8+ | 16+ |
| RAM | 4 GB | 8 GB | 16 GB |
| Storage | 2 GB | 10 GB | 50 GB+ |
| OS | Linux, macOS, WSL2 | Linux | Linux Cluster |
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
# Add channels in correct order
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels bbeckley-hub
# Create and activate environment
conda create -n enteromark python=3.9 enteromark -y
conda activate enteromark
# Verify installation
enteromark --help
# Update ABRicate databases
abricate --setupdb
# Update AMR database (run before first analysis)
enteromark --update-amr-db
# Pull the latest image
docker pull bbeckleyhub/enteromark:latest
# Test installation
docker run --rm bbeckleyhub/enteromark:latest --help
# Analyze your data
docker run --rm \
-v $(pwd)/genomes:/data/input \
-v $(pwd)/results:/data/output \
bbeckleyhub/enteromark:latest \
-i "*.fna" -o /data/output -t 4
# Fix ownership of output files (Docker creates files as root)
sudo chown -R $USER:$USER ./results
On HPC clusters that support Singularity/Apptainer, you can run EnteroMark without sudo, and output files will be owned by your user automatically.
Important: EnteroMark writes temporary files inside its own installation directory (e.g., /opt/enteromark/...). Singularity mounts containers as read‑only by default, so you must add the --writable-tmpfs flag to allow these writes. The flag creates an ephemeral, writable overlay in memory – no permanent changes are made to the container.
singularity pull enteromark.sif docker://bbeckleyhub/enteromark:latest
singularity run --writable-tmpfs -B $(pwd):/data enteromark.sif -i "/data/*.fna" -o /data/output
# Output files are already owned by you – no chown needed!
If you encounter TLS timeouts or other network errors (common on some HPCs), convert an existing Docker image to a Singularity SIF file on a machine with Docker, then transfer the .sif file to the HPC.
Step 1 – on a machine with Docker (e.g., your laptop):
docker pull bbeckleyhub/enteromark:latest
docker save bbeckleyhub/enteromark:latest -o enteromark.tar
singularity build enteromark.sif docker-archive://enteromark.tar
Now copy enteromark.sif to your HPC home or project directory (e.g., using scp).
Step 2 – on the HPC (no sudo needed):
singularity run --writable-tmpfs -B $(pwd):/data enteromark.sif -i "/data/*.fna" -o /data/output

| Flag | Purpose |
|---|---|
| --writable-tmpfs | Creates a temporary writable overlay – required for EnteroMark to write intermediate files |
| -B $(pwd):/data | Binds your current directory to /data inside the container (input files read, output written) |
| -i "/data/*.fna" | Input pattern – use quotes to prevent shell expansion on the host |
| -o /data/output | Output directory (will appear as ./output on your host) |

You can use any EnteroMark flag, e.g.:
singularity run --writable-tmpfs -B $(pwd):/data enteromark.sif \
  -i "/data/*.fna" -o /data/output --threads 8 --skip-amr
After a successful run, you will see output indicating each module completed. All result files in ./output will be owned by your HPC user – no sudo chown needed. This is the cleanest way to run EnteroMark on shared HPC infrastructure.
EnteroMark integrates several powerful open-source tools and databases. These are not bundled directly in this repository. Instead, they are automatically installed as dependencies via Conda (as defined in environment.yml). The MIT license that applies to the EnteroMark pipeline code does not cover these external tools. Each tool is used under the terms of its own license, and we gratefully acknowledge their authors.
| Tool/Database | Purpose | Source | License |
|---|---|---|---|
| MLST | Multi-locus sequence typing | tseemann/mlst | GPL v2 |
| ABRicate | Mass screening for resistance/virulence | tseemann/abricate | GPL v2 |
| AMRFinderPlus | Antimicrobial resistance gene detection | ncbi/amr | Public Domain |
| PubMLST | MLST allele database | pubmlst.org | Open access for research |
| CARD | Comprehensive Antibiotic Resistance Database | card.mcmaster.ca | CC BY 4.0 |
| ResFinder | Acquired resistance genes | cge.food.dtu.dk/services/ResFinder | Academic free |
| VFDB | Virulence Factor Database | mgc.ac.cn/VFs | Free for research |
| PlasmidFinder | Plasmid replicon detection | cge.food.dtu.dk/services/PlasmidFinder | Academic free |
# Single genome
enteromark -i genome.fna -o results/
# Batch processing with wildcards
enteromark -i "*.fna" -o results_2025 --threads 8
# Skip specific modules
enteromark -i sample.fna -o results --skip-qc --skip-abricate
- Accepted: .fna, .fasta, .fa, .fn
- Required: Assembled genomes (contigs or complete)
- Batch patterns: *.fasta, sample_*.fna, etc.
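Because the pattern is quoted on the command line, the shell passes it through unexpanded and the program has to expand it itself. A minimal sketch of how that expansion could look in Python, assuming the four accepted extensions listed here; the function name `expand_inputs` and the case-insensitive extension match are illustrative assumptions, not EnteroMark's documented behaviour.

```python
import glob
import os
import tempfile

SUPPORTED_EXTENSIONS = (".fna", ".fasta", ".fa", ".fn")


def expand_inputs(pattern):
    """Expand a glob pattern and keep only regular files whose name ends
    with a supported FASTA extension (matched case-insensitively here)."""
    matches = sorted(glob.glob(pattern))
    return [
        path for path in matches
        if os.path.isfile(path) and path.lower().endswith(SUPPORTED_EXTENSIONS)
    ]


# Demonstration on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.fna", "b.fasta", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in expand_inputs(os.path.join(tmp, "*"))]

print(found)
```

Filtering after expansion means a broad pattern such as `*` still only picks up assemblies, and an empty result can be reported as a clear "no input files matched" error before any module starts.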
# Daily surveillance of 12 isolates
enteromark -i "daily_isolates/*.fasta" -o /mnt/shared/surveillance/$(date +%Y%m%d) --threads 12
# Complete in ~8 minutes
# Urgent investigation (8 suspected cases)
enteromark -i "outbreak/*.fna" -o /tmp/urgent_analysis
# Results in ~5 minutes
EnteroMark generates a comprehensive, organized output directory:
results/
├── enteromark_abricate_results/ # Multi-database screening (8 DBs)
├── enteromark_amr_results/ # AMR gene profiling (AMRFinder+)
├── mlst_results/ # MLST typing
├── fasta_qc_results/ # FASTA quality control
└── ENTEROMARK_ULTIMATE_REPORTS/    # Consolidated reports (HTML/JSON/CSV)
    ├── enteromark_ultimate_report.html
    ├── enteromark_ultimate_report.json
    ├── samples.csv
    ├── amr.csv
    ├── virulence.csv
    └── plasmids.csv
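When EnteroMark runs inside a scheduled job, it can be useful to confirm that the expected directories exist before downstream scripts read them. The following is a small sketch of such a check using the directory names from the tree above. The helper `missing_outputs` is hypothetical, and modules disabled with a --skip-* flag would presumably be absent by design.

```python
import tempfile
from pathlib import Path

# Directory names taken from the output tree shown above.
EXPECTED_DIRS = [
    "enteromark_abricate_results",
    "enteromark_amr_results",
    "mlst_results",
    "fasta_qc_results",
    "ENTEROMARK_ULTIMATE_REPORTS",
]


def missing_outputs(results_dir):
    """Return the expected sub-directories that are absent from results_dir."""
    root = Path(results_dir)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# Demonstration: simulate a run in which only two modules wrote output.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("mlst_results", "fasta_qc_results"):
        (Path(tmp) / name).mkdir()
    missing = missing_outputs(tmp)

print(missing)
```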
Each module contains:
- Per-sample directories with raw outputs
- Summary files (TSV/JSON) for cross-sample analysis
- Interactive HTML reports for visualization
- Master reports combining all results
- Metrics: N50/N75/N90, GC content, AT content, total length, contig count, longest/shortest contig
- Outputs: HTML reports with histograms, TSV/JSON for downstream analysis
- Database: PubMLST E. faecium (7 housekeeping genes: atpA, ddl, gdh, gki, gyd, pstS, pta)
- Method: BLAST-based allele calling
- Output: ST, 7-gene profile, epidemiological context
- Tool: NCBI-AMRFinderPlus
- Coverage: 5,000+ AMR genes
- Risk Assessment: Critical Risk (vanA/B/D/M, optrA, cfr, poxtA), High Risk (erm, tetM)
- Databases: CARD, ResFinder, VFDB, PlasmidFinder, MegaRes, NCBI, ARG-ANNOT, BacMet2
- Thresholds: ≥80% identity and coverage
- Clinical Flags: Vancomycin resistance, linezolid resistance, high-level aminoglycosides
- Frequency Tables: Each gene shown with all genomes that carry it
- Combination Patterns: ST + Resistance, Vancomycin + pbp5, Linezolid + Vancomycin
- Interactive Filters: Search, sort, and highlight by gene or genome
- Export Formats: HTML (interactive), JSON (AI-ready), CSV (tables)
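The gene-centric view is an inversion of per-genome hit tables: instead of listing genes per genome, it lists genomes per gene. The sketch below shows one way to build it while applying the ≥80% identity and coverage thresholds. It assumes ABRicate-style TSV input and uses only a subset of ABRicate's column names (#FILE, GENE, %COVERAGE, %IDENTITY). The rows are made-up examples, and `genes_to_genomes` is a hypothetical helper, not the pipeline's own function.

```python
import csv
import io
from collections import defaultdict

# Made-up rows using a subset of ABRicate's TSV column names.
EXAMPLE_TSV = (
    "#FILE\tGENE\t%COVERAGE\t%IDENTITY\n"
    "iso1.fna\tvanA\t100.00\t99.90\n"
    "iso2.fna\tvanA\t98.50\t99.10\n"
    "iso2.fna\toptrA\t100.00\t97.00\n"
    "iso3.fna\tvanB\t75.00\t99.00\n"  # below the coverage threshold
)


def genes_to_genomes(tsv_text, min_identity=80.0, min_coverage=80.0):
    """Map each gene to the sorted list of genomes carrying it, keeping only
    hits that meet both the identity and the coverage threshold."""
    carriers = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        identity = float(row["%IDENTITY"])
        coverage = float(row["%COVERAGE"])
        if identity >= min_identity and coverage >= min_coverage:
            carriers[row["GENE"]].add(row["#FILE"])
    return {gene: sorted(files) for gene, files in sorted(carriers.items())}


table = genes_to_genomes(EXAMPLE_TSV)
for gene, genomes in table.items():
    print(f"{gene}\t{len(genomes)}\t{', '.join(genomes)}")
```

Using a set per gene means a genome with several hits to the same gene is counted once, so the frequency column reflects genomes rather than hits.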
| System | Samples | Time | Speed vs General Pipelines |
|---|---|---|---|
| Laptop (2 cores, 8GB) | 1 | 2m 45s | 5× faster |
| Laptop (2 cores, 8GB) | 24 | 30m 12s | 6× faster |
| Workstation (16 cores, 16GB) | 1 | 1m 35s | 8× faster |
| Workstation (16 cores, 16GB) | 24 | 11m 42s | 10× faster |
| Workstation (16 cores, 16GB) | 100 | ~48m | 12× faster |
- Memory Usage: 2-4 GB typical, scales linearly
- Storage: ~150 MB per sample
- CPU: Dynamic allocation via psutil
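Dividing each wall-clock time in the benchmark table by its sample count gives the amortized cost per genome, which makes the scaling behaviour easier to read. The short sketch below does that arithmetic on the figures from the table. The 100-sample entry is listed as "~48m", so its value is approximate.

```python
# (system, samples, wall-clock seconds), copied from the benchmark table.
BENCHMARKS = [
    ("Laptop (2 cores, 8GB)", 1, 2 * 60 + 45),
    ("Laptop (2 cores, 8GB)", 24, 30 * 60 + 12),
    ("Workstation (16 cores, 16GB)", 1, 1 * 60 + 35),
    ("Workstation (16 cores, 16GB)", 24, 11 * 60 + 42),
    ("Workstation (16 cores, 16GB)", 100, 48 * 60),  # "~48m" in the table
]


def seconds_per_sample(samples, seconds):
    """Amortized wall-clock seconds per genome for one benchmark row."""
    return seconds / samples


for system, samples, seconds in BENCHMARKS:
    rate = seconds_per_sample(samples, seconds)
    print(f"{system:30s} {samples:4d} samples  {rate:6.1f} s/sample")
```

On the workstation, the 24-sample and 100-sample batches both come out at roughly 29 seconds per genome, which is what near-linear scaling looks like once the fixed start-up cost of a single-genome run is spread across a batch.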
100% concordance with gold-standard reference genomes:
| Reference Strain | Expected ST | Expected Resistance | EnteroMark Result |
|---|---|---|---|
| ATCC 19434 | ST? | None | ✅ ST? / No van genes |
| VRE clinical isolate | ST17 | vanA | ✅ ST17 / vanA+ |
| HLAR isolate | ST78 | aac(6')-Ie-aph(2'')-Ia | ✅ ST78 / aac+ |
| E. faecium Aus0004 | ST203 | vanB | ✅ ST203 / vanB+ |
- VRE: 18 isolates (75%)
- VSE: 6 isolates (25%)
- Dominant STs: ST17 (8), ST78 (5), ST203 (3)
- Critical Genes: vanA (12), vanB (6), optrA (3), aac(6')-Ie-aph(2'')-Ia (4)
- Virulence: esp (15/24), acm (22/24), gelE (10/24)
EnteroMark generates comprehensive HTML reports that are perfect for AI analysis. Here's how to use AI tools to get more from your data.
- Install any AI browser extension (ChatGPT, Claude, Gemini)
- Open your report: enteromark_ultimate_report.html
- Upload the JSON file (enteromark_ultimate_report.json) to the AI
- Ask questions about your data
For MLST Analysis:
- "What is the most common ST in this collection?"
- "Which STs are associated with vancomycin resistance?"
For AMR Genes:
- "Explain the vanA gene and its clinical significance"
- "Which samples carry both vanA and optrA?"
- "What treatment implications do these resistance genes have?"
For Virulence Factors:
- "Which samples carry the esp virulence gene?"
- "Are there any high-risk virulence combinations?"
For Pattern Discovery:
- "Are there correlations between ST and specific resistance genes?"
- "Identify any concerning patterns in this dataset"
- "Which plasmids are most prevalent in VRE isolates?"
- Provide context: "I'm analyzing E. faecium genomics data for VRE surveillance..."
- Be specific: Instead of "tell me about this", ask "what does vanB resistance indicate for treatment?"
- Ask for interpretations: "What are the clinical implications of these findings?"
- Request summaries: "Summarize the resistance profile of sample XYZ"
EnteroMark reports include the enteromark_ultimate_report.json file with all data structured for AI parsing. Each gene is shown with all genomes that contain it, making pattern analysis straightforward.
"AI provides powerful insights but always verify critical findings with domain experts."
Q: Is EnteroMark free to use?
A: Yes! EnteroMark is open-source under the MIT License. Free for academic, clinical, and commercial use.
Q: What makes EnteroMark different from other tools?
A: EnteroMark is E. faecium-optimized, integrates 5 analysis types in one workflow, runs 8-10× faster than generalist tools, and includes comprehensive tracking of vancomycin, linezolid, and aminoglycoside resistance.
Q: Can I use EnteroMark for clinical diagnosis?
A: EnteroMark is a research tool. While highly accurate, results should be validated with orthogonal methods for clinical decision-making.
Q: Why only assembled genomes?
A: We focused on assembled genomes for speed and simplicity. Raw read support may be added in future releases.
Q: How often are databases updated?
A: Run abricate --setupdb at any time to fetch the latest ABRicate databases, and enteromark --update-amr-db to update the AMRFinderPlus database.
Q: Can I run EnteroMark on Windows?
A: Yes, via WSL2 (Windows Subsystem for Linux).
Q: How do I handle very large batches (1000+ genomes)?
A: Use the CLI with glob patterns and appropriate threading. EnteroMark scales linearly.
Q: What does "ND" mean in results?
A: "Not Determined" - indicates that the result could not be determined from available data.
Q: How is VRE status determined?
A: VRE = positive for vanA, vanB, vanD, or vanM genes. VSE = lacks these genes.
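The rule in the last answer can be written as a simple set intersection. This is a minimal sketch of that rule as stated, not the pipeline's own code. The function name `classify_vre` is hypothetical, and it assumes gene names arrive exactly as vanA, vanB, vanD, or vanM (real tool output may use other spellings or allele suffixes that would need normalizing first).

```python
VAN_GENES = {"vanA", "vanB", "vanD", "vanM"}


def classify_vre(detected_genes):
    """Return ('VRE' or 'VSE', sorted list of van genes found) for one genome.

    A genome is called VRE if at least one of the four van genes is present."""
    found = sorted(VAN_GENES.intersection(detected_genes))
    status = "VRE" if found else "VSE"
    return status, found


# Illustrative gene lists, not real isolates.
print(classify_vre(["vanA", "optrA", "esp"]))
print(classify_vre(["esp", "acm"]))
```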
# Issue: Database errors
# Solution:
abricate --setupdb
enteromark --update-amr-db
- conda activate enteromark
- abricate --version (should show 1.2.0+)
- enteromark --update-amr-db
- enteromark --help
- Run test with single genome
- Check existing issues: GitHub Issues
- Search closed issues: Many problems already solved
- Create new issue: Include:
- Full error message
- enteromark --version
- Conda environment list (conda list)
- Example command that failed
- Email support: brownbeckley94@gmail.com (response within 48 hours)
If you use EnteroMark in your research, please cite:
Beckley, B. (2026). EnteroMark: a species‑optimized computational pipeline for rapid and accessible Enterococcus faecium genotyping and surveillance. Journal of Clinical Microbiology (In Review).
@article{beckley2026enteromark,
title={EnteroMark: a species‑optimized computational pipeline for rapid and accessible Enterococcus faecium genotyping and surveillance},
author={Beckley, Brown},
journal={Journal of Clinical Microbiology},
year={2026},
note={In Review}
}
@software{enteromark2026,
author = {Brown Beckley},
title = {EnteroMark: A species-optimized computational pipeline for Enterococcus faecium genotyping},
year = {2026},
publisher = {GitHub},
url = {https://github.com/bbeckley-hub/EnteroMark}
}
EnteroMark stands on the shoulders of giants. We are deeply grateful to:
- Torsten Seemann for MLST, ABRicate, and countless foundational tools
- NCBI team for AMRFinderPlus
- PubMLST for MLST database curation
- CARD, ResFinder, VFDB, PlasmidFinder for essential databases
- Python community for Biopython, pandas, plotly, seaborn, matplotlib, beautifulsoup4
- Early adopters and beta testers for invaluable feedback
"If we ever meet in person, the drinks are on me!" – Brown Beckley
Brown Beckley (Primary Developer)
- University of Ghana Medical School
- 📧 brownbeckley94@gmail.com
- 🐙 GitHub: bbeckley-hub
- LinkedIn: @brownbeckley
- 📞 +233 508820617
We welcome collaborations on:
- VRE epidemiology studies
- Clinical validation projects
- Bioinformatics tool development
- Global surveillance initiatives
- Public health applications
The EnteroMark pipeline code (the workflow engine, report generation, HTML templates, and Python modules written by the authors) is licensed under the MIT License – see the LICENSE file for details.
EnteroMark executes several external bioinformatics tools, which are installed as Conda dependencies. Each tool is the property of its respective developers and is used under its own license:
| Tool | License |
|---|---|
| mlst (Torsten Seemann) | GPL v2 |
| ABRicate (Torsten Seemann) | GPL v2 |
| AMRFinderPlus (NCBI) | Public Domain |
By using EnteroMark, you agree to comply with the licenses of these third-party tools.
EnteroMark integrates several powerful open-source tools and databases. If you use EnteroMark in your research, please also cite the following essential tools:
@software{seemann_mlst_2018,
author = {Seemann, T.},
title = {MLST: Scan contig files against traditional PubMLST typing schemes},
year = {2018},
publisher = {GitHub},
url = {https://github.com/tseemann/mlst}
}
@article{jolley_pubmlst_2018,
author = {Jolley, K. A. and Bray, J. E. and Maiden, M. C. J.},
title = {Open-access bacterial population genomics: {BIGSdb} software, the {PubMLST.org} website and their applications},
journal = {Wellcome Open Research},
volume = {3},
pages = {124},
year = {2018},
doi = {10.12688/wellcomeopenres.14826.1}
}
@software{seemann_abricate_2018,
author = {Seemann, T.},
title = {ABRicate: Mass screening of contigs for antimicrobial resistance and virulence genes},
year = {2018},
publisher = {GitHub},
url = {https://github.com/tseemann/abricate}
}
@article{feldgarden_amrfinderplus_2019,
author = {Feldgarden, M. et al.},
title = {AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence},
journal = {Scientific Reports},
volume = {11},
pages = {12728},
year = {2021},
doi = {10.1038/s41598-021-91456-0}
}
@article{alcock_card_2023,
author = {Alcock, B. P. et al.},
title = {CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database},
journal = {Nucleic Acids Research},
volume = {51},
number = {D1},
pages = {D690-D699},
year = {2023},
doi = {10.1093/nar/gkac920}
}
@article{bortolaia_resfinder_2020,
author = {Bortolaia, V. et al.},
title = {ResFinder 4.0 for predictions of phenotypes from genotypes},
journal = {Journal of Antimicrobial Chemotherapy},
volume = {75},
number = {12},
pages = {3491-3500},
year = {2020},
doi = {10.1093/jac/dkaa345}
}
@article{chen_vfdb_2016,
author = {Chen, L. et al.},
title = {VFDB 2016: hierarchical and refined dataset for big data analysis—10 years on},
journal = {Nucleic Acids Research},
volume = {44},
number = {D1},
pages = {D694-D697},
year = {2016},
doi = {10.1093/nar/gkv1239}
}
@article{carattoli_plasmidfinder_2014,
author = {Carattoli, A. et al.},
title = {In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing},
journal = {Antimicrobial Agents and Chemotherapy},
volume = {58},
number = {7},
pages = {3895-3903},
year = {2014},
doi = {10.1128/AAC.02412-14}
}
When citing EnteroMark in your publications, please include the main EnteroMark citation along with citations for the specific tools and databases you used:
"Genomic analysis was performed using EnteroMark [Beckley, 2026], which integrates MLST [Seemann, 2018], ABRicate [Seemann, 2018], and AMRFinderPlus [Feldgarden et al., 2019] for comprehensive E. faecium characterization. Antimicrobial resistance genes were identified using the CARD [Alcock et al., 2023] and ResFinder [Bortolaia et al., 2020] databases."
| Choose Your Platform | |
|---|---|
| 🖥️ Command Line | For high-throughput, local analysis |
| 🐳 Docker/Singularity | For containerized, HPC-ready execution |
From days to minutes. From fragmented to integrated. From data to insights.
EnteroMark: Precision surveillance for the antibiotic resistance era.
⭐ If you find this tool useful, please star the repository! ⭐
Join the Fight Against Antimicrobial Resistance
Antimicrobial resistance (AMR) represents one of the most significant global health threats of our time. Enterococcus faecium is a WHO high-priority pathogen, with vancomycin-resistant strains (VRE) causing difficult-to-treat hospital infections worldwide. We invite researchers, clinicians, and public health professionals to collaborate with us in expanding and validating our database, sharing regional epidemiological data, and advancing AMR surveillance.
Together, we can enhance global AMR monitoring and develop more effective treatment strategies.
