microbiorust 🦀

Python bindings for microBioRust — a high-performance, modular bioinformatics toolkit written in Rust.

microbiorust gives Python users fast, memory-efficient bioinformatics functionality implemented in Rust and exposed to Python through PyO3. It aims to be an alternative to libraries such as Biopython, with a focus on speed, correctness and extensibility.

Installation

pip install microbiorust

Wheels are available for Linux, macOS and Windows (Python 3.10+), so no Rust toolchain is required.

Build from source

If you prefer to build from source using maturin:

pip install maturin
git clone https://github.com/microBioRust/microBioRust
cd microBioRust/microbiorust-py
maturin develop --features extension-module

To verify the Python module functions are correctly exposed from Rust:

cargo test
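`cargo test` exercises the Rust side. As a complementary check from Python after installing (with pip or `maturin develop`), a short script can confirm that each submodule exposes the functions used in the examples in this README. This is a sketch: the helper `missing_exports` and the `EXPECTED` mapping are hypothetical, built only from function names that appear in this README, and submodules are resolved the way `from microbiorust import gbk` resolves them (attribute lookup first, then import).

```python
import importlib
from typing import Dict, List

# Function names taken from the usage examples in this README (not an exhaustive API list).
EXPECTED: Dict[str, List[str]] = {
    "gbk": ["gbk_to_faa", "gbk_to_fna", "gbk_to_faa_count", "gbk_to_gff"],
    "embl": ["embl_to_faa", "embl_to_fna", "embl_to_gff"],
    "seqmetrics": ["hydrophobicity", "amino_counts", "amino_percentage"],
    "align": ["subset_msa_alignment"],
}


def missing_exports(package: str, expected: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return {submodule: [names]} for every expected name that is absent or not callable."""
    try:
        pkg = importlib.import_module(package)
    except ImportError:
        # Package not installed: report every expected name as missing.
        return {sub: list(names) for sub, names in expected.items()}

    problems: Dict[str, List[str]] = {}
    for sub_name, names in expected.items():
        # Mirror `from package import sub`: attribute lookup first, then import.
        sub = getattr(pkg, sub_name, None)
        if sub is None:
            try:
                sub = importlib.import_module(f"{package}.{sub_name}")
            except ImportError:
                problems[sub_name] = list(names)
                continue
        absent = [name for name in names if not callable(getattr(sub, name, None))]
        if absent:
            problems[sub_name] = absent
    return problems


if __name__ == "__main__":
    report = missing_exports("microbiorust", EXPECTED)
    print("all expected functions found" if not report else f"missing: {report}")
```

An empty result means every listed function was found and is callable.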

Features

Fast parsers for GenBank and EMBL formats
Fast parsers for BLAST XML and tabular formats
Fast parser for multiple sequence alignments (MSA), with subsetting and get_consensus
Output to GFF3, FAA and FFN formats
Accurate feature extraction and translation
Sequence metrics: hydrophobicity, amino acid counts and percentages
Seamless Python API for easy integration into existing pipelines
Built with Rust for memory safety and performance
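In GenBank records, CDS locations are 1-based and inclusive, and a `complement(start..end)` location is read from the reverse strand. The pure-Python sketch below illustrates extraction and translation for a single-interval CDS using the standard genetic code. It is not microbiorust's implementation: the helper names are hypothetical, and it deliberately ignores `join(...)` locations, `/codon_start`, `/transl_except` and alternative start codons (bacterial records usually declare table 11, which assigns the same amino acids as table 1 but allows more start codons).

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract_feature(genome: str, start: int, end: int, strand: int) -> str:
    """Slice a feature using GenBank-style 1-based, inclusive coordinates.

    strand=-1 corresponds to a complement(start..end) location.
    """
    seq = genome[start - 1:end]
    return reverse_complement(seq) if strand == -1 else seq


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate codon by codon; unknown or ambiguous codons become 'X'."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")
        if aa == "*" and to_stop:
            break  # stop at the first stop codon
        protein.append(aa)
    return "".join(protein)
```

For example, `translate(extract_feature("TTAAAATTTCAT", 1, 12, -1))` reverse-complements the slice to `ATGAAATTTTAA` and returns `"MKF"`.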

Modules

`microbiorust gbk` — GenBank format

from microbiorust import gbk

# Extract protein sequences to FASTA
gbk.gbk_to_faa("input.gbk", "output.faa")

# Extract nucleotide sequences to FASTA
gbk.gbk_to_fna("input.gbk", "output.fna")

# Count protein sequences
count = gbk.gbk_to_faa_count("input.gbk")

# Convert annotations to GFF3
gbk.gbk_to_gff("input.gbk", "output.gff")
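The protein and nucleotide outputs are plain FASTA, so they can be read back with any FASTA reader. A minimal pure-Python reader is sketched below for quick checks inside a pipeline; `read_fasta` and `fasta_to_dict` are hypothetical helpers, not part of microbiorust, and the only assumption about microbiorust's output is that each record begins with a `>` header line.

```python
from typing import Dict, Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_dict(path: str) -> Dict[str, str]:
    """Map the first whitespace-delimited token of each header to its sequence."""
    return {(header.split() or [""])[0]: seq for header, seq in read_fasta(path)}


# Example usage (assumes microbiorust is installed and input.gbk exists):
# from microbiorust import gbk
# gbk.gbk_to_faa("input.gbk", "output.faa")
# n_written = sum(1 for _ in read_fasta("output.faa"))
# print(n_written, gbk.gbk_to_faa_count("input.gbk"))
```

Assuming `gbk_to_faa_count` counts the same records that `gbk_to_faa` writes, the two printed numbers should agree.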

`microbiorust embl` — EMBL format

from microbiorust import embl

# Extract protein sequences to FASTA
embl.embl_to_faa("input.embl", "output.faa")

# Extract nucleotide sequences to FASTA
embl.embl_to_fna("input.embl", "output.fna")

# Convert annotations to GFF3
embl.embl_to_gff("input.embl", "output.gff")
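Because each converter takes an input path and an output path, batch-processing a directory of mixed GenBank and EMBL files only needs a dispatch table keyed by file extension. The sketch below is illustrative: `convert_directory` is a hypothetical helper, the extension-to-function mapping in the usage comment is an assumption about your file naming, and microbiorust is imported only in that comment so the function itself can be tested with stand-in converters.

```python
from pathlib import Path
from typing import Callable, Dict, List

# A converter takes (input_path, output_path), like gbk.gbk_to_faa or embl.embl_to_faa.
Converter = Callable[[str, str], object]


def convert_directory(in_dir: str, out_dir: str,
                      converters: Dict[str, Converter],
                      out_suffix: str = ".faa") -> List[Path]:
    """Run the converter registered for each file's extension; return the outputs written."""
    out_root = Path(out_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for path in sorted(Path(in_dir).iterdir()):
        convert = converters.get(path.suffix.lower())
        if convert is None or not path.is_file():
            continue  # skip directories and files with no registered converter
        target = out_root / (path.stem + out_suffix)
        convert(str(path), str(target))
        written.append(target)
    return written


# Example usage (assumes microbiorust is installed):
# from microbiorust import gbk, embl
# convert_directory("genomes", "proteins",
#                   {".gbk": gbk.gbk_to_faa, ".gb": gbk.gbk_to_faa, ".embl": embl.embl_to_faa})
```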

`microbiorust seqmetrics` — Sequence metrics

from microbiorust import seqmetrics

sequence = "MKTLLLTLVVVTIVCLDLGAVGNGSSLSEDKDNVHK"

# Hydrophobicity score
window_size = 5
score = seqmetrics.hydrophobicity(sequence, window_size)

# Amino acid counts
counts = seqmetrics.amino_counts(sequence)

# Amino acid percentages
percentages = seqmetrics.amino_percentage(sequence)
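This README does not state which hydrophobicity scale or windowing rule `seqmetrics.hydrophobicity` uses. For cross-checking results, a pure-Python reference is sketched below using the Kyte-Doolittle hydropathy scale and the mean over each full-length window; both choices are assumptions, so values may not match microbiorust exactly. The helper names `kd_profile`, `count_residues` and `residue_percentages` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Kyte-Doolittle hydropathy scale. Assumption: microbiorust may use another scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kd_profile(seq: str, window: int) -> List[float]:
    """Mean hydropathy of each full window, sliding one residue at a time.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def count_residues(seq: str) -> Dict[str, int]:
    return dict(Counter(seq.upper()))


def residue_percentages(seq: str) -> Dict[str, float]:
    """Percentage (0-100) of each residue; empty input gives an empty dict."""
    counts = count_residues(seq)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in counts.items()}
```

For the 36-residue example sequence above and a window of 5, `kd_profile` yields 32 values, one per window position.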

`microbiorust align` — Multiple sequence alignment

from microbiorust import align

# Subset a FASTA-format MSA by row and column, e.g.
align.subset_msa_alignment("input.fasta", "ids.txt", "output.fasta")
# where the first tuple (0,10) is a row-wise subset and
# the second tuple (0,100) is a column-wise subset
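Row and column subsetting and a consensus sequence (the features list mentions `get_consensus`) are simple to express once an alignment is held as (id, aligned sequence) pairs. The sketch below uses 0-based, end-exclusive ranges and a simple majority rule with an ambiguity character; these conventions are assumptions, not taken from microbiorust, whose consensus rule is not documented here. The helper names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Alignment = List[Tuple[str, str]]  # (record id, aligned sequence)


def subset_alignment(aln: Alignment, rows: Tuple[int, int],
                     cols: Tuple[int, int]) -> Alignment:
    """Keep rows rows[0]..rows[1]-1 and columns cols[0]..cols[1]-1 (0-based, end-exclusive)."""
    r0, r1 = rows
    c0, c1 = cols
    return [(rid, seq[c0:c1]) for rid, seq in aln[r0:r1]]


def majority_consensus(aln: Alignment, gap: str = "-",
                       threshold: float = 0.5, ambiguous: str = "X") -> str:
    """Per column, emit the most common non-gap residue if its share of all rows
    reaches `threshold`; otherwise emit `ambiguous`. All-gap columns stay gaps."""
    if not aln:
        return ""
    length = len(aln[0][1])
    if any(len(seq) != length for _, seq in aln):
        raise ValueError("aligned sequences must all have the same length")
    n_rows = len(aln)
    consensus = []
    for col in range(length):
        counts = Counter(seq[col] for _, seq in aln if seq[col] != gap)
        if not counts:
            consensus.append(gap)
            continue
        residue, n = counts.most_common(1)[0]  # ties go to the first residue seen
        consensus.append(residue if n / n_rows >= threshold else ambiguous)
    return "".join(consensus)
```

With `aln = [("a", "ACGT-"), ("b", "ACGA-"), ("c", "TCGA-")]`, `majority_consensus(aln)` gives `"ACGA-"`, while raising the threshold to 0.7 turns the two-out-of-three columns into `X`.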

Why Rust?

Rust gives microbiorust performance comparable to C together with memory safety, and core parsing operations do not depend on NumPy or pandas. Large GenBank or EMBL files are parsed significantly faster than with equivalent pure-Python implementations.

Documentation

Full documentation: https://microbiorust.github.io/docs/

Source: https://github.com/microBioRust/microBioRust

License

MIT
