Fast PII detection and anonymization, built in Rust with Python and WASM bindings.
DataFog detects structured PII (emails, phone numbers, SSNs, credit card numbers, IP addresses, dates of birth, ZIP codes) with compiled regex patterns. Optionally, it also detects soft PII (names, organizations, addresses) with named-entity recognition (NER) backed by a GLiNER ONNX model. The regex engine runs in microseconds per kilobyte with zero external dependencies.
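To make the regex approach concrete, here is a minimal pure-Python sketch of compiled-pattern detection. It is not DataFog's implementation: the three patterns are simplified stand-ins chosen for illustration, and `PATTERNS` and `detect_structured` are hypothetical names. The output dictionaries mirror the `type`/`value`/`start`/`end`/`score` shape shown in the quick-start example.

```python
import re

# Illustrative patterns only (an assumption); the real patterns live in the
# Rust core and are likely stricter than these.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_structured(text):
    """Return one dict per regex match, sorted by start offset."""
    spans = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append({
                "type": entity_type,
                "value": match.group(),
                "start": match.start(),
                "end": match.end(),
                "score": 1.0,  # regex matches are treated as certain
            })
    return sorted(spans, key=lambda span: span["start"])
```

Because the patterns are compiled once at import time, each call only pays for the scan over the input text.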
DataFog is a Rust library with Python bindings built by maturin. You need a Rust toolchain to install from source.
Prerequisites: Rust (stable) and Python 3.9+.
```shell
# From PyPI (once published)
pip install datafog

# From source
git clone https://github.com/DataFog/datafog.git
cd datafog
pip install .
```

That's it. `pip install .` compiles the Rust code and installs DataFog as a Python package. No separate Rust build step is needed; maturin handles it behind the scenes.
```python
from datafog import DataFog, detect, anonymize_text

# Detect PII
entities = detect("Contact john@example.com or call 555-123-4567")
# [{"type": "EMAIL", "value": "john@example.com", "start": 8, "end": 24, "score": 1.0},
#  {"type": "PHONE", "value": "555-123-4567", "start": 33, "end": 45, "score": 1.0}]

# Anonymize
clean = anonymize_text("SSN is 123-45-6789", method="redact")
# "SSN is [REDACTED]"

# Class API with batch support
fog = DataFog()
results = fog.detect_batch(["john@test.com", "555-123-4567", "no pii here"])
```

To use DataFog from Rust, add the crate to your Cargo.toml:
```toml
[dependencies]
datafog-core = "0.1"
```

```rust
use datafog_core::DataFog;
use datafog_core::anonymizer::AnonymizeMethod;

let fog = DataFog::new();
let result = fog.detect("Contact john@example.com");
println!("{:?}", result.spans);

let anon = fog.anonymize("SSN is 123-45-6789", AnonymizeMethod::Redact);
assert_eq!(anon.text, "SSN is [REDACTED]");
```

The NER engine uses a GLiNER ONNX model to detect soft PII that regex can't catch: person names, organizations, locations, addresses, and more. With NER enabled, DataFog runs both regex and NER, then merges the results.
Building with NER pulls in gline-rs + ONNX Runtime (~50 MB binary size increase).
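The merge step is not specified here, so the following is only a sketch of one plausible policy, not a description of how DataFog resolves conflicts. It assumes that overlapping spans are resolved by keeping the higher score, with ties going to the longer span. `merge_spans` is a hypothetical helper, and the span dictionaries follow the shape shown in the quick-start output.

```python
def merge_spans(regex_spans, ner_spans):
    """Combine spans from two engines, dropping overlapping duplicates.

    Assumed policy: when two spans overlap, keep the one with the higher
    score, breaking ties in favour of the longer span.
    """
    candidates = sorted(
        regex_spans + ner_spans,
        key=lambda s: (-s["score"], -(s["end"] - s["start"]), s["start"]),
    )
    kept = []
    for span in candidates:
        # Two half-open ranges overlap when each starts before the other ends.
        overlaps = any(
            span["start"] < other["end"] and other["start"] < span["end"]
            for other in kept
        )
        if not overlaps:
            kept.append(span)
    # Return the survivors in document order.
    return sorted(kept, key=lambda s: s["start"])
```

Under this policy, a regex EMAIL match with score 1.0 wins over a lower-scored NER PERSON span that covers part of the same address.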
```shell
# Install from source with NER + model auto-download
pip install . --config-settings="build-args=--features full"
```

```python
from datafog import DataFog, has_ner_support

if has_ner_support():
    fog = DataFog(engine="ner")  # downloads ~50 MB model on first use
    entities = fog.detect("John Smith works at Acme Corp in Paris")
    # detects PERSON, ORGANIZATION, LOCATION in addition to regex entities
```

You can also point to a local model directory:

```python
fog = DataFog(engine="ner", model="/path/to/model/dir")
```

The model directory must contain tokenizer.json and model.onnx.
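A quick pre-flight check can catch a misconfigured model directory before the engine is constructed. The sketch below uses only the standard library. `REQUIRED_FILES` and `missing_model_files` are hypothetical names, and the two file names are the ones listed above.

```python
from pathlib import Path

# File names taken from the requirement stated above.
REQUIRED_FILES = ("tokenizer.json", "model.onnx")


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

If the returned list is empty, the directory has both files and can be passed as the `model` argument. Otherwise the list says exactly what to add.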
| Engine | Entity types |
|---|---|
| Regex | EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, DOB, ZIP |
| NER | PERSON, ORGANIZATION, LOCATION, ADDRESS, MEDICAL_RECORD_NUMBER, ACCOUNT_NUMBER, LICENSE_NUMBER, PASSPORT_NUMBER, URL |
When both engines are active, all entity types are available.
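Since the NER build is heavier, it can help to decide up front whether a workload needs it. The sketch below encodes the table above as two sets and reports whether any requested type is NER-only. `REGEX_TYPES`, `NER_TYPES`, and `needs_ner` are hypothetical names and not part of the DataFog API.

```python
# Entity types copied from the table above.
REGEX_TYPES = {"EMAIL", "PHONE", "SSN", "CREDIT_CARD", "IP_ADDRESS", "DOB", "ZIP"}
NER_TYPES = {
    "PERSON", "ORGANIZATION", "LOCATION", "ADDRESS", "MEDICAL_RECORD_NUMBER",
    "ACCOUNT_NUMBER", "LICENSE_NUMBER", "PASSPORT_NUMBER", "URL",
}


def needs_ner(wanted_types):
    """Return True if any wanted type is produced only by the NER engine."""
    wanted = set(wanted_types)
    unknown = wanted - REGEX_TYPES - NER_TYPES
    if unknown:
        raise ValueError(f"unknown entity types: {sorted(unknown)}")
    return bool(wanted & NER_TYPES)
```

For example, a pipeline that only scrubs emails and SSNs can stay on the default regex build, while one that must catch person names needs the NER feature.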
```python
from datafog import anonymize_text

anonymize_text("Email: john@test.com", method="redact")   # "Email: [REDACTED]"
anonymize_text("Email: john@test.com", method="replace")  # "Email: [EMAIL_1]"
anonymize_text("Email: john@test.com", method="hash")     # "Email: a1b2c3d4..."
```

Available methods: redact, replace, hash (SHA-256), hash_md5, hash_sha3.
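As a rough illustration of what each method does to a detected span, here is a standard-library sketch. It is not DataFog's implementation. It assumes non-overlapping spans, per-type counters for `replace`, full hex digests for the hash methods (the library may truncate or format them differently), and SHA3-256 for `hash_sha3`. `HASHERS` and `anonymize_spans` are hypothetical names.

```python
import hashlib

HASHERS = {
    "hash": hashlib.sha256,
    "hash_md5": hashlib.md5,
    "hash_sha3": hashlib.sha3_256,  # assumption: the SHA3-256 variant
}


def anonymize_spans(text, spans, method="redact"):
    """Rewrite each (non-overlapping) span in text according to method."""
    counters = {}
    replacements = []
    for span in sorted(spans, key=lambda s: s["start"]):
        value = text[span["start"]:span["end"]]
        if method == "redact":
            token = "[REDACTED]"
        elif method == "replace":
            # Number placeholders per entity type: [EMAIL_1], [EMAIL_2], ...
            counters[span["type"]] = counters.get(span["type"], 0) + 1
            token = f"[{span['type']}_{counters[span['type']]}]"
        elif method in HASHERS:
            token = HASHERS[method](value.encode("utf-8")).hexdigest()
        else:
            raise ValueError(f"unknown method: {method}")
        replacements.append((span["start"], span["end"], token))
    # Apply right to left so earlier offsets stay valid.
    for start, end, token in reversed(replacements):
        text = text[:start] + token + text[end:]
    return text
```

Hashing keeps equal inputs mapped to equal outputs, which preserves joins across records, whereas `redact` discards that linkage entirely.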
The WASM target provides regex-only detection for browser and edge environments.
```shell
# Requires wasm-pack: https://rustwasm.github.io/wasm-pack/installer/
wasm-pack build crates/datafog-wasm --target web
```

A demo page is included at crates/datafog-wasm/demo/index.html.
```text
datafog/
├── crates/
│   ├── datafog-core/     # Pure Rust library (regex, NER, anonymizer, cascade)
│   ├── datafog-python/   # PyO3 bindings
│   └── datafog-wasm/     # wasm-bindgen bindings (regex-only)
├── python/datafog/       # Python package source + type stubs
├── tests/                # Python integration tests
├── pyproject.toml        # maturin build config
└── Cargo.toml            # Workspace root
```
| Flag | What it adds | Dependencies |
|---|---|---|
| `default` | Regex-only detection | None beyond regex, serde |
| `ner` | GLiNER NER engine | gline-rs, ort |
| `model-download` | Auto-download models from HuggingFace | reqwest, dirs |
| `parallel` | Rayon-based batch parallelism | rayon |
| `ner-cuda` | CUDA GPU acceleration for NER | (implies ner) |
| `ner-coreml` | Apple CoreML acceleration for NER | (implies ner) |
```shell
git clone https://github.com/DataFog/datafog.git
cd datafog

# Rust
cargo test --workspace
cargo clippy --workspace -- -D warnings
cargo fmt --all -- --check
cargo bench --package datafog-core

# Python
python3 -m venv .venv
source .venv/bin/activate
pip install ".[dev]"  # installs datafog + maturin + pytest
pytest -v
```

License: Apache 2.0