Presidio analyzer

Description

The Presidio analyzer is a Python-based service for detecting PII entities in text.

During analysis, it runs a set of PII recognizers, each responsible for detecting one or more PII entity types using its own mechanism.

Presidio analyzer comes with a set of predefined recognizers, but can easily be extended with other types of custom recognizers. Predefined and custom recognizers leverage regex, Named Entity Recognition and other types of logic to detect PII in unstructured text.
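
As a rough illustration of the recognizer idea described above, the sketch below runs two regex-based recognizers over a text and merges their findings. It uses only the Python standard library. `Finding`, `ToyRecognizer`, `run_recognizers`, and both regex patterns are hypothetical names invented for this example; they are not Presidio's API or its built-in patterns.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One detected span: entity type plus character offsets into the text."""
    entity_type: str
    start: int
    end: int


class ToyRecognizer:
    """Hypothetical stand-in for a recognizer: one entity type, one regex."""

    def __init__(self, entity_type: str, pattern: str) -> None:
        self.entity_type = entity_type
        self._regex = re.compile(pattern)

    def analyze(self, text: str) -> list[Finding]:
        return [
            Finding(self.entity_type, m.start(), m.end())
            for m in self._regex.finditer(text)
        ]


def run_recognizers(text: str, recognizers: list[ToyRecognizer]) -> list[Finding]:
    """Run every recognizer on the text and return findings ordered by position."""
    findings: list[Finding] = []
    for recognizer in recognizers:
        findings.extend(recognizer.analyze(text))
    return sorted(findings, key=lambda f: f.start)


# Illustrative patterns only; real-world PII patterns are more involved.
recognizers = [
    ToyRecognizer("PHONE_NUMBER", r"\b\d{3}-\d{3}-\d{4}\b"),
    ToyRecognizer("EMAIL_ADDRESS", r"[\w.+-]+@[\w-]+\.[\w.]+"),
]
text = "Call 212-555-5555 or write to jane@example.com"
findings = run_recognizers(text, recognizers)
for f in findings:
    print(f.entity_type, text[f.start:f.end])
```

Each recognizer stays independent of the others, which is what makes it possible to add a new one without touching the rest.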

Language Model-based PII/PHI Detection

Presidio analyzer supports language model-based PII/PHI detection (LLMs, SLMs) for flexible entity recognition. The current implementation uses LangExtract with support for multiple providers:

  • Ollama - Local model deployment for privacy-sensitive environments
  • Azure OpenAI - Cloud-based deployment with enterprise features
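
Install the analyzer with the langextract extra (the quotes keep shells such as zsh from interpreting the brackets):

pip install "presidio-analyzer[langextract]"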

Quick Usage

Ollama (local models):

from presidio_analyzer.predefined_recognizers import BasicLangExtractRecognizer
recognizer = BasicLangExtractRecognizer()  # Uses default config

Azure OpenAI (cloud models):

from presidio_analyzer.predefined_recognizers import AzureOpenAILangExtractRecognizer

# Simple usage - pass everything as parameters
recognizer = AzureOpenAILangExtractRecognizer(
    model_id="gpt-4",  # Your Azure deployment name
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key="your-api-key"
)

# Or use environment variables (AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY):
recognizer = AzureOpenAILangExtractRecognizer(
    model_id="gpt-4"  # Your Azure deployment name
)

# Advanced: Customize entities/prompts with config file
recognizer = AzureOpenAILangExtractRecognizer(
    model_id="gpt-4",
    config_path="./custom_config.yaml",  # Optional: for custom entities/prompts
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key="your-api-key"
)
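
When relying on the environment-variable form shown above, it may help to check that both variables are set before creating the recognizer. The following is a minimal standard-library sketch; `REQUIRED_VARS` and `missing_azure_vars` are hypothetical helper names, and only the two variable names come from the comment above.

```python
import os

# Variable names taken from the usage comment above.
REQUIRED_VARS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")


def missing_azure_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_azure_vars()
if missing:
    print("Set these before creating the recognizer:", ", ".join(missing))
```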

Note: LangExtract recognizers do not validate connectivity during initialization. Connection errors or missing models will be reported when analyze() is first called.
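
Because failures only surface on the first analyze() call, one option is to make a single probe call at startup and handle any error there. The sketch below shows that pattern with a stand-in function instead of a real recognizer. `warm_up` and `unreachable_backend` are hypothetical names, and the exception type a real recognizer raises is not specified here, so the handler is deliberately broad. In practice you would pass a callable that wraps your own analyze() call with whatever arguments it needs.

```python
def warm_up(analyze, probe_text="warm-up probe"):
    """Call `analyze` once; return None on success or the exception it raised."""
    try:
        analyze(probe_text)
    except Exception as exc:  # broad on purpose: the failure type is an unknown
        return exc
    return None


# Stand-in for a recognizer whose backend is unreachable (illustrative only).
def unreachable_backend(text):
    raise ConnectionError("model server not reachable")


error = warm_up(unreachable_backend)
if error is not None:
    print(f"Recognizer is not usable yet: {error}")
```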

See the Language Model-based PII/PHI Detection guide for complete setup and usage instructions.

Deploy Presidio analyzer to Azure

Use the following button to deploy Presidio analyzer to your Azure subscription.

Deploy to Azure

Simple usage example

from presidio_analyzer import AnalyzerEngine

# Set up the engine; this loads the NLP module (spaCy model by default) and the other PII recognizers
analyzer = AnalyzerEngine()

# Call analyzer to get results
results = analyzer.analyze(text="My phone number is 212-555-5555",
                           entities=["PHONE_NUMBER"],
                           language='en')
print(results)
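
Each result carries an entity type, start and end character offsets, and a score. To see which substring was matched, the offsets can be applied back to the input text. The sketch below uses a hand-written stand-in (`Span`) instead of calling the analyzer, so it runs without any models installed. `Span`, `matched_texts`, and the score value are assumptions for illustration; only the attribute names mirror those on the analyzer's results.

```python
from collections import namedtuple

# Stand-in with the same attribute names as the analyzer's result objects.
Span = namedtuple("Span", ["entity_type", "start", "end", "score"])


def matched_texts(text, results):
    """Pair each result's entity type with the substring it points at."""
    return [(r.entity_type, text[r.start:r.end]) for r in results]


text = "My phone number is 212-555-5555"
# Hand-written result; the score is a made-up illustrative value.
results = [Span("PHONE_NUMBER", 19, 31, 0.75)]
print(matched_texts(text, results))
```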

GPU Acceleration

For GPU acceleration, install the appropriate dependencies for your hardware:

  • Linux with NVIDIA GPU: install cupy-cuda12x (or the CuPy package matching your CUDA installation)
  • macOS with Apple Silicon: MPS (Metal Performance Shaders) is currently not supported. The analyzer will use CPU for PyTorch operations.
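
To check whether the GPU dependency from the list above is importable in the current environment, a small standard-library probe is enough. This sketch only tests that the `cupy` module can be found; it does not verify that a working GPU or driver is present. `has_module` is a hypothetical helper name.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


# The cupy-cuda12x package is imported under the module name "cupy".
print("cupy available:", has_module("cupy"))
```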

Documentation

Additional documentation on installing, using, and extending the analyzer can be found in the Analyzer section of the Presidio documentation.