Attack Techniques

HackAgent provides multiple attack strategies, each designed for different security testing scenarios. Choose the right attack based on your testing goals, time constraints, and target characteristics.

Overview

Available Attacks

Attack	Description	Sophistication	Speed
AdvPrefix	Multi-step adversarial prefix optimization	⭐⭐⭐ High	Slower
PAIR	LLM-driven iterative prompt refinement	⭐⭐ Medium	Medium
Baseline	Template-based prompt injection	⭐ Basic	Fast

Dataset Support

All attacks support loading goals from AI safety benchmarks like AgentHarm, StrongREJECT, and HarmBench. See Dataset Providers for details.

🎯 AdvPrefix — Advanced Prefix Optimization

The most sophisticated attack in HackAgent's arsenal. Uses a 9-step automated pipeline to generate and optimize adversarial prefixes that bypass AI safety mechanisms.

How it works:

Generate → Creates initial attack prefixes using uncensored models
Evaluate → Tests prefixes against target with judge models
Optimize → Selects and refines the most effective prefixes
Report → Provides detailed success metrics and recommendations

Best for: Comprehensive security audits, bypassing sophisticated safety filters, adversarial robustness research

attack_config = {
    "attack_type": "advprefix",
    "goals": ["Extract system prompt"],
    "generator": {"identifier": "ollama/llama2-uncensored", "endpoint": "..."},
    "judges": [{"identifier": "ollama/llama3", "type": "harmbench"}]
}

Learn more about AdvPrefix →

An LLM-powered attack that uses an attacker model to iteratively refine jailbreak prompts based on target responses and judge feedback.

How it works:

Initial prompt → Attacker LLM generates a jailbreak attempt
Target response → Sends prompt to target agent
Score & feedback → Judge evaluates if the attack succeeded
Refine → Attacker uses feedback to generate improved prompt
Iterate → Repeats until success or max iterations

Best for: Black-box testing, adaptive attacks that learn from failures, testing unknown safety mechanisms

Based on: "Jailbreaking Black Box Large Language Models in Twenty Queries" (Chao et al.)

attack_config = {
    "attack_type": "pair",
    "goals": ["Bypass content filter"],
    "attacker": {"identifier": "gpt-4", "endpoint": "https://api.openai.com/v1"},
    "n_iterations": 20
}

Learn more about PAIR →

📝 Baseline — Template-Based Attacks

A simpler but effective approach using predefined prompt templates combined with harmful goals. Great for quick vulnerability assessments.

How it works:

Template selection → Chooses from categorized prompt templates
Goal injection → Combines templates with test objectives
Execution → Sends templated prompts to target agent
Evaluation → Assesses responses using objective criteria

Best for: Quick vulnerability scans, testing basic prompt injection defenses, establishing security baselines

attack_config = {
    "attack_type": "baseline",
    "goals": ["Ignore previous instructions"],
    "template_categories": ["roleplay", "encoding", "context_switch"]
}

Learn more about Baseline →

Choosing the Right Attack

Use AdvPrefix when:

You need comprehensive security coverage
Testing sophisticated safety mechanisms
Time is not a constraint
You want detailed attack analytics

Use PAIR when:

You're testing a black-box system
The safety mechanisms are unknown
You want adaptive, learning-based attacks
You have access to a capable attacker LLM

Use Baseline when:

You need quick results
Running initial vulnerability assessments
Testing basic prompt injection defenses
Establishing a security baseline before deeper testing

Attack Pipeline Architecture

All attacks in HackAgent follow a common architecture pattern:

Components

Orchestrator: Manages attack lifecycle, configuration, and result handling
Attack Implementation: Contains the specific attack logic (AdvPrefix, PAIR, Baseline)
Agent Router: Handles communication with target agents across different frameworks
Judges: Evaluate attack success using various criteria (HarmBench, custom objectives)
Dashboard Sync: Automatically uploads results to the HackAgent platform

Next Steps

AdvPrefix Deep Dive — Full documentation with advanced configuration
PAIR Attack Guide — Iterative refinement techniques
Baseline Templates — Template categories and customization

Overview​

Available Attacks​

🎯 AdvPrefix — Advanced Prefix Optimization​

🔄 PAIR — Prompt Automatic Iterative Refinement​

📝 Baseline — Template-Based Attacks​

Choosing the Right Attack​

Use AdvPrefix when:​

Use PAIR when:​

Use Baseline when:​

Attack Pipeline Architecture​

Components​

Next Steps​

Overview

Available Attacks

🎯 AdvPrefix — Advanced Prefix Optimization

🔄 PAIR — Prompt Automatic Iterative Refinement

📝 Baseline — Template-Based Attacks

Choosing the Right Attack

Use AdvPrefix when:

Use PAIR when:

Use Baseline when:

Attack Pipeline Architecture

Components

Next Steps