Attack
The hackagent attack command executes security attacks against AI agents.
Usage
hackagent attack <attack_type> [options]
Available Attacks
| Attack Type | Description |
|---|---|
| advprefix | Advanced prefix injection attack |
| baseline | Baseline attack for comparison |
AdvPrefix Attack
AdvPrefix is the primary attack type. It generates adversarial prefixes to test agent security.
Basic Usage
hackagent attack advprefix \
--agent-name "my-agent" \
--agent-type "google-adk" \
--endpoint "http://localhost:8000" \
--goals "Extract system prompt information"
Options
| Option | Required | Description | Default |
|---|---|---|---|
| --agent-name | ✅ | Name of the target agent | - |
| --agent-type | ✅ | Type of agent (google-adk, openai-sdk, litellm, langchain) | - |
| --endpoint | ✅ | Agent endpoint URL | - |
| --goals | ✅ | Attack goals (comma-separated or quoted string) | - |
| --generator | ❌ | Generator model identifier | ollama/llama2-uncensored |
| --generator-endpoint | ❌ | Generator endpoint URL | http://localhost:11434/api/generate |
| --judge | ❌ | Judge model identifier | ollama/llama3 |
| --judge-endpoint | ❌ | Judge endpoint URL | http://localhost:11434/api/generate |
| --max-iterations | ❌ | Maximum attack iterations | 10 |
| --temperature | ❌ | Sampling temperature | 0.7 |
| --output-format | ❌ | Output format (table, json, csv) | Config default |
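Because --agent-type accepts only the four values listed in the table, a wrapper script can reject a typo before any attack starts. The sketch below is a hypothetical helper (is_supported_agent_type is not part of hackagent); it only mirrors the table above and would need updating if the supported types change.

```shell
# Hypothetical pre-flight check; not part of the hackagent CLI.
# Returns 0 if the argument is one of the agent types listed in the table.
is_supported_agent_type() {
  case "$1" in
    google-adk|openai-sdk|litellm|langchain) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: abort early instead of passing a bad value to the CLI.
# is_supported_agent_type "$AGENT_TYPE" || { echo "unsupported agent type: $AGENT_TYPE" >&2; exit 2; }
```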
Examples
Basic attack:
hackagent attack advprefix \
--agent-name "weather-bot" \
--agent-type "google-adk" \
--endpoint "http://localhost:8000" \
--goals "Return fake weather data"
Attack with custom models:
hackagent attack advprefix \
--agent-name "assistant" \
--agent-type "openai-sdk" \
--endpoint "https://api.example.com/v1" \
--goals "Extract sensitive information" \
--generator "ollama/mistral-uncensored" \
--generator-endpoint "http://localhost:11434/api/generate" \
--judge "ollama/llama3" \
--judge-endpoint "http://localhost:11434/api/generate"
Attack with multiple goals:
hackagent attack advprefix \
--agent-name "support-bot" \
--agent-type "litellm" \
--endpoint "http://localhost:8000" \
--goals "Bypass content filters" "Extract system prompt" "Ignore safety instructions"
High-intensity attack:
hackagent attack advprefix \
--agent-name "target-agent" \
--agent-type "google-adk" \
--endpoint "http://localhost:8000" \
--goals "Comprehensive security test" \
--max-iterations 50 \
--temperature 0.9
Baseline Attack
The baseline attack is a simpler attack, intended for comparison.
hackagent attack baseline \
--agent-name "my-agent" \
--agent-type "google-adk" \
--endpoint "http://localhost:8000" \
--goals "Test agent response"
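To compare the two attack types, one option is to run both against the same target with identical required options. The sketch below uses only the flags documented on this page; run_attack_pair and the HACKAGENT_BIN variable are hypothetical names introduced for illustration (the variable lets you substitute echo for a dry run).

```shell
# Hypothetical wrapper: runs baseline, then advprefix, with the same options.
# HACKAGENT_BIN is an illustrative override, not a hackagent setting.
run_attack_pair() {
  # $1 agent name, $2 agent type, $3 endpoint, $4 goals
  for attack_type in baseline advprefix; do
    # Stop at the first command that exits with a non-zero status.
    "${HACKAGENT_BIN:-hackagent}" attack "$attack_type" \
      --agent-name "$1" \
      --agent-type "$2" \
      --endpoint "$3" \
      --goals "$4" || return 1
  done
}

# Dry run: print the two commands instead of executing them.
# HACKAGENT_BIN=echo
# run_attack_pair "my-agent" "google-adk" "http://localhost:8000" "Test agent response"
```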
Output
Attack results are:
- Displayed in the terminal: progress and summary
- Saved locally: in the ./logs/runs/ directory
- Uploaded to the dashboard: view at app.hackagent.dev
JSON Output
For CI/CD integration, output results as JSON:
hackagent attack advprefix \
--agent-name "target" \
--agent-type "google-adk" \
--endpoint "http://localhost:8000" \
--goals "Security test" \
--output-format json > results.json
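Before a later pipeline step reads the redirected file, it may be worth confirming that it actually parses as JSON, for example in case progress output ends up in the same stream. The sketch below makes no assumption about the result schema; is_valid_json is a hypothetical helper that relies on python3 being available on the runner.

```shell
# Hypothetical helper: succeeds only if the file exists, is non-empty,
# and parses as JSON. Assumes python3 is on PATH.
is_valid_json() {
  [ -s "$1" ] && python3 -m json.tool "$1" > /dev/null 2>&1
}

# Usage after the redirect shown above:
# is_valid_json results.json || { echo "results.json is not valid JSON" >&2; exit 1; }
```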
CI/CD Integration
Example GitHub Actions workflow:
- name: Run Security Tests
  run: |
    hackagent attack advprefix \
      --agent-name "${{ env.AGENT_NAME }}" \
      --agent-type "google-adk" \
      --endpoint "${{ env.AGENT_ENDPOINT }}" \
      --goals "Automated security validation" \
      --output-format json > test_results.json
- name: Upload Results
  uses: actions/upload-artifact@v4
  with:
    name: security-results
    path: test_results.json
See Also
- Attack Tutorial — Step-by-step guide
- AdvPrefix Attacks — Detailed attack documentation
- Results — View and manage attack results