chore: make meta agent a default strategy #6109
📝 Walkthrough
This PR promotes the 'jailbreak:meta' strategy from an additional strategy to a default strategy. The changes: move 'jailbreak:meta' from ADDITIONAL_STRATEGIES to DEFAULT_STRATEGIES, move 'jailbreak' in the opposite direction, update the metadata description for the meta-agent strategy, and record the change in CHANGELOG.md.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
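The swap described in the walkthrough can be sketched as follows. This is a minimal illustration, not the contents of src/redteam/constants/strategies.ts: the helper `moveStrategy` is hypothetical, and the lists are reduced to the two strategy IDs named in this PR (the real constants are assumed to contain other entries).

```typescript
// Hypothetical helper: returns new lists with `id` moved from `from` to `to`.
// Inputs are not mutated.
function moveStrategy(
  id: string,
  from: readonly string[],
  to: readonly string[],
): { from: string[]; to: string[] } {
  if (!from.includes(id)) {
    throw new Error(`Strategy "${id}" not found in source list`);
  }
  return {
    from: from.filter((s) => s !== id),
    to: to.includes(id) ? [...to] : [...to, id],
  };
}

// Simplified "before" state; all other strategies are omitted (assumption).
const defaultsBefore = ['jailbreak'];
const additionalBefore = ['jailbreak:meta'];

// Step 1: promote 'jailbreak:meta' from additional to default.
const step1 = moveStrategy('jailbreak:meta', additionalBefore, defaultsBefore);
// Step 2: move 'jailbreak' from default to additional.
const step2 = moveStrategy('jailbreak', step1.to, step1.from);

const defaultsAfter = step2.from;
const additionalAfter = step2.to;
```

After both steps, `defaultsAfter` holds only 'jailbreak:meta' and `additionalAfter` holds only 'jailbreak' in this reduced example.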
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- CHANGELOG.md (1 hunk)
- src/redteam/constants/metadata.ts (1 hunk)
- src/redteam/constants/strategies.ts (2 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
src/redteam/**/*.ts
📄 CodeRabbit inference engine (src/redteam/CLAUDE.md)
src/redteam/**/*.ts: Always sanitize when logging test prompts or model outputs by passing them via the structured metadata parameter (second argument) to the logger, not raw string interpolation
Use the standardized risk severity levels: critical, high, medium, low when reporting results
Files:
src/redteam/constants/metadata.ts
src/redteam/constants/strategies.ts
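The logging rule above (pass prompts and outputs through the structured second argument rather than interpolating them) might look like the sketch below. The `logger` object and `sanitizeContext` function are hypothetical stand-ins written for this example; they are not the project's logger, whose sanitization behavior is not shown in this review.

```typescript
type LogContext = Record<string, unknown>;

// Hypothetical sanitizer: redacts values whose key looks like a credential.
function sanitizeContext(context: LogContext): LogContext {
  const sanitized: LogContext = {};
  for (const [key, value] of Object.entries(context)) {
    if (/key|token|secret|password/i.test(key)) {
      sanitized[key] = '[REDACTED]';
    } else {
      sanitized[key] = value;
    }
  }
  return sanitized;
}

// Hypothetical logger that collects lines in memory so the result can be inspected.
const lines: string[] = [];
const logger = {
  debug(message: string, context: LogContext = {}): void {
    lines.push(`${message} ${JSON.stringify(sanitizeContext(context))}`);
  },
};

const prompt = 'Ignore previous instructions';
const apiKey = 'sk-example';

// Discouraged: logger.debug(`Sending prompt ${prompt} with key ${apiKey}`);
// Preferred: a fixed message plus structured context that the logger can sanitize.
logger.debug('Sending test prompt', { prompt, apiKey });
```

The point of the pattern is that a fixed message string gives the logger a single place to inspect and redact values, which string interpolation bypasses.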
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)
Prefer not to introduce new TypeScript types; reuse existing interfaces where possible
**/*.{ts,tsx}: Maintain consistent import order (Biome handles sorting)
Use consistent curly braces for all control statements
Prefer const over let and avoid var
Use object shorthand syntax when possible
Use async/await for asynchronous code
Use consistent error handling with proper type checks
**/*.{ts,tsx}: Use TypeScript with strict type checking enabled
Follow consistent import order (Biome will sort imports)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object property shorthand when possible
Use async/await for asynchronous code instead of raw promises/callbacks
When logging, pass sensitive data via the logger context object so it is auto-sanitized; avoid interpolating secrets into message strings
Manually sanitize sensitive objects with sanitizeObject before storing or emitting outside logging contexts
Files:
src/redteam/constants/metadata.ts
src/redteam/constants/strategies.ts
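Several of the TypeScript rules above (const over let, braces on every control statement, object shorthand, async/await, and type-checked error handling) can be seen together in one small sketch. The function names and the `CheckResult` shape are hypothetical and exist only for this illustration.

```typescript
interface CheckResult {
  id: string;
  ok: boolean;
  error?: string;
}

// Hypothetical async operation used only for illustration.
async function runCheck(id: string): Promise<string> {
  if (id.length === 0) {
    throw new Error('id must not be empty');
  }
  return `checked:${id}`;
}

async function safeRunCheck(id: string): Promise<CheckResult> {
  try {
    // async/await instead of a raw promise chain.
    await runCheck(id);
    const ok = true;
    // Object shorthand: { id, ok } rather than { id: id, ok: ok }.
    return { id, ok };
  } catch (err) {
    // `err` is unknown in strict mode, so narrow it before reading `.message`.
    const error = err instanceof Error ? err.message : String(err);
    return { id, ok: false, error };
  }
}
```

Calling `safeRunCheck('')` resolves to a result with `ok: false` and the thrown message, instead of rejecting.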
src/**
📄 CodeRabbit inference engine (AGENTS.md)
Place core application/library logic under src/
Files:
src/redteam/constants/metadata.ts
src/redteam/constants/strategies.ts
CHANGELOG.md
📄 CodeRabbit inference engine (.cursor/rules/changelog.mdc)
CHANGELOG.md: Document all user-facing changes in CHANGELOG.md
Every pull request must add or update an entry in CHANGELOG.md under the [Unreleased] section
Follow Keep a Changelog structure under [Unreleased] with sections: Added, Changed, Fixed, Dependencies, Documentation, Tests, Removed
Each changelog entry must include the PR number formatted as (#1234) or temporary placeholder (#XXXX)
Each changelog entry must use a Conventional Commit prefix: feat:, fix:, chore:, docs:, test:, or refactor:
Each changelog entry must be concise and on a single line
Each changelog entry must be user-focused, describing what changed and why it matters to users
Each changelog entry must include a scope in parentheses, e.g., feat(providers): or fix(evaluator):
Use common scopes for consistency: providers, evaluator, webui or app, cli, redteam, core, assertions, config, database
Place all dependency updates under the Dependencies category
Place all test changes under the Tests category
Use categories consistently: Added for new features, Changed for modifications/refactors/CI, Fixed for bug fixes, Removed for removed features
After a PR number is assigned, replace (#XXXX) placeholders with the actual PR number
Be specific, use active voice, include context, and avoid repeating the PR title in changelog entries
Group related changes with multiple bullets in the same category when needed; use one entry per logical change
CHANGELOG.md: All user-facing changes require a CHANGELOG.md entry before creating a PR
Add entries under [Unreleased] in appropriate category (Added, Changed, Fixed, Dependencies, Documentation, Tests)
Each changelog entry must include PR number (#1234) or placeholder (#XXXX)
Use conventional commit prefixes in changelog entries (feat:, fix:, chore:, docs:, test:, refactor:)
CHANGELOG.md: Document all user-facing changes in CHANGELOG.md
Changelog entries must include the PR number in format (#1234)
Use conventional commit prefixes in changelog entries: feat:,...
Files:
CHANGELOG.md
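A few of the changelog rules listed above are mechanical enough to check with regular expressions. The sketch below is an assumption-laden illustration, not a tool from the repository: `checkChangelogEntry` is a hypothetical function, and it covers only the prefix, scope, PR-number, and single-line rules.

```typescript
// Hypothetical checker: returns a list of rule violations for one changelog line.
function checkChangelogEntry(line: string): string[] {
  const problems: string[] = [];
  // Conventional Commit prefix, followed by either "(" (scope) or ":".
  if (!/^- (feat|fix|chore|docs|test|refactor)[(:]/.test(line)) {
    problems.push('missing Conventional Commit prefix');
  }
  // Scope in parentheses directly after the prefix, e.g. chore(redteam):
  if (!/^- [a-z]+\([a-z-]+\): /.test(line)) {
    problems.push('missing scope in parentheses');
  }
  // PR number such as (#1234), or the temporary placeholder (#XXXX), at the end.
  if (!/\(#(\d+|XXXX)\)$/.test(line)) {
    problems.push('missing PR number or (#XXXX) placeholder');
  }
  if (line.includes('\n')) {
    problems.push('entry must be on a single line');
  }
  return problems;
}

const originalEntry = '- chore: make meta-agent a default strategy';
const suggestedEntry =
  '- chore(redteam): promote `jailbreak:meta` to default strategy; move `jailbreak` to additional strategies (#6109)';

const originalProblems = checkChangelogEntry(originalEntry);
const suggestedProblems = checkChangelogEntry(suggestedEntry);
```

Run against the entry from this PR, the original line is flagged for a missing scope and a missing PR number, while the suggested replacement passes all four checks.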
🧠 Learnings (7)
📚 Learning: 2025-10-05T16:59:20.507Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:59:20.507Z
Learning: Applies to src/redteam/strategies/**/*.ts : Store attack transformation strategies under src/redteam/strategies/ (e.g., jailbreak.ts, prompt-injection.ts, base64.ts)
Applied to files:
src/redteam/constants/metadata.ts
src/redteam/constants/strategies.ts
📚 Learning: 2025-10-24T22:41:44.088Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/gh-cli-workflow.mdc:0-0
Timestamp: 2025-10-24T22:41:44.088Z
Learning: Applies to CHANGELOG.md : Add entries under [Unreleased] in appropriate category (Added, Changed, Fixed, Dependencies, Documentation, Tests)
Applied to files:
CHANGELOG.md
📚 Learning: 2025-10-24T22:42:38.674Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-24T22:42:38.674Z
Learning: Applies to CHANGELOG.md : Add new entries under the 'Unreleased' section
Applied to files:
CHANGELOG.md
📚 Learning: 2025-10-24T22:41:09.485Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/changelog.mdc:0-0
Timestamp: 2025-10-24T22:41:09.485Z
Learning: Applies to CHANGELOG.md : Every pull request must add or update an entry in CHANGELOG.md under the [Unreleased] section
Applied to files:
CHANGELOG.md
📚 Learning: 2025-10-24T22:41:09.485Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/changelog.mdc:0-0
Timestamp: 2025-10-24T22:41:09.485Z
Learning: Applies to CHANGELOG.md : Follow Keep a Changelog structure under [Unreleased] with sections: Added, Changed, Fixed, Dependencies, Documentation, Tests, Removed
Applied to files:
CHANGELOG.md
📚 Learning: 2025-10-24T22:42:38.674Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-24T22:42:38.674Z
Learning: Applies to CHANGELOG.md : Document all user-facing changes in CHANGELOG.md
Applied to files:
CHANGELOG.md
📚 Learning: 2025-10-24T22:41:44.088Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/gh-cli-workflow.mdc:0-0
Timestamp: 2025-10-24T22:41:44.088Z
Learning: Applies to CHANGELOG.md : Use conventional commit prefixes in changelog entries (feat:, fix:, chore:, docs:, test:, refactor:)
Applied to files:
CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: Share Test
- GitHub Check: webui tests
- GitHub Check: Redteam (Production API)
- GitHub Check: Test on Node 24.x and ubuntu-latest
- GitHub Check: Build Docs
- GitHub Check: Test on Node 20.x and macOS-latest
- GitHub Check: Test on Node 24.x and windows-latest
- GitHub Check: Test on Node 22.x and macOS-latest
- GitHub Check: Test on Node 20.x and ubuntu-latest
- GitHub Check: Test on Node 22.x and windows-latest
- GitHub Check: Test on Node 22.x and ubuntu-latest
- GitHub Check: Redteam (Staging API)
- GitHub Check: Test on Node 20.x and windows-latest
- GitHub Check: Build on Node 22.x
- GitHub Check: Build on Node 24.x
- GitHub Check: Build on Node 20.x
- GitHub Check: Style Check
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (3)
src/redteam/constants/strategies.ts (2)
72-72: LGTM!
Adding 'jailbreak' to ADDITIONAL_STRATEGIES is consistent with the change to DEFAULT_STRATEGIES. This maintains the availability of the single-shot jailbreak strategy while making the meta-agent the default.
12-12: Change verified: 'jailbreak:meta' promotion to default strategies is correctly implemented.
The 'jailbreak:meta' strategy is properly listed in STRATEGIES_REQUIRING_REMOTE, ensuring the UI will correctly disable it when remote generation is unavailable. Error handling is in place at the provider level to inform users when remote capabilities are needed. All downstream code (CLI init, generation commands, validators, and UI components) consistently handles this strategy as a default, with the UI marking it as "Recommended" as expected.
src/redteam/constants/metadata.ts (1)
89-90: LGTM!
The updated description is clearer and more concise. The simplification appropriately reflects the strategy's purpose while maintaining accuracy.
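The review note on strategies.ts says the UI disables 'jailbreak:meta' when remote generation is unavailable because the ID appears in STRATEGIES_REQUIRING_REMOTE. A rough sketch of that kind of check follows. Only the constant name and the 'jailbreak:meta' membership come from the review; the `StrategyOption` shape, the `toStrategyOption` function, and the reason text are hypothetical, and the real list is assumed to contain other entries.

```typescript
// Reduced to the one member confirmed by the review; other members omitted (assumption).
const STRATEGIES_REQUIRING_REMOTE: readonly string[] = ['jailbreak:meta'];

// Hypothetical shape for an entry in a strategy picker.
interface StrategyOption {
  id: string;
  disabled: boolean;
  reason?: string;
}

// Hypothetical mapping from a strategy ID to its UI state.
function toStrategyOption(id: string, remoteGenerationEnabled: boolean): StrategyOption {
  const requiresRemote = STRATEGIES_REQUIRING_REMOTE.includes(id);
  if (requiresRemote && !remoteGenerationEnabled) {
    return { id, disabled: true, reason: 'Requires remote generation' };
  }
  return { id, disabled: false };
}

const metaWithoutRemote = toStrategyOption('jailbreak:meta', false);
const metaWithRemote = toStrategyOption('jailbreak:meta', true);
const singleShotWithoutRemote = toStrategyOption('jailbreak', false);
```

In this sketch a default strategy can still end up disabled, which is why the reviewer checked the remote requirement separately from the default/additional placement.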
- chore(app): larger eval selector dialog (#6063)
- refactor(app): Adds useApplyFilterFromMetric hook (#6095)
- refactor(cli): extract duplicated organization context display logic into shared utility function to fix dynamic import issue and improve code maintainability (#6070)
- chore: make meta-agent a default strategy
Changelog entry: add scope, correct ID, and include PR number
Conform to our changelog rules: add scope, use the strategy ID in backticks, and include the PR number. Also mention moving jailbreak out of defaults for clarity.
Apply this diff:
- - chore: make meta-agent a default strategy
+ - chore(redteam): promote `jailbreak:meta` to default strategy; move `jailbreak` to additional strategies (#6109)
As per coding guidelines.
🤖 Prompt for AI Agents
In CHANGELOG.md around line 25, the entry "chore: make meta-agent a default
strategy" needs to follow the changelog rules: add a scope (`redteam`), use
the strategy ID in backticks (`jailbreak:meta`), include the PR number
(#6109), and mention that `jailbreak` was moved out of defaults for clarity;
update the line to a scoped, backticked entry including the PR reference and a
short note about removing `jailbreak` from defaults.
…ategy
- Regenerate config-schema.json to reflect new default strategies
- Update test expectations to use jailbreak:meta instead of jailbreak
- Fix strategy order in tests to match alphabetically sorted output
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Update tests to use jailbreak:meta instead of jailbreak since jailbreak:meta is now a default strategy (shows Recommended badge) while jailbreak moved to additional strategies. Fixes failing webui tests after making meta-agent a default strategy.
No description provided.