feat(app): Red Team Strategy Test Generation #6005

will-holley · 2025-10-23T17:52:05Z

Features

Regeneration
Handles single-turn strategies
For strategies requiring config, validates that config has been provided
Renders previews for image, audio, and video strategies
Handles multi-turn strategies

- Removes values unnecessarily (no refs) returned by the provider
- Adds support for generating w/ strategies

- Request body validation
- Fixes types
- Removes unreachable logic
- Adds type-support for nonce

coderabbitai · 2025-10-23T18:55:14Z

📝 Walkthrough

This pull request refactors the red team test case generation system to use structured plugin and strategy objects. It introduces a TestCaseGenerationProvider context wrapper around the main Strategies component, replaces loose string/config parameters with explicit {id, config} object shapes for plugins and strategies, converts type definitions in redteam/types.ts to zod-based schemas with validation, updates the /generate-test API endpoint to validate and process the new object formats, and integrates test case generation capabilities into individual strategy items with dedicated UI buttons. TestCaseDialog is updated to accept and display both plugin and strategy information. The backend applies strategy factories post-generation to transform test cases based on selected strategies.
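
The `{ id, config }` object shape described in the walkthrough can be sketched without any dependencies. The PR itself validates with zod schemas; the hand-written guard below only stands in for that. The ID lists, `GenerateTestRequest`, and `parseGenerateTestRequest` are hypothetical names chosen for illustration, not the project's actual definitions.

```typescript
// Hypothetical ID lists; the real project derives these from its own constants.
const KNOWN_PLUGINS = ['harmful:hate', 'pii'] as const;
const KNOWN_STRATEGIES = ['basic', 'base64'] as const;

type PluginId = (typeof KNOWN_PLUGINS)[number];
type StrategyId = (typeof KNOWN_STRATEGIES)[number];

// Shared shape for plugins and strategies: an ID plus a free-form config object.
interface IdWithConfig<Id extends string> {
  id: Id;
  config: Record<string, unknown>;
}

interface GenerateTestRequest {
  plugin: IdWithConfig<PluginId>;
  strategy: IdWithConfig<StrategyId>;
}

type ParseResult =
  | { ok: true; value: GenerateTestRequest }
  | { ok: false; error: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Validates one { id, config } entry against a list of allowed IDs.
function parseIdWithConfig<Id extends string>(
  value: unknown,
  allowed: readonly Id[],
  label: string,
): { ok: true; value: IdWithConfig<Id> } | { ok: false; error: string } {
  if (!isRecord(value)) {
    return { ok: false, error: `${label} must be an object` };
  }
  const id = value.id;
  if (typeof id !== 'string' || !allowed.some((candidate) => candidate === id)) {
    return { ok: false, error: `${label}.id is not a known ID` };
  }
  // A missing config is treated as an empty object in this sketch.
  const config = value.config === undefined ? {} : value.config;
  if (!isRecord(config)) {
    return { ok: false, error: `${label}.config must be an object` };
  }
  return { ok: true, value: { id: id as Id, config } };
}

function parseGenerateTestRequest(body: unknown): ParseResult {
  if (!isRecord(body)) {
    return { ok: false, error: 'body must be an object' };
  }
  const plugin = parseIdWithConfig(body.plugin, KNOWN_PLUGINS, 'plugin');
  if (!plugin.ok) {
    return plugin;
  }
  const strategy = parseIdWithConfig(body.strategy, KNOWN_STRATEGIES, 'strategy');
  if (!strategy.ok) {
    return strategy;
  }
  return { ok: true, value: { plugin: plugin.value, strategy: strategy.value } };
}
```

A route handler could call `parseGenerateTestRequest(req.body)` first and return a 400 with the `error` string when `ok` is false, which mirrors the validation step the walkthrough describes.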

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

This change involves heterogeneous modifications across multiple architectural layers. The PR introduces new public API signatures (TestCaseDialogProps, TestCaseGenerationContextValue, StrategyCardData.id), converts type definitions to zod-based schemas requiring validation logic review, substantially modifies the /generate-test endpoint to handle structured payloads and strategy application, and refactors component integration patterns with the new provider wrapper. While individual changes follow consistent patterns (structured objects replacing loose configs), the breadth of affected files, density of type system changes, and logic modifications in the backend endpoint require careful integration verification across the UI, context, type, and server layers.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Description Check	❓ Inconclusive	The pull request description is empty and therefore does not provide any information about the changeset. While the check is lenient and does not require extensive detail, a completely blank description neither confirms relatedness to the changeset nor provides any meaningful context. The vagueness of a missing description makes it impossible to definitively assess whether the author intended to document their changes, resulting in insufficient information to conclusively determine if this meets the pass criteria.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The pull request title "feat(app): Red Team Strategy Test Generation" directly and clearly describes the main objective of this changeset. The PR introduces comprehensive test case generation capabilities for red team strategies across multiple components, including a new TestCaseGenerationProvider, updates to the /generate-test backend endpoint, and integration with individual strategy items. The title uses a conventional commit format, is concise (6 words), and avoids vague terms or noise. A teammate reviewing the repository history would immediately understand that this PR adds a feature for generating test cases within the red team strategy workflow.


coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

src/app/src/pages/redteam/setup/components/Targets/CustomPoliciesSection.tsx (1)
364-376: Missing dependency in useCallback

handleGenerateTestCase uses generateTestCase but doesn’t list it as a dependency. Add it to avoid stale closures.
   );
-    [toast],
+    [toast, generateTestCase],
   );
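
To illustrate why the missing dependency matters, here is a simplified model of a dependency-keyed callback cache. It is not React's implementation; `createCallbackCache` and `render` are hypothetical helpers that only mimic the "reuse the old function while deps are unchanged" behaviour.

```typescript
type Deps = readonly unknown[];

// Returns the previously stored function as long as every dependency is identical.
function createCallbackCache<F>(): (fn: F, deps: Deps) => F {
  let cached: { fn: F; deps: Deps } | undefined;
  return (fn, deps) => {
    const unchanged =
      cached !== undefined &&
      cached.deps.length === deps.length &&
      cached.deps.every((dep, i) => Object.is(dep, deps[i]));
    if (unchanged && cached !== undefined) {
      return cached.fn;
    }
    cached = { fn, deps };
    return fn;
  };
}

// Models one render: builds a handler that closes over the current generateTestCase.
function render(
  memo: (fn: () => string, deps: Deps) => () => string,
  generateTestCase: () => string,
  listDependency: boolean,
): () => string {
  const toast = 'toast'; // stable across renders in this model
  const deps = listDependency ? [toast, generateTestCase] : [toast];
  return memo(() => generateTestCase(), deps);
}

// Deps omit generateTestCase: the second render reuses the first render's closure.
const staleMemo = createCallbackCache<() => string>();
render(staleMemo, () => 'v1', false);
const stale = render(staleMemo, () => 'v2', false);

// Deps include generateTestCase: the handler is replaced when it changes.
const freshMemo = createCallbackCache<() => string>();
render(freshMemo, () => 'v1', true);
const fresh = render(freshMemo, () => 'v2', true);
```

In this model `stale()` still calls the first render's `generateTestCase`, while `fresh()` calls the current one, which is the behaviour the suggested dependency change is meant to restore.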
src/app/src/pages/redteam/setup/components/PluginsTab.tsx (1)
43-43: Use app ErrorBoundary component per guidelines

Swap react-error-boundary for @app/components/ErrorBoundary.
-import { ErrorBoundary } from 'react-error-boundary';
+import ErrorBoundary from '@app/components/ErrorBoundary';
-const ErrorFallback = ({ error }: { error: Error }) => (
-  <div role="alert">
-    <p>Something went wrong:</p>
-    <pre>{error.message}</pre>
-  </div>
-);
+// Removed: use app ErrorBoundary wrapper instead
-  return (
-    <ErrorBoundary FallbackComponent={ErrorFallback}>
+  return (
+    <ErrorBoundary>
       <Box sx={{ display: 'flex', gap: 3, alignItems: 'flex-start' }}>
As per coding guidelines.

Also applies to: 58-64, 396-399
src/server/routes/redteam.ts (1)
301-307: Sanitize logging in proxy route and avoid JSON.stringify of bodies.

Don't log raw request/response payloads. Use structured, minimal fields.
-logger.debug(
-  `Received ${task} task request: ${JSON.stringify({
-    method: req.method,
-    url: req.url,
-    body: req.body,
-  })}`,
-);
+logger.debug('Received task request', { task, method: req.method, url: req.url });
...
-const data = await response.json();
-logger.debug(`Received response from cloud function: ${JSON.stringify(data)}`);
-res.json(data);
+const data = await response.json();
+logger.debug('Received response from cloud function', { task, status: response.status });
+res.json({ success: true, data });
As per coding guidelines.

Also applies to: 327-333
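
A minimal sketch of building a structured, payload-free log context is shown below. `RequestLike`, `TaskLogContext`, and `buildTaskLogContext` are hypothetical names; recording only the body's key names is an extra assumption of this sketch, not something the suggestion above asks for.

```typescript
// Minimal stand-in for the parts of an HTTP request this sketch needs.
interface RequestLike {
  method: string;
  url: string;
  body?: unknown;
}

interface TaskLogContext {
  task: string;
  method: string;
  url: string;
  bodyKeys: string[];
}

// Builds the context object to pass as the logger's second argument.
// Body values are never copied; only the top-level key names are kept.
function buildTaskLogContext(task: string, req: RequestLike): TaskLogContext {
  const body = req.body;
  const bodyKeys =
    typeof body === 'object' && body !== null ? Object.keys(body).sort() : [];
  return { task, method: req.method, url: req.url, bodyKeys };
}
```

A call such as `logger.debug('Received task request', buildTaskLogContext(task, req))` would then keep prompts and other body values out of the log line.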
src/app/src/pages/redteam/setup/components/TestCaseGenerationProvider.tsx (1)
92-108: Update response parsing to match ApiResponse standard in two locations.

The /redteam/generate-test endpoint (and /providers/test as noted) currently return non-standard response formats. Per the guideline, all server responses must use the ApiResponse shape: { success: boolean, data?: T, error?: string }. Client code must be updated to parse this format.
-const data = await response.json();
-
-if (data.error) {
-  throw new Error(data.error);
-}
-
-const testCase: GeneratedTestCase = {
-  prompt: data.prompt,
-  context: data.context,
-  metadata: data.metadata,
-};
+const api = await response.json();
+if (!api.success) {
+  throw new Error(api.error || 'Failed to generate test case');
+}
+const { prompt, context, metadata } = api.data || {};
+const testCase: GeneratedTestCase = { prompt, context, metadata };
Apply the same change to lines 117-123 (/providers/test call).
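
The `ApiResponse` shape quoted above can be sketched as a small typed helper. The field types of `GeneratedTestCase` and the `unwrapApiResponse` helper are assumptions for illustration; only the `{ success, data?, error? }` shape comes from the guideline.

```typescript
// Shape taken from the guideline text: { success: boolean, data?: T, error?: string }.
interface ApiResponse<T> {
  success: boolean;
  data?: T;
  error?: string;
}

// Field types are assumed; the review only names prompt, context, and metadata.
interface GeneratedTestCase {
  prompt: string;
  context?: string;
  metadata?: Record<string, unknown>;
}

// Returns the payload, or throws with the server error (or a fallback message).
function unwrapApiResponse<T>(response: ApiResponse<T>, fallbackMessage: string): T {
  if (!response.success || response.data === undefined) {
    throw new Error(response.error || fallbackMessage);
  }
  return response.data;
}

const testCase = unwrapApiResponse<GeneratedTestCase>(
  { success: true, data: { prompt: 'example prompt' } },
  'Failed to generate test case',
);
```

Unlike destructuring from `api.data || {}`, this sketch treats a successful response with no `data` as an error, so callers never receive a test case with an undefined prompt. That is a design choice of the sketch rather than a requirement stated in the review.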

🧹 Nitpick comments (13)

test/index.test.ts (1)
71-75: Add global mock cleanup after each test

Ensure isolation across suites. Add a top‑level afterEach to reset/restore all mocks.
 jest.mock('../src/util/file');
 
+// Global mock cleanup for this suite
+afterEach(() => {
+  jest.resetAllMocks();
+  jest.restoreAllMocks();
+});
+
 describe('index.ts exports', () => {
As per coding guidelines.
src/app/src/pages/redteam/setup/components/strategies/types.ts (2)
8-11: Align selectedStrategy type with Strategy

Use Strategy | null for consistency with StrategyCardData.id and downstream usage.
 export interface ConfigDialogState {
   isOpen: boolean;
-  selectedStrategy: string | null;
+  selectedStrategy: Strategy | null;
 }
21-31: Tighten preset strategy typings

Prefer Strategy over string for preset definitions to catch typos at compile time.
 export interface StrategyPreset {
   name: string;
   description: string;
-  strategies: readonly string[];
+  strategies: readonly Strategy[];
   options?: {
     multiTurn?: {
       label: string;
-      strategies: readonly string[];
+      strategies: readonly Strategy[];
     };
   };
 }
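
As a rough illustration of the typing suggestions above, a string-literal union lets the compiler reject misspelled strategy IDs in presets. The ID list here is hypothetical; the real `Strategy` type comes from the project's own constants.

```typescript
// Hypothetical ID list used only for this sketch.
const STRATEGY_IDS = ['basic', 'base64', 'jailbreak'] as const;
type Strategy = (typeof STRATEGY_IDS)[number];

interface StrategyPreset {
  name: string;
  description: string;
  strategies: readonly Strategy[];
}

const quickPreset: StrategyPreset = {
  name: 'Quick',
  description: 'Illustrative preset',
  strategies: ['basic', 'base64'],
  // strategies: ['basic', 'bas64'] would be rejected by the compiler.
};

// Runtime guard for IDs that arrive as plain strings (for example from saved config).
function isStrategy(value: string): value is Strategy {
  return STRATEGY_IDS.some((id) => id === value);
}
```

With `readonly string[]` the misspelled ID would compile and only fail at runtime, if at all.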
src/app/src/pages/redteam/setup/components/PluginsTab.tsx (2)
367-388: Add generateTestCase to deps

handleGenerateTestCase references generateTestCase; include it in dependencies.
-  }, [pluginConfig]);
+  }, [pluginConfig, generateTestCase]);
656-661: Fix tooltip typo

“reqiures” → “requires”.
-                          ? 'This plugin reqiures remote generation'
+                          ? 'This plugin requires remote generation'
src/app/src/pages/redteam/setup/components/Strategies.tsx (1)
451-457: Use anchor for external link

Prefer a plain anchor (or MUI Link with component="a") for external URLs to avoid router overhead.
-            <RouterLink
-              style={{ textDecoration: 'underline' }}
-              to="https://www.promptfoo.dev/docs/red-team/strategies/"
-              target="_blank"
-            >
+            <a
+              href="https://www.promptfoo.dev/docs/red-team/strategies/"
+              target="_blank"
+              rel="noreferrer"
+              style={{ textDecoration: 'underline' }}
+            >
               Learn More
-            </RouterLink>
+            </a>
src/app/src/pages/redteam/setup/components/strategies/StrategyItem.tsx (2)
25-27: Add display-name fallback for tooltip

The pluginDisplayNames lookup may return undefined; fall back to the plugin id to avoid rendering “undefined”.
-const TEST_GENERATION_PLUGIN_DISPLAY_NAME = pluginDisplayNames[TEST_GENERATION_PLUGIN as Plugin];
+const TEST_GENERATION_PLUGIN_DISPLAY_NAME =
+  pluginDisplayNames[TEST_GENERATION_PLUGIN as Plugin] ?? TEST_GENERATION_PLUGIN;
15-17: Disable generate when Cloud not connected; limit disable to current item

Match PluginsTab/CustomPolicies behavior: disable when apiHealthStatus !== 'connected', and only disable the current item during generation.
-import { type StrategyConfig, type RedteamStrategyObject } from '@promptfoo/redteam/types';
+import { type StrategyConfig, type RedteamStrategyObject } from '@promptfoo/redteam/types';
+import { useApiHealth } from '@app/hooks/useApiHealth';
 export function StrategyItem({
@@
 }: StrategyItemProps) {
   const { config } = useRedTeamConfig();
+  const {
+    data: { status: apiHealthStatus },
+  } = useApiHealth();
         <TestCaseGenerateButton
           onClick={handleTestCaseGeneration}
-          disabled={isDisabled || generatingTestCase}
-          isGenerating={generatingTestCase && currentStrategy === strategy.id}
+          disabled={
+            isDisabled ||
+            apiHealthStatus !== 'connected' ||
+            (generatingTestCase && currentStrategy === strategy.id)
+          }
+          isGenerating={generatingTestCase && currentStrategy === strategy.id}
Also applies to: 22-24, 192-201
src/server/routes/redteam.ts (1)
107-110: Make injectVar configurable; avoid hardcoding 'query'.

Let plugin.config.indirectInjectionVar override injectVar and use the same for prompt extraction.
-// TODO: Add support for this? Was previously misconfigured such that the no value would ever
-// be passed in as a configuration option.
-const injectVar = 'query';
+// Prefer plugin-specified variable when provided (e.g., indirect injection targets)
+const injectVar = plugin.config.indirectInjectionVar || 'query';
...
-const generatedPrompt = testCase.vars?.[injectVar] || 'Unable to extract test prompt';
+const generatedPrompt = testCase?.vars?.[injectVar] ?? 'Unable to extract test prompt';
Also applies to: 160-162
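
The difference between `||` and `??` in the suggestion above can be sketched as follows. `PluginConfigLike`, `TestCaseLike`, `resolveInjectVar`, and `extractPrompt` are hypothetical names; only `indirectInjectionVar`, the `'query'` default, and the fallback message come from the diff.

```typescript
interface PluginConfigLike {
  indirectInjectionVar?: string;
}

interface TestCaseLike {
  vars?: Record<string, string | undefined>;
}

const DEFAULT_INJECT_VAR = 'query';
const FALLBACK_PROMPT = 'Unable to extract test prompt';

// `||` is used here so that an empty variable name also falls back to the default.
function resolveInjectVar(config: PluginConfigLike | undefined): string {
  return config?.indirectInjectionVar || DEFAULT_INJECT_VAR;
}

// `??` only falls back when the value is null or undefined,
// so a generated prompt that is an empty string is passed through unchanged.
function extractPrompt(testCase: TestCaseLike | undefined, injectVar: string): string {
  return testCase?.vars?.[injectVar] ?? FALLBACK_PROMPT;
}
```

Whether an empty prompt should be surfaced as-is or replaced by the fallback message is a judgment call; with `||` the fallback would replace it.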
src/app/src/pages/redteam/setup/components/TestCaseDialog.tsx (2)
68-73: Simplify plugin/strategy name derivation.

Types are already Plugin | null and Strategy | null; the typeof checks are redundant.
-const pluginName = typeof plugin === 'string' ? plugin : plugin || '';
+const pluginName = plugin ?? '';
...
-const strategyName = typeof strategy === 'string' ? strategy : strategy || '';
+const strategyName = strategy ?? '';
Also applies to: 74-76

83-86: Optional: handle absent strategy gracefully in title.

If strategy is null, omit the “/ …” suffix.
-? `Generated Test Case for ${pluginDisplayName} / ${strategyDisplayName}`
+? `Generated Test Case for ${pluginDisplayName}${strategy ? ` / ${strategyDisplayName}` : ''}`
src/redteam/types.ts (1)
50-56: Fix swapped BFLA/BOLA comments to avoid confusion.

Comments appear reversed relative to usage in UI/server.
-// BOLA
-targetIdentifiers: z.array(z.string()).optional(),
-// BFLA
-targetSystems: z.array(z.string()).optional(),
+// BFLA (function names/endpoints)
+targetIdentifiers: z.array(z.string()).optional(),
+// BOLA (object/system identifiers)
+targetSystems: z.array(z.string()).optional(),
src/app/src/pages/redteam/setup/components/TestCaseGenerationProvider.tsx (1)
29-36: Reuse existing public types instead of introducing new ones.

Prefer RedteamPluginObject/RedteamStrategyObject to avoid parallel “Target*” types.
-import type { PluginConfig, StrategyConfig } from '@promptfoo/redteam/types';
+import type {
+  RedteamPluginObject as TargetPlugin,
+  RedteamStrategyObject as TargetStrategy,
+} from '@promptfoo/redteam/types';
Adjust the generateTestCase signature accordingly (no behavior change).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb8d7af and 06290a1.

📒 Files selected for processing (10)

src/app/src/pages/redteam/setup/components/PluginsTab.tsx (1 hunks)
src/app/src/pages/redteam/setup/components/Strategies.tsx (2 hunks)
src/app/src/pages/redteam/setup/components/Targets/CustomPoliciesSection.tsx (1 hunks)
src/app/src/pages/redteam/setup/components/TestCaseDialog.tsx (7 hunks)
src/app/src/pages/redteam/setup/components/TestCaseGenerationProvider.tsx (7 hunks)
src/app/src/pages/redteam/setup/components/strategies/StrategyItem.tsx (5 hunks)
src/app/src/pages/redteam/setup/components/strategies/types.ts (1 hunks)
src/redteam/types.ts (3 hunks)
src/server/routes/redteam.ts (2 hunks)
test/index.test.ts (1 hunks)

🧰 Additional context used

📓 Path-based instructions (13)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Prefer not to introduce new TypeScript types; use existing interfaces whenever possible

**/*.{ts,tsx}: Use TypeScript with strict type checking
Follow consistent import order (Biome will handle import sorting)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object shorthand syntax whenever possible
Use async/await for asynchronous code
Use consistent error handling with proper type checks
Always sanitize sensitive data before logging
Use structured logger methods (debug/info/warn/error) with a context object instead of interpolating secrets into log strings
Use sanitizeObject for manual sanitization in non-logging contexts before persisting or further processing data

Files:

src/app/src/pages/redteam/setup/components/Targets/CustomPoliciesSection.tsx
test/index.test.ts
src/server/routes/redteam.ts
src/app/src/pages/redteam/setup/components/strategies/StrategyItem.tsx
src/app/src/pages/redteam/setup/components/TestCaseGenerationProvider.tsx
src/app/src/pages/redteam/setup/components/strategies/types.ts
src/app/src/pages/redteam/setup/components/TestCaseDialog.tsx
src/app/src/pages/redteam/setup/components/PluginsTab.tsx
src/app/src/pages/redteam/setup/components/Strategies.tsx
src/redteam/types.ts

src/app/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (src/app/CLAUDE.md)

src/app/src/**/*.{ts,tsx}: Never use fetch() directly; always use callApi() from @app/utils/api for all HTTP requests
Access Zustand state outside React components via store.getState(); do not call hooks outside components
Use the @app/* path alias for internal imports as configured in Vite

Files:

src/app/src/pages/redteam/setup/components/Targets/CustomPoliciesSection.tsx
src/app/src/pages/redteam/setup/components/strategies/StrategyItem.tsx
src/app/src/pages/redteam/setup/components/TestCaseGenerationProvider.tsx
src/app/src/pages/redteam/setup/components/strategies/types.ts
src/app/src/pages/redteam/setup/components/TestCaseDialog.tsx
src/app/src/pages/redteam/setup/components/PluginsTab.tsx
src/app/src/pages/redteam/setup/components/Strategies.tsx

src/app/src/{components,pages}/**/*.tsx

📄 CodeRabbit inference engine (src/app/CLAUDE.md)

src/app/src/{components,pages}/**/*.tsx: Use the class-based ErrorBoundary component (@app/components/ErrorBoundary) to wrap error-prone UI
Access theme via useTheme() from @mui/material/styles instead of hardcoding theme values
Use useMemo/useCallback only when profiling indicates benefit; avoid unnecessary memoization
Implement explicit loading and error states for components performing async operations
Prefer MUI composition and the sx prop for styling over ad-hoc inline styles

Files:

src/app/src/pages/redteam/setup/components/Targets/CustomPoliciesSection.tsx
src/app/src/pages/redteam/setup/components/strategies/StrategyItem.tsx
src/app/src/pages/redteam/setup/components/TestCaseGenerationProvider.tsx
src/app/src/pages/redteam/setup/components/TestCaseDialog.tsx
src/app/src/pages/redteam/setup/components/PluginsTab.tsx
src/app/src/pages/redteam/setup/components/Strategies.tsx

**/*.{tsx,jsx}

📄 CodeRabbit inference engine (.cursor/rules/react-components.mdc)

**/*.{tsx,jsx}: Use icons from @mui/icons-material
Prefer commonly used icons from @mui/icons-material for intuitive experience

Files:

src/app/src/pages/redteam/setup/components/Targets/CustomPoliciesSection.tsx
src/app/src/pages/redteam/setup/components/strategies/StrategyItem.tsx
src/app/src/pages/redteam/setup/components/TestCaseGenerationProvider.tsx
src/app/src/pages/redteam/setup/components/TestCaseDialog.tsx
src/app/src/pages/redteam/setup/components/PluginsTab.tsx
src/app/src/pages/redteam/setup/components/Strategies.tsx

src/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

src/app/**/*.{ts,tsx}: In the React app (src/app), always use callApi from @app/utils/api for API calls instead of fetch()
React hooks: use useMemo for computed values and useCallback for functions that accept arguments

Files:

src/app/src/pages/redteam/setup/components/Targets/CustomPoliciesSection.tsx
src/app/src/pages/redteam/setup/components/strategies/StrategyItem.tsx
src/app/src/pages/redteam/setup/components/TestCaseGenerationProvider.tsx
src/app/src/pages/redteam/setup/components/strategies/types.ts
src/app/src/pages/redteam/setup/components/TestCaseDialog.tsx
src/app/src/pages/redteam/setup/components/PluginsTab.tsx
src/app/src/pages/redteam/setup/components/Strategies.tsx

**/*.{test,spec}.{js,ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Avoid disabling or skipping tests unless absolutely necessary and documented

Files:

test/index.test.ts

test/**/*.{test,spec}.ts

📄 CodeRabbit inference engine (.cursor/rules/jest.mdc)

test/**/*.{test,spec}.ts: Mock as few functions as possible to keep tests realistic
Never increase the function timeout - fix the test instead
Organize tests in descriptive describe and it blocks
Prefer assertions on entire objects rather than individual keys when writing expectations
Clean up after tests to prevent side effects (e.g., use afterEach(() => { jest.resetAllMocks(); }))
Run tests with --randomize flag to ensure your mocks setup and teardown don't affect other tests
Use Jest's mocking utilities rather than complex custom mocks
Prefer shallow mocking over deep mocking
Mock external dependencies but not the code being tested
Reset mocks between tests to prevent test pollution
For database tests, use in-memory instances or proper test fixtures
Test both success and error cases for each provider
Mock API responses to avoid external dependencies in tests
Validate that provider options are properly passed to the underlying service
Test error handling and edge cases (rate limits, timeouts, etc.)
Ensure provider caching behaves as expected
Always include both --coverage and --randomize flags when running tests
Run tests in a single pass (no watch mode for CI)
Ensure all tests are independent and can run in any order
Clean up any test data or mocks after each test

Files:

test/index.test.ts

test/**/*.test.ts

📄 CodeRabbit inference engine (test/CLAUDE.md)

test/**/*.test.ts: Never increase Jest test timeouts; fix slow tests instead (avoid jest.setTimeout or large timeouts in tests)
Do not use .only() or .skip() in committed tests
Add afterEach(() => { jest.resetAllMocks(); }) to ensure mock cleanup
Prefer asserting entire objects (toEqual on whole result) rather than individual fields
Mock minimally: only external dependencies (APIs, databases), not code under test
Use Jest (not Vitest) APIs in this suite; avoid importing vitest
Import from @jest/globals in tests

Files:

test/index.test.ts

test/**

📄 CodeRabbit inference engine (test/CLAUDE.md)

Organize tests to mirror src/ structure (e.g., test/providers → src/providers, test/redteam → src/redteam)

Files:

test/index.test.ts

test/**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

test/**/*.{test,spec}.{ts,tsx}: Follow Jest best practices with describe/it blocks
Write tests that cover both success and error cases for all functionality

Files:

test/index.test.ts

src/server/**/*.ts

📄 CodeRabbit inference engine (src/server/CLAUDE.md)

src/server/**/*.ts: Sanitize all logged request/response data. Do not stringify or interpolate req/res directly; pass structured objects to the logger so sensitive fields are auto-redacted.
Use Drizzle ORM (with schema from src/database/schema.ts and helpers like eq) for database access instead of raw SQL.
Implement the server using Express 5 APIs for HTTP handling.

Files:

src/server/routes/redteam.ts

src/server/routes/**/*.ts

📄 CodeRabbit inference engine (src/server/CLAUDE.md)

src/server/routes/**/*.ts: Always validate request bodies with Zod schemas before processing route handlers.
Wrap all HTTP responses in the standard ApiResponse shape: { success, data? } on success and { success: false, error } on failure.
Use try/catch in route handlers; log errors and return 400 for validation errors and 500 for unexpected errors.
Use appropriate HTTP status codes (200, 201, 400, 404, 500) for API responses.
Organize API endpoint handlers in src/server/routes (e.g., routes/eval.ts, routes/config.ts, routes/results.ts, routes/share.ts).

Files:

src/server/routes/redteam.ts

src/redteam/**/*.ts

📄 CodeRabbit inference engine (src/redteam/CLAUDE.md)

src/redteam/**/*.ts: Always sanitize when logging test prompts or model outputs by passing them via the structured metadata parameter (second argument) to the logger, not raw string interpolation
Use the standardized risk severity levels: critical, high, medium, low when reporting results

Files:

src/redteam/types.ts

🧠 Learnings (4)

📚 Learning: 2025-10-05T16:59:20.507Z

Learnt from: CR
PR: promptfoo/promptfoo#0
File: src/redteam/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:59:20.507Z
Learning: Applies to src/redteam/test/redteam/**/*.ts : Add tests for new plugins under test/redteam/

Applied to files:

src/server/routes/redteam.ts
src/app/src/pages/redteam/setup/components/PluginsTab.tsx

📚 Learning: 2025-10-05T16:59:20.507Z

Learnt from: CR
PR: promptfoo/promptfoo#0
File: src/redteam/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:59:20.507Z
Learning: Applies to src/redteam/plugins/**/*.ts : Place vulnerability-specific test generators as plugins under src/redteam/plugins/ (e.g., pii.ts, harmful.ts, sql-injection.ts)

Applied to files:

src/server/routes/redteam.ts
src/app/src/pages/redteam/setup/components/PluginsTab.tsx

📚 Learning: 2025-10-05T16:59:20.507Z

Learnt from: CR
PR: promptfoo/promptfoo#0
File: src/redteam/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:59:20.507Z
Learning: Applies to src/redteam/plugins/**/*.ts : New plugins must implement the RedteamPluginObject interface

Applied to files:

src/app/src/pages/redteam/setup/components/PluginsTab.tsx
src/redteam/types.ts

📚 Learning: 2025-10-05T16:59:20.507Z

Learnt from: CR
PR: promptfoo/promptfoo#0
File: src/redteam/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:59:20.507Z
Learning: Applies to src/redteam/strategies/**/*.ts : Store attack transformation strategies under src/redteam/strategies/ (e.g., jailbreak.ts, prompt-injection.ts, base64.ts)

Applied to files:

src/app/src/pages/redteam/setup/components/Strategies.tsx

🧬 Code graph analysis (5)

src/server/routes/redteam.ts (5)

src/redteam/constants/plugins.ts (2)

ALL_PLUGINS (355-362)

REDTEAM_MODEL (9-9)

src/redteam/types.ts (2)

PluginConfigSchema (33-72)

StrategyConfigSchema (76-84)

src/redteam/constants/strategies.ts (1)

ALL_STRATEGIES (93-93)

src/redteam/strategies/index.ts (2)

Strategy (38-38)

Strategies (40-320)

src/redteam/plugins/index.ts (2)

Plugins (345-350)

PluginFactory (61-65)

src/app/src/pages/redteam/setup/components/strategies/StrategyItem.tsx (5)

site/docs/_shared/data/plugins.ts (1)

Plugin (70-80)

src/app/src/pages/redteam/setup/hooks/useRedTeamConfig.ts (1)

useRedTeamConfig (260-341)

src/app/src/pages/redteam/setup/components/TestCaseGenerationProvider.tsx (1)

useTestCaseGeneration (218-224)

src/redteam/types.ts (2)

RedteamStrategyObject (114-117)

StrategyConfig (86-86)

src/app/src/pages/redteam/setup/components/TestCaseDialog.tsx (1)

TestCaseGenerateButton (424-449)

src/app/src/pages/redteam/setup/components/TestCaseGenerationProvider.tsx (3)

src/redteam/types.ts (2)

PluginConfig (74-74)

StrategyConfig (86-86)

src/app/src/pages/redteam/setup/types.ts (1)

Config (27-41)

src/app/src/pages/redteam/setup/components/TestCaseDialog.tsx (1)

TestCaseDialog (49-422)

src/app/src/pages/redteam/setup/components/TestCaseDialog.tsx (1)

src/redteam/constants/metadata.ts (2)

displayNameOverrides (160-295)

categoryAliases (574-679)

src/app/src/pages/redteam/setup/components/Strategies.tsx (1)

src/app/src/pages/redteam/setup/components/TestCaseGenerationProvider.tsx (1)

TestCaseGenerationProvider (60-216)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: Test on Node 24.x and ubuntu-latest
GitHub Check: Test on Node 22.x and macOS-latest
GitHub Check: Test on Node 24.x and windows-latest
GitHub Check: Test on Node 22.x and windows-latest
GitHub Check: Test on Node 20.x and windows-latest
GitHub Check: webui tests
GitHub Check: Tusk Tester

🔇 Additional comments (5)

test/index.test.ts (1)

101-116: LGTM: new schema exports covered

Adding PluginConfigSchema and StrategyConfigSchema to expectedSchemaExports looks correct and aligned with the new public API.

src/app/src/pages/redteam/setup/components/strategies/types.ts (1)

1-6: Type-narrowing for id is good

Using Strategy for StrategyCardData.id improves safety and autocomplete. No issues.

src/app/src/pages/redteam/setup/components/PluginsTab.tsx (1)

377-384: LGTM: structured generation call

Passing { id, config } for plugin and strategy matches the new provider API.

src/app/src/pages/redteam/setup/components/Strategies.tsx (1)

36-37: LGTM: Provider integration

Wrapping the Strategies UI with TestCaseGenerationProvider is correct and keeps generation dialog in scope.

Also applies to: 531-647

src/app/src/pages/redteam/setup/components/TestCaseGenerationProvider.tsx (1)

77-90: Include strategy in telemetry and keep IDs only (good).

This change looks correct and helpful for analytics.

src/server/routes/redteam.ts

use-tusk · 2025-10-31T17:12:09Z

❌ Generated 52 tests - 42 passed, 10 failed (`4160685`)

Test Summary

PluginsTab - 10 ✅
Strategies - 2 ✅, 3 ❌
TestCaseDialog - 11 ✅, 2 ❌
TestCaseGenerateButton - 4 ✅, 1 ❌
TestCaseGenerationProvider - 11 ✅, 3 ❌
StrategyItem - 4 ✅, 1 ❌

Results

Tusk's tests show solid coverage of the core red team strategy test generation features, with 42 passing tests validating critical workflows like plugin selection, strategy configuration, test case generation, and error handling. However, 10 failing tests reveal gaps in edge case handling and error scenarios that need attention before this PR is production-ready. The failures cluster around three areas: incomplete null/undefined checks in the Strategies and TestCaseGenerationProvider components, missing error display logic in TestCaseDialog, and a disabled state issue in StrategyItem. These aren't blockers for the main feature but represent robustness issues that could surface in real usage.

Key Points

❌ TestCaseGenerationProvider missing null guards: The generation effect doesn't validate that plugin and strategy are defined before making API calls, risking runtime errors or invalid requests. Add stricter precondition checks before triggering generation.
❌ TestCaseDialog error handling incomplete: When targetResponse.error exists, the component still displays output instead of the error message. Users won't see failure reasons when tests fail. Prioritize error display over output in the Target Response section.
❌ Strategies component crashes on null config.target: The handleStatefulChange function spreads config.target without null checks, causing TypeErrors during initialization or state transitions. Add optional chaining to safely handle missing target configurations.
❌ Strategies doesn't validate strategy IDs: Invalid or null strategy IDs in config aren't filtered before determining checked state, potentially breaking the UI. Filter out invalid IDs before rendering checkboxes.
❌ Strategies missing stateful mode validation: No warning when stateful mode is enabled with zero strategies selected, allowing misconfiguration. Add validation to warn users about this invalid state.
❌ TestCaseGenerateButton unsafe API health access: Destructuring useApiHealth data without null checks causes crashes when the hook returns undefined. Use optional chaining to safely access data?.status.
❌ StrategyItem test generation button not disabled for remote-disabled strategies: The button doesn't respect isRemoteGenerationDisabled, allowing users to attempt generation when remote is unavailable. Disable the button and show appropriate tooltip when remote generation is disabled.
❌ TestCaseGenerationProvider missing error toast on target execution failure: When test execution fails due to invalid target config, no error toast is shown to the user. Add error notification in the catch block for test execution.
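
Several of the points above come down to guarding against missing values before deciding whether the generate button is usable. A minimal sketch, assuming a hook result whose `data` may be undefined; `ApiHealthResult`, `GenerateButtonState`, and `isGenerateButtonDisabled` are hypothetical names, and only the `'connected'` status and `isRemoteGenerationDisabled` appear in the review.

```typescript
// Assumed shape of the health hook result; data may be absent while loading.
interface ApiHealthResult {
  data?: { status?: string };
}

interface GenerateButtonState {
  isDisabled: boolean;
  isRemoteGenerationDisabled: boolean;
  isGeneratingThisItem: boolean;
  apiHealth: ApiHealthResult | undefined;
}

// Optional chaining avoids a TypeError when the hook result or its data is undefined;
// an unknown status is treated the same as "not connected".
function isGenerateButtonDisabled(state: GenerateButtonState): boolean {
  const status = state.apiHealth?.data?.status;
  return (
    state.isDisabled ||
    state.isRemoteGenerationDisabled ||
    state.isGeneratingThisItem ||
    status !== 'connected'
  );
}
```

Treating a missing status as disabled is the conservative choice in this sketch: the button stays inactive until the health check has positively reported a connection.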

Symbols Not Exported

The following symbols were skipped because they are not exported and therefore cannot be tested. Export these symbols if you want them to be tested:

src/app/src/pages/redteam/setup/components/TestCaseDialog.tsx - (Section)
src/app/src/pages/redteam/setup/components/strategies/StrategyItem.tsx - (isRequiredConfigMissing)

Existing test issues

While performing the check, Tusk fixed the following test files:

src/app/src/pages/redteam/setup/components/strategies/StrategyItem.test.tsx: The test file was missing a QueryClientProvider which caused react-query hooks to fail; the test setup was updated to include the provider, resolving the error.

Tests generated in these files will include these fixes. If you wish to only commit these fixes, you can do so in the Tusk app.

View check history

Commit	Status	Output	Created (UTC)
`f749e75`	⏩ No tests generated	Output	Oct 23, 2025 5:52PM
`06290a1`	⏩ Skipped due to new commit on branch	Output	Oct 23, 2025 6:46PM
`1d20935`	⏩ Skipped due to new commit on branch	Output	Oct 23, 2025 6:54PM
`456e20c`	⏩ Skipped due to new commit on branch	Output	Oct 23, 2025 6:56PM
`fc981ad`	⏩ Skipped due to new commit on branch	Output	Oct 23, 2025 7:20PM
`3d698e7`	❌ Generated 23 tests - 22 passed, 1 failed	Tests	Oct 23, 2025 7:23PM
`33577c3`	⏩ Skipped due to new commit on branch	Output	Oct 24, 2025 2:22PM
`3e72e4d`	✅ Generated 20 tests - 20 passed	Tests	Oct 24, 2025 2:28PM
`c9a91a1`	✅ Generated 22 tests - 22 passed	Tests	Oct 24, 2025 5:00PM
`2d8347a`	✅ Generated 29 tests - 29 passed	Tests	Oct 27, 2025 5:53PM
`ef0c43b`	❌ Generated 37 tests - 36 passed, 1 failed	Tests	Oct 30, 2025 5:13PM
`b653061`	⏩ Skipped due to new commit on branch	Output	Oct 30, 2025 7:00PM
`e8ca6a5`	⏩ Skipped due to new commit on branch	Output	Oct 30, 2025 7:03PM
`6ae7f48`	🔄 Running Tusk Tester	Output	Oct 30, 2025 7:05PM
`53e7ec5`	⏩ Skipped due to new commit on branch	Output	Oct 30, 2025 8:02PM
`6ec161c`	🔄 Running Tusk Tester	Output	Oct 30, 2025 8:09PM
`4160685`	❌ Generated 52 tests - 42 passed, 10 failed	Tests	Oct 31, 2025 4:11PM


will-holley added 7 commits October 23, 2025 12:31

refactor: TestCaseGenerationProvider

23b633d

- Removes values unnecessarily (no refs) returned by the provider
- Adds support for generating w/ strategies

refactor: plugin test gen w/ strategies

0b3bcc0

feat: Adds triggering from strategy items

47040f2

refactor: Derive PluginConfig from Zod PluginConfigSchema

0480db5

chore: format

97878ae

refactor: Clean up /generate-test endpoint

6edca38

- Request body validation
- Fixes types
- Removes unreachable logic
- Adds type-support for nonce

refactor: UI improvements

f749e75

will-holley mentioned this pull request Oct 23, 2025

feat: add magic wand test generation to red team strategies #5868

Closed

8 tasks

will-holley added 5 commits October 23, 2025 14:06

refactor: Adds StrategyConfigSchema Zod Schema

2b021ad

feat: Pass strategy config to API call

d539291

chore: Format

28eebbd

chore: format

1763e4e

feat: Apply strategy during generation

06290a1

will-holley marked this pull request as ready for review October 23, 2025 18:47

fix: Intent validator

1d20935

coderabbitai bot reviewed Oct 23, 2025

src/server/routes/redteam.ts (3 resolved comments)

will-holley added 3 commits October 23, 2025 14:56

chore: Document nonce

456e20c

test: Updates unit tests

fc981ad

chore: Handle "default" strategy

3d698e7

will-holley self-assigned this Oct 23, 2025

will-holley and others added 9 commits October 24, 2025 10:22

Merge branch 'main' into feat/strategy-test-gen

33577c3

test: Fixes mock

3e72e4d

Merge branch 'main' into feat/strategy-test-gen

c9a91a1

Merge branch 'main' into feat/strategy-test-gen

2d8347a

Merge branch 'main' into feat/strategy-test-gen

ef0c43b

feat: Disable button if Cloud unavailable

7ed34f7

chore: Updates strategy item tooltip

1fe290a

chore: Removes unused code

b60f54a

refactor: Removes telemetry event configuration

d68fc57

will-holley added 3 commits October 30, 2025 14:47

refactor: Show independent loading states for test case generation and evaluation

9a516ef

feat: Regenerate button

39dd888

chore: Dialog styling

b653061

will-holley requested a review from mldangelo October 30, 2025 19:01

will-holley and others added 8 commits October 30, 2025 15:03

fix: tooltip intersection

e8ca6a5

chore: lint

6ae7f48

chore: format

6caf603

fix: Handle audio prompts

1a91818

fix: Render images

209470e

fix: Render video prompts

53e7ec5

fix: Do not allow test generation when strategy configuration is required

6ec161c

Merge branch 'main' into feat/strategy-test-gen

4160685
