Skip to content

Conversation

@iitslamaa
Copy link
Contributor

@iitslamaa iitslamaa commented Oct 17, 2025

feat(errors): add structured error taxonomy & provider guardrails

Summary

Implements an error taxonomy and provider-agnostic guardrail utilities.
Provides consistent error handling and retry logic across all providers.

Details

  • Adds ErrorCategory and ClassifiedError types.
  • Introduces withRetries helper and configurable GuardrailConfig.
  • Provides consistent retry/backoff logic across providers.
  • No provider-specific wiring or test/tooling changes.
  • Supersedes: feat: add error taxonomy and provider guardrails feat: add error taxonomy and provider guardrails #5949 (with cleanup).

Key changes

  • Standardized error categories:
    • provider_error — e.g., invalid API key
    • validation_error — bad YAML/config shape
    • timeout — request/provider timeout
    • tool_error — local runtime/tooling issue
  • Unified AppError helpers; errors now carry { type, code, message, provider } used by CLI/JSON outputs.

Tests

Verified locally on Node 20:

These tests are self-contained under scratch/ (ignored by Git) and run in seconds.

# Be in repo root
cd "$(git rev-parse --show-toplevel 2>/dev/null || pwd)"

# Keep artifacts out of git
mkdir -p scratch
echo "scratch/" >> .git/info/exclude

# A) BASELINE — echo provider (should PASS)
cat > scratch/baseline.yaml <<'YAML'
prompts:
  - "Hello, {{name}}"
providers:
  - id: echo
tests:
  - vars: { name: "Lama" }
    assert:
      - type: contains
        value: "Lama"
YAML

# B) BASELINE (HTTP) — static data: URL (should PASS)
cat > scratch/baseline_http.yaml <<'YAML'
prompts:
  - "Hello, {{name}}"
providers:
  - id: http
    config:
      url: "data:text/plain,Hello%2C%20World"
      method: POST
      body: "Hello, World"
tests:
  - vars: { name: "Lama" }
    assert:
      - type: contains
        value: "Hello"
YAML

# C) PROVIDER ERROR — invalid API key (exercises provider_error)
cat > scratch/fail-invalid-key.yaml <<'YAML'
prompts:
  - "Ping"
providers:
  - id: openai:gpt-4o
    config:
      apiKey: "sk-invalid"
tests:
  - assert:
      - type: equals
        value: "x"
YAML

# D) VALIDATION ERROR — wrong type in provider config (exercises validation_error)
cat > scratch/fail-validation-a.yaml <<'YAML'
prompts:
  - "Ping"
providers:
  - id: http
    config:
      url: 12345
YAML

echo "=== baseline (echo) — should PASS ==="
npx promptfoo eval -c scratch/baseline.yaml

echo "=== baseline (http) — should PASS ==="
npx promptfoo eval -c scratch/baseline_http.yaml

echo "=== invalid key — expect provider_error (allowed failure) ==="
npx promptfoo eval -c scratch/fail-invalid-key.yaml || true

echo "=== bad config — expect validation_error (allowed failure) ==="
npx promptfoo eval -c scratch/fail-validation-a.yaml || true

Cleanup & Follow-ups

  • Reverted unintended package.json / package-lock.json churn.
  • Renamed src/lib/guardrails → src/lib/guardrails.ts.
  • Added changelog entry under Unreleased.
  • Applied Biome formatting to satisfy CI style check.
  • Ready for re-review once CI completes.

Notes

  • Type/lint/format checks pass.
  • CI verified on Node 20 (macOS may require npm rebuild better-sqlite3 --force).
  • Follow-up idea: add a --strict-validate flag to fail on coercions or 0-test configs.

@iitslamaa
Copy link
Contributor Author

Hey @mldangelo , this is ready for review!
I cleaned up the previous branch, finalized the error taxonomy and guardrail helpers, and verified all categories locally (provider_error, validation_error, timeout, tool_error).
Let me know if you’d like me to add stricter validation or any additional tests before merge.

@iitslamaa iitslamaa marked this pull request as ready for review October 17, 2025 22:24
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 17, 2025

📝 Walkthrough

Walkthrough

This PR introduces a comprehensive error handling and retry infrastructure. It defines structured error types (ErrorCategory, ErrorType, EvalErrorInfo) and result types (GradingResult, CaseResult, AssertionDetail) for standardized evaluation output. A new AppError class provides structured error information with helper constructors for common error categories. A guardrails module implements a retry mechanism with jittered exponential backoff, timeout control via AbortController, and special handling for rate limits. Minor dependency updates include removing peerDependencies and updating prettier version spec.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

The review requires understanding retry logic with exponential backoff and timeout handling in the guardrails module, verifying type consistency across multiple new files, and validating error propagation patterns through helper constructors. While the changes are heterogeneous—spanning types, errors, and guardrails—each component is self-contained with no existing code modifications, keeping the effort in the moderate range.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title "feat(guardrails): add error taxonomy and provider-agnostic guardrails" directly reflects the main changes in the pull request. The changeset introduces error taxonomy types (ErrorCategory, ErrorType, EvalErrorInfo) and the AppError class in src/lib/errors.ts, combined with a new provider-agnostic guardrails utility featuring the withRetries function in src/lib/guardrails. The title is specific, concise, and clearly captures both primary features without being vague or misleading. Minor package.json updates are appropriately de-emphasized, as the title correctly focuses on the architectural additions.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
Description Check ✅ Passed The pull request description is clearly related to the changeset and accurately reflects the changes made. The author describes implementing an error taxonomy with standardized error categories (provider_error, validation_error, timeout, tool_error) and provider-agnostic guardrail utilities, which directly aligns with the additions of error types in src/types/, the new AppError class and helper constructors in src/lib/errors.ts, and the withRetries function with GuardrailConfig in src/lib/guardrails. The description includes specific technical details about key exports, retry/backoff logic, error structure, and cleanup steps, demonstrating clear understanding of the changeset rather than being vague or generic. While the description also mentions package.json changes, these align with the minor modifications shown in the raw summary.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (7)
src/types/errors.ts (1)

1-8: Consider typing code with ErrorCategory for stronger guarantees.

If ErrorCategory represents standardized error codes, wire it into EvalErrorInfo.code (in src/types/result.ts) to prevent free-form strings. Keep this file as a thin re-export hub to avoid duplication.

src/types/result.ts (1)

5-11: Type EvalErrorInfo.code with ErrorCategory (instead of string).

Improves safety and keeps taxonomy consistent with src/types/errors.ts.

Apply:

+import type { ErrorCategory } from './errors';
 export interface EvalErrorInfo {
   type: ErrorType;
-  code?: string;
+  code?: ErrorCategory;
   message: string;
   hint?: string;
   provider?: string;
src/lib/errors.ts (1)

22-32: Include stack/cause in toInfo for parity with EvalErrorInfo.

Restores fidelity for CLI/JSON outputs and debugging.

Apply:

   toInfo(): EvalErrorInfo {
     return {
       type: this.type,
       code: this.code,
       message: this.message,
       hint: this.hint,
       provider: this.provider,
       requestId: this.requestId,
       raw: this.raw,
+      stack: this.stack,
+      // Avoid leaking objects; only surface string causes
+      cause: typeof this.cause === 'string' ? this.cause : undefined,
     };
   }
src/lib/guardrails (4)

42-49: Use unknown in catch and clear the timer immediately.

Avoid any; clear the timer before heavy work to prevent leaks.

Apply:

-    try {
+    try {
       const out = await call(args, controller.signal);
       clearTimeout(timeout);
       return out;
-    } catch (err: any) {
+    } catch (err: unknown) {
+      clearTimeout(timeout);
       attempt++;

51-61: Harden error extraction and retryability checks.

Broaden network error codes and parse status from common SDK shapes.

Apply:

-      const code = err?.code ?? err?.status ?? err?.name;
-      const status = err?.status as number | undefined;
-      const retryAfterHeader = err?.retryAfterSeconds;
-      const retryAfter = Number(retryAfterHeader);
+      const anyErr = err as Record<string, any> | undefined;
+      const code = anyErr?.code ?? anyErr?.name ?? anyErr?.status ?? anyErr?.cause?.code;
+      const status: number | undefined =
+        anyErr?.status ??
+        anyErr?.statusCode ??
+        anyErr?.response?.status ??
+        anyErr?.response?.statusCode;
+      // Retry-After: seconds or HTTP-date; look in common locations
+      const raHeader =
+        anyErr?.retryAfterSeconds ??
+        anyErr?.response?.headers?.['retry-after'] ??
+        anyErr?.headers?.['retry-after'];
+      const retryAfter =
+        typeof raHeader === 'string' && /^\d+$/.test(raHeader)
+          ? Number(raHeader)
+          : Number(raHeader); // best effort fallback

78-85: Respect Retry-After more robustly and avoid over-sleeping.

Use parsed retryAfter when present; otherwise exponential backoff with jitter.

Apply:

-      let delay = Math.min(max, base * 2 ** (attempt - 1));
-      if (respectRetryAfter && Number.isFinite(retryAfter) && retryAfter! > 0) {
-        delay = Math.max(delay, retryAfter! * 1000);
-      }
-      await sleep(jitter(delay));
-      clearTimeout(timeout);
+      let delay = Math.min(max, base * 2 ** (attempt - 1));
+      if (respectRetryAfter && Number.isFinite(retryAfter) && retryAfter > 0) {
+        delay = Math.max(delay, retryAfter * 1000);
+      }
+      await sleep(jitter(delay));

65-73: Sanitize raw error details before attaching.

Error messages can leak tokens/PII; consider redaction before adding to raw.

Example:

const msg = String((anyErr as any)?.message ?? '');
const safeMsg = msg.replace(/(sk-[A-Za-z0-9]{20,})/g, '[REDACTED]');
raw: { status, code, message: safeMsg }

As per coding guidelines.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12ae51d and 7fca23e.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (5)
  • package.json (1 hunks)
  • src/lib/errors.ts (1 hunks)
  • src/lib/guardrails (1 hunks)
  • src/types/errors.ts (1 hunks)
  • src/types/result.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Prefer not to introduce new TypeScript types; use existing interfaces whenever possible

**/*.{ts,tsx}: Use TypeScript with strict type checking
Follow consistent import order (Biome will handle import sorting)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object shorthand syntax whenever possible
Use async/await for asynchronous code
Use consistent error handling with proper type checks
Always sanitize sensitive data before logging
Use structured logger methods (debug/info/warn/error) with a context object instead of interpolating secrets into log strings
Use sanitizeObject for manual sanitization in non-logging contexts before persisting or further processing data

Files:

  • src/types/errors.ts
  • src/lib/errors.ts
  • src/types/result.ts
{package.json,src/app/package.json,site/package.json,examples/**/package.json}

📄 CodeRabbit inference engine (CLAUDE.md)

PeerDependencies must match devDependencies versions

Files:

  • package.json
{package.json,src/app/package.json,site/package.json}

📄 CodeRabbit inference engine (CLAUDE.md)

Use CommonJS modules (set "type": "commonjs" in package.json)

Files:

  • package.json
🧬 Code graph analysis (3)
src/types/errors.ts (1)
src/types/result.ts (1)
  • EvalErrorInfo (5-15)
src/lib/errors.ts (2)
src/types/result.ts (2)
  • ErrorType (2-2)
  • EvalErrorInfo (5-15)
src/types/errors.ts (1)
  • EvalErrorInfo (10-15)
src/types/result.ts (1)
src/types/errors.ts (1)
  • EvalErrorInfo (10-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: Redteam (Production API)
  • GitHub Check: Run Integration Tests
  • GitHub Check: webui tests
  • GitHub Check: Test on Node 24.x and windows-latest
  • GitHub Check: Generate Assets
  • GitHub Check: Test on Node 22.x and ubuntu-latest
  • GitHub Check: Test on Node 22.x and windows-latest
  • GitHub Check: Build Docs
  • GitHub Check: Test on Node 22.x and macOS-latest
  • GitHub Check: Test on Node 24.x and ubuntu-latest
  • GitHub Check: Test on Node 20.x and ubuntu-latest
  • GitHub Check: Test on Node 20.x and macOS-latest
  • GitHub Check: Test on Node 20.x and windows-latest
  • GitHub Check: Build on Node 24.x
  • GitHub Check: Build on Node 20.x
  • GitHub Check: Build on Node 22.x
  • GitHub Check: Style Check
🔇 Additional comments (2)
src/types/result.ts (1)

25-51: Result/grading shapes look coherent and extensible.

No issues spotted. The structure aligns with Promptfoo’s output schema.

package.json (1)

153-153: Prettier bump verified; fix CommonJS type in site and src/app packages.

Workspace alignment is sound—Prettier only declared at root (no sub-package conflicts). The peerDependencies concern is moot; none exist across the workspace.

However, the coding guidelines require "type": "commonjs" in site/package.json and src/app/package.json. Currently:

  • site/package.json: missing "type" field
  • src/app/package.json: has "type": "module" (should be "commonjs")

Add or correct these fields in both files.

⛔ Skipped due to learnings
Learnt from: CR
PR: promptfoo/promptfoo#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-11T15:29:11.784Z
Learning: Applies to {package.json,src/app/package.json,site/package.json} : Use CommonJS modules (set "type": "commonjs" in package.json)

Comment on lines +11 to +20
constructor(info: EvalErrorInfo) {
super(info.message);
this.name = 'AppError';
this.type = info.type;
this.code = info.code;
this.hint = info.hint;
this.provider = info.provider;
this.requestId = info.requestId;
this.raw = info.raw;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Fix Error subclassing and propagate stack/cause.

Ensure correct instanceof and preserve diagnostic fields.

Apply:

 export class AppError extends Error {
   type: ErrorType;
   code?: string;
   hint?: string;
   provider?: string;
   requestId?: string;
   raw?: unknown;
+  // Node supports Error.cause; keep typed as unknown
+  cause?: unknown;

   constructor(info: EvalErrorInfo) {
-    super(info.message);
+    super(info.message);
+    // Ensure correct prototype chain for Error subclassing
+    Object.setPrototypeOf(this, new.target.prototype);
     this.name = 'AppError';
     this.type = info.type;
     this.code = info.code;
     this.hint = info.hint;
     this.provider = info.provider;
     this.requestId = info.requestId;
     this.raw = info.raw;
+    // Preserve diagnostics when provided
+    if (info.stack) this.stack = info.stack;
+    if (info.cause !== undefined) this.cause = info.cause;
   }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/lib/errors.ts around lines 11 to 20, the Error subclass constructor must
ensure proper instanceof behavior and propagate stack and cause; update the
constructor to call super(info.message, { cause: info.cause }) when available,
set this.name, assign diagnostic fields (type, code, hint, provider, requestId,
raw) as now, and explicitly restore the prototype with
Object.setPrototypeOf(this, new.target.prototype) (or AppError.prototype) so
instanceof works; if info.raw or info.raw.stack exists, copy that stack to
this.stack to preserve original stack trace.

Comment on lines 1 to 3
import { providerError } from './errors';
import type { EvalErrorInfo } from '../types/result';

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Import timeoutError for proper taxonomy.

Timeouts should map to type 'timeout', not 'provider_error'.

Apply:

-import { providerError } from './errors';
+import { providerError, timeoutError } from './errors';
-import type { EvalErrorInfo } from '../types/result';
+import type { EvalErrorInfo } from '../types/result';
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import { providerError } from './errors';
import type { EvalErrorInfo } from '../types/result';
import { providerError, timeoutError } from './errors';
import type { EvalErrorInfo } from '../types/result';
🤖 Prompt for AI Agents
In src/lib/guardrails around lines 1 to 3, the file imports providerError but
does not import timeoutError, so timeout conditions are being classified as
'provider_error' instead of 'timeout'; add an import for timeoutError from
'./errors' and update any places that currently map timeout conditions to
providerError to use timeoutError instead so timeouts are reported with the
'timeout' taxonomy.

Comment on lines 56 to 76
const aborted = controller.signal.aborted;
const retryable =
aborted ||
code === 'ECONNRESET' || code === 'ETIMEDOUT' ||
status === 429 || (status && status >= 500 && status < 600);

if (!retryable || attempt > retryBudget) {
clearTimeout(timeout);
// map final failure to provider_error
throw providerError(
`Provider request failed after ${attempt} attempt(s)`,
{
provider: providerName,
code: typeof code === 'string' ? code : String(code ?? status ?? 'UNKNOWN'),
raw: { status, code, message: err?.message },
hint: status === 429
? 'Hit rate limit. Lower concurrency or enable caching.'
: 'Check network/credentials or retry later.',
} as Partial<EvalErrorInfo>
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Classify final timeouts as timeoutError (not provider_error).

Aligns with the stated taxonomy and improves UX/metrics.

Apply:

-      const aborted = controller.signal.aborted;
+      const aborted = controller.signal.aborted;
+      const isTimeoutLike =
+        aborted ||
+        code === 'ETIMEDOUT' ||
+        code === 'UND_ERR_CONNECT_TIMEOUT' ||
+        anyErr?.name === 'AbortError';
       const retryable =
-        aborted ||
-        code === 'ECONNRESET' || code === 'ETIMEDOUT' ||
+        isTimeoutLike ||
+        code === 'ECONNRESET' || code === 'ECONNREFUSED' || code === 'EAI_AGAIN' || code === 'ENOTFOUND' ||
         status === 429 || (status && status >= 500 && status < 600);

       if (!retryable || attempt > retryBudget) {
-        clearTimeout(timeout);
-        // map final failure to provider_error
-        throw providerError(
-          `Provider request failed after ${attempt} attempt(s)`,
-          {
-            provider: providerName,
-            code: typeof code === 'string' ? code : String(code ?? status ?? 'UNKNOWN'),
-            raw: { status, code, message: err?.message },
-            hint: status === 429
-              ? 'Hit rate limit. Lower concurrency or enable caching.'
-              : 'Check network/credentials or retry later.',
-          } as Partial<EvalErrorInfo>
-        );
+        // Map final failure to the correct category
+        if (isTimeoutLike) {
+          throw timeoutError(
+            `Provider request timed out after ${attempt} attempt(s)`,
+            {
+              provider: providerName,
+              code: typeof code === 'string' ? code : String(code ?? status ?? 'TIMEOUT'),
+              raw: { status, code, message: (anyErr as any)?.message },
+              hint: 'Increase timeoutMs or reduce payload/concurrency.',
+            } as Partial<EvalErrorInfo>,
+          );
+        }
+        throw providerError(
+          `Provider request failed after ${attempt} attempt(s)`,
+          {
+            provider: providerName,
+            code: typeof code === 'string' ? code : String(code ?? status ?? 'UNKNOWN'),
+            raw: { status, code, message: (anyErr as any)?.message },
+            hint: status === 429
+              ? 'Hit rate limit. Lower concurrency or enable caching.'
+              : 'Check network/credentials or retry later.',
+          } as Partial<EvalErrorInfo>,
+        );
       }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 10 to 15
export interface EvalErrorInfo {
type: string;
message: string;
code?: string;
providerId?: string;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Duplicate and conflicting EvalErrorInfo — unify to a single source of truth.

This interface conflicts with src/types/result.ts (different fields: providerId vs provider, missing hint/requestId/raw/stack/cause). This will cause drift and accidental misuse.

Apply:

 export type ErrorCategory =
   | 'invalid_api_key'
   | 'quota_exceeded'
   | 'rate_limited'
   | 'not_found'
   | 'timeout'
   | 'internal'
   | 'network';

-export interface EvalErrorInfo {
-  type: string;
-  message: string;
-  code?: string;
-  providerId?: string;
-}
+// Re-export the canonical shape to avoid divergence
+export type { EvalErrorInfo } from './result';
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export interface EvalErrorInfo {
type: string;
message: string;
code?: string;
providerId?: string;
}
export type ErrorCategory =
| 'invalid_api_key'
| 'quota_exceeded'
| 'rate_limited'
| 'not_found'
| 'timeout'
| 'internal'
| 'network';
// Re-export the canonical shape to avoid divergence
export type { EvalErrorInfo } from './result';
🤖 Prompt for AI Agents
In src/types/errors.ts around lines 10-15, the EvalErrorInfo interface
duplicates and conflicts with the one in src/types/result.ts (mismatched field
names and missing fields); remove this duplicate and unify to a single source of
truth by importing and re-exporting the canonical interface from
src/types/result.ts (or move the canonical definition into a common file used by
both modules), update all local references to use the unified type, and ensure
the final interface includes the superset of fields (e.g., type, message, code,
provider/providerId, hint, requestId, raw, stack, cause) so no callers break.

@@ -0,0 +1,88 @@
import { providerError } from './errors';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is missing an extension.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@will thanks for catching that, I cleaned it up and it’s ready for another look whenever you get a chance

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no problem, I do what I can

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@will ha, sorry about that. I meant to tag @will-holley. thanks for catching that, I cleaned it up and it’s ready for another look whenever you get a chance

@iitslamaa iitslamaa force-pushed the feat/error-taxonomy-provider-guardrails branch 2 times, most recently from 19a99f2 to 46498c5 Compare October 27, 2025 00:23
@iitslamaa iitslamaa force-pushed the feat/error-taxonomy-provider-guardrails branch from 46498c5 to d861896 Compare October 27, 2025 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants