@mldangelo
Member

Summary

Updates the openai-agents-basic example from a basic weather agent to an engaging D&D dungeon master adventure game, and switches to the latest gpt-5-nano model.

Changes

Example Theme

  • Before: Weather forecasting agent
  • After: D&D dungeon master running epic fantasy adventures

Model Update

  • Switch from gpt-4o-mini to gpt-5-nano
  • Increase maxTurns from 10 to 20 (required for complex combat scenarios with gpt-5-nano)

Tools (4 total)

  1. roll_dice - D&D dice rolling with modifiers, critical hit detection (natural 20s/1s)
  2. check_inventory - Equipment, items, and gold management
  3. check_character_stats - Full D&D 5e character sheet display
  4. describe_scene - Atmospheric location descriptions
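The tool list describes roll_dice only by its behavior; the code in game-tools.ts is not shown in this conversation. As a rough illustration of the mechanics named above (dice notation, modifiers, natural 20/1 detection), a minimal sketch might look like the following. The result shape, the injectable `rng` parameter, and the rule that crits apply only to a single d20 are assumptions made for this example, not the example's actual code.

```typescript
// Hypothetical sketch of a roll_dice-style helper; NOT the code from game-tools.ts.
type DiceResult = {
  notation: string;
  rolls: number[];
  modifier: number;
  total: number;
  critical: "hit" | "fail" | null;
};

// Parses notation such as "d20", "1d20+5", or "2d6-1".
// `rng` is injectable (an assumption for testability) and must return a value in [0, 1).
function rollDice(notation: string, rng: () => number = Math.random): DiceResult {
  const match = /^(\d*)d(\d+)([+-]\d+)?$/i.exec(notation.trim());
  if (!match) {
    throw new Error(`Invalid dice notation: ${notation}`);
  }
  const count = match[1] === "" ? 1 : Number(match[1]);
  const sides = Number(match[2]);
  const modifier = match[3] ? Number(match[3]) : 0;
  if (count < 1 || sides < 2) {
    throw new Error(`Invalid dice notation: ${notation}`);
  }

  const rolls = Array.from({ length: count }, () => Math.floor(rng() * sides) + 1);
  const total = rolls.reduce((sum, roll) => sum + roll, 0) + modifier;

  // Assumed rule: only a single d20 can crit, judged on the natural (unmodified) roll.
  let critical: DiceResult["critical"] = null;
  if (count === 1 && sides === 20) {
    if (rolls[0] === 20) {
      critical = "hit";
    } else if (rolls[0] === 1) {
      critical = "fail";
    }
  }
  return { notation, rolls, modifier, total, critical };
}
```

Passing a fixed `rng` makes a roll deterministic, which is one way to script a scenario such as a guaranteed natural 20 without relying on chance.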

Test Coverage (9 scenarios)

  • Dragon combat with attack rolls and initiative
  • Character stats checking
  • Inventory management
  • Scene descriptions (ancient crypt)
  • Saving throws (trap with natural 20)
  • Critical hits (natural 20 on goblin)
  • Edge cases (seducing dragon with interpretive dance)
  • Short rest mechanics
  • Magic item examination

Documentation

  • Updated README with D&D theme, tracing setup, and example interactions
  • Updated site/docs/providers/openai-agents.md with D&D examples
  • Added OpenAI Agents section to site/docs/providers/simulated-user.md

Bug Fixes

  • Fixed Zod schema issue: changed .optional() to .default('') for OpenAI Agents SDK compatibility

Test Results

100% pass rate - All 9 test cases passing with gpt-5-nano:

  • Duration: 6m 36s
  • Successes: 9/9
  • Failures: 0
  • Errors: 0

Why This Change?

The weather example was basic and didn't showcase the full capabilities of OpenAI Agents. The D&D dungeon master:

  • Demonstrates multi-turn agentic workflows (complex combat requiring 10+ turns)
  • Shows creative tool usage (dice mechanics, character management)
  • Provides engaging, memorable interactions
  • Better represents real-world agent use cases
  • More fun for users to try out

Breaking Changes

None - this is a self-contained example update.

@use-tusk
Contributor

use-tusk bot commented Nov 5, 2025

⏩ No test execution environment matched (371e923)

Check history:

| Commit | Status | Output | Created (UTC) |
| --- | --- | --- | --- |
| 4731882 | ⏩ No test execution environment matched | Output | Nov 5, 2025 4:13AM |
| f69118f | ⏩ No test execution environment matched | Output | Nov 5, 2025 4:18AM |
| 4971dbd | ⏩ No test execution environment matched | Output | Nov 5, 2025 4:24AM |
| 904bf8d | ⏩ No test execution environment matched | Output | Nov 5, 2025 4:30AM |
| 371e923 | ⏩ No test execution environment matched | Output | Nov 5, 2025 4:48AM |


@mldangelo changed the title from "chore(examples): update openai-agents-basic example to D&D adventure with gpt-5-nano" to "chore(examples): update openai-agents-basic example" on Nov 5, 2025
… gpt-5-nano

- Transform weather example to D&D adventure game
- Switch model from gpt-4o-mini to gpt-5-nano
- Increase maxTurns from 10 to 20 for complex combat scenarios
- Replace weather tools with D&D game mechanics:
  - roll_dice: D&D dice rolling with critical hit detection
  - check_inventory: Equipment and items management
  - check_character_stats: Full D&D 5e character sheet
  - describe_scene: Atmospheric location descriptions
- Add 9 comprehensive test cases including combat, saves, and edge cases
- Update documentation and README with D&D theme
- Fix Zod schema issue (change .optional() to .default(''))
@mldangelo force-pushed the feat/openai-agents-gpt5-nano branch from 4731882 to f69118f on November 5, 2025 04:18
@coderabbitai
Contributor

coderabbitai bot commented Nov 5, 2025

Walkthrough

The pull request replaces a weather-themed OpenAI Agents example with a D&D Dungeon Master example. It renames agent and tool modules (weather-agent.ts → dungeon-master-agent.ts, weather-tools.ts → game-tools.ts), removes the old weather modules, and adds a new game-tools module exposing four tools (rollDice, checkInventory, describeScene, checkCharacterStats). A new Dungeon Master agent (default export) references those tools and uses D&D-focused instructions. Tests and promptfooconfig.yaml were expanded and tightened (more scenarios, maxTurns increased to 20), README/package.json/docs were updated to reflect the D&D theme, and a changelog entry was added.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Verify new tools in examples/openai-agents-basic/tools/game-tools.ts:
    • Parameter schemas (zod) and returned mock data shapes for rollDice, checkInventory, describeScene, checkCharacterStats.
    • rollDice logic (notation, totals, crit detection) and defaults.
    • Default export includes all tools in expected order.
  • Verify agent in examples/openai-agents-basic/agents/dungeon-master-agent.ts:
    • Agent construction, instructions consistency with tool capabilities, and model selection.
  • Validate tests and config in examples/openai-agents-basic/promptfooconfig.yaml:
    • New/updated test assertions (contains-any, javascript checks, dice/HP/gold expectations) and maxTurns increase.
  • Documentation and metadata updates:
    • README, site/docs/providers/openai-agents.md, site/docs/providers/simulated-user.md, examples/openai-agents-basic/package.json reflect new file paths, usage, and descriptions.
  • Cleanup check:
    • Confirm removed/emptied files (examples/.../agents/weather-agent.ts, examples/.../tools/weather-tools.ts) are no longer referenced or should be deleted rather than left empty.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title is vague and generic, using only 'update openai-agents-basic example' without specifying the main change (weather agent to D&D dungeon master theme). | Revise title to be more specific, e.g., 'chore(examples): update openai-agents-basic to D&D dungeon master with gpt-5-nano' or similar to clearly indicate the primary theme transformation. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description comprehensively covers the changeset: theme transformation (weather to D&D), model upgrade (gpt-4o-mini to gpt-5-nano), tool updates, test scenarios, documentation changes, and bug fixes. |


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
examples/openai-agents-basic/README.md (1)

81-81: Add language specifiers to code blocks.

Several code blocks are missing language specifiers for syntax highlighting. While the content reads as dialogue/output, explicitly marking them improves rendering consistency.

Add language identifiers (e.g., text) to the code blocks at lines 81, 290, 299, 307, and 318 to address the markdownlint warnings.

Also applies to: 290-290, 299-299, 307-307, 318-318
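To locate blocks like these without running the linter, a small script can scan a Markdown file for opening fences that lack an info string. The sketch below is a hypothetical helper, not part of markdownlint or the repository; it deliberately handles only triple-backtick fences and ignores tilde fences, longer fences, and indentation rules.

```typescript
// Hypothetical helper (not part of markdownlint): returns the 1-based line
// numbers of opening code fences that carry no language identifier.
// Simplification: only triple-backtick fences are recognized.
const FENCE = "`".repeat(3);

function findUnlabeledFences(markdown: string): number[] {
  const unlabeled: number[] = [];
  let insideFence = false;

  markdown.split("\n").forEach((line, index) => {
    const trimmed = line.trim();
    if (!trimmed.startsWith(FENCE)) {
      return;
    }
    if (!insideFence) {
      // Opening fence: whatever follows the backticks is the info string.
      if (trimmed.slice(FENCE.length).trim() === "") {
        unlabeled.push(index + 1);
      }
      insideFence = true;
    } else {
      // Closing fence: never carries a language, so it is not reported.
      insideFence = false;
    }
  });

  return unlabeled;
}
```

Each reported line number is an opening fence that would need an identifier such as `text` appended to satisfy MD040.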

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 025a717 and 4971dbd.

📒 Files selected for processing (10)
  • CHANGELOG.md (1 hunks)
  • examples/openai-agents-basic/README.md (2 hunks)
  • examples/openai-agents-basic/agents/dungeon-master-agent.ts (1 hunks)
  • examples/openai-agents-basic/agents/weather-agent.ts (0 hunks)
  • examples/openai-agents-basic/package.json (1 hunks)
  • examples/openai-agents-basic/promptfooconfig.yaml (1 hunks)
  • examples/openai-agents-basic/tools/game-tools.ts (1 hunks)
  • examples/openai-agents-basic/tools/weather-tools.ts (0 hunks)
  • site/docs/providers/openai-agents.md (3 hunks)
  • site/docs/providers/simulated-user.md (2 hunks)
💤 Files with no reviewable changes (2)
  • examples/openai-agents-basic/agents/weather-agent.ts
  • examples/openai-agents-basic/tools/weather-tools.ts
🧰 Additional context used
📓 Path-based instructions (13)
{site,examples}/**

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Documentation-only changes (touching only site/ or examples/) must use docs: prefix in PR title

Files:

  • examples/openai-agents-basic/package.json
  • examples/openai-agents-basic/agents/dungeon-master-agent.ts
  • examples/openai-agents-basic/tools/game-tools.ts
  • site/docs/providers/simulated-user.md
  • site/docs/providers/openai-agents.md
  • examples/openai-agents-basic/README.md
  • examples/openai-agents-basic/promptfooconfig.yaml
examples/**

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

When modifying examples, update existing files instead of adding new ones (e.g., replace outdated model IDs)

Files:

  • examples/openai-agents-basic/package.json
  • examples/openai-agents-basic/agents/dungeon-master-agent.ts
  • examples/openai-agents-basic/tools/game-tools.ts
  • examples/openai-agents-basic/README.md
  • examples/openai-agents-basic/promptfooconfig.yaml
**/package.json

📄 CodeRabbit inference engine (CLAUDE.md)

**/package.json: Ensure peerDependencies versions match devDependencies versions when applicable
Use CommonJS modules ("type": "commonjs") in package.json

Files:

  • examples/openai-agents-basic/package.json
examples/**/package.json

📄 CodeRabbit inference engine (CLAUDE.md)

Keep examples/ package.json files up to date when updating dependencies

Files:

  • examples/openai-agents-basic/package.json
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Prefer not to introduce new TypeScript types; reuse existing interfaces where possible

**/*.{ts,tsx}: Maintain consistent import order (Biome handles sorting)
Use consistent curly braces for all control statements
Prefer const over let and avoid var
Use object shorthand syntax when possible
Use async/await for asynchronous code
Use consistent error handling with proper type checks

**/*.{ts,tsx}: Use TypeScript with strict type checking enabled
Follow consistent import order (Biome will sort imports)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object property shorthand when possible
Use async/await for asynchronous code instead of raw promises/callbacks
When logging, pass sensitive data via the logger context object so it is auto-sanitized; avoid interpolating secrets into message strings
Manually sanitize sensitive objects with sanitizeObject before storing or emitting outside logging contexts

Files:

  • examples/openai-agents-basic/agents/dungeon-master-agent.ts
  • examples/openai-agents-basic/tools/game-tools.ts
CHANGELOG.md

📄 CodeRabbit inference engine (.cursor/rules/changelog.mdc)

CHANGELOG.md: Document all user-facing changes in CHANGELOG.md
Every pull request must add or update an entry in CHANGELOG.md under the [Unreleased] section
Follow Keep a Changelog structure under [Unreleased] with sections: Added, Changed, Fixed, Dependencies, Documentation, Tests, Removed
Each changelog entry must include the PR number formatted as (#1234) or temporary placeholder (#XXXX)
Each changelog entry must use a Conventional Commit prefix: feat:, fix:, chore:, docs:, test:, or refactor:
Each changelog entry must be concise and on a single line
Each changelog entry must be user-focused, describing what changed and why it matters to users
Each changelog entry must include a scope in parentheses, e.g., feat(providers): or fix(evaluator):
Use common scopes for consistency: providers, evaluator, webui or app, cli, redteam, core, assertions, config, database
Place all dependency updates under the Dependencies category
Place all test changes under the Tests category
Use categories consistently: Added for new features, Changed for modifications/refactors/CI, Fixed for bug fixes, Removed for removed features
After a PR number is assigned, replace (#XXXX) placeholders with the actual PR number
Be specific, use active voice, include context, and avoid repeating the PR title in changelog entries
Group related changes with multiple bullets in the same category when needed; use one entry per logical change

CHANGELOG.md: All user-facing changes require a CHANGELOG.md entry before creating a PR
Add entries under [Unreleased] in appropriate category (Added, Changed, Fixed, Dependencies, Documentation, Tests)
Each changelog entry must include PR number (#1234) or placeholder (#XXXX)
Use conventional commit prefixes in changelog entries (feat:, fix:, chore:, docs:, test:, refactor:)

CHANGELOG.md: Document all user-facing changes in CHANGELOG.md
Changelog entries must include the PR number in format (#1234)
Use conventional commit prefixes in changelog entries: feat:,...

Files:

  • CHANGELOG.md
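Several of the changelog rules above are mechanical (single line, Conventional Commit prefix, scope in parentheses, PR number or placeholder) and can be checked with one regular expression. The following is a minimal sketch under that reading of the rules; the function name and the pattern are illustrative assumptions, not tooling taken from the repository, and the sketch does not attempt to judge whether an entry is concise or user-focused.

```typescript
// Hypothetical checker for the mechanical changelog rules listed above.
// The pattern is an illustration, not the repository's own validation.
const PREFIXES = ["feat", "fix", "chore", "docs", "test", "refactor"];

// "- <prefix>(<scope>): <description> (#1234)" or "(#XXXX)" as a placeholder.
const ENTRY_PATTERN = new RegExp(
  `^- (${PREFIXES.join("|")})\\(([a-z-]+)\\): .+ \\(#(\\d+|XXXX)\\)$`,
);

function isValidChangelogEntry(entry: string): boolean {
  // Entries must sit on a single line.
  if (entry.includes("\n")) {
    return false;
  }
  return ENTRY_PATTERN.test(entry);
}
```

For instance, an entry of the form `- fix(examples): ... (#6114)` passes, while one missing the scope or the trailing PR reference does not.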
site/docs/**/*.md

📄 CodeRabbit inference engine (.cursor/rules/docusaurus.mdc)

site/docs/**/*.md: Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Structure content to reveal information progressively: begin with essential actions and information, then provide deeper context as necessary; organize information from most important to least important.
Use action-oriented language: clearly outline actionable steps users should take, use concise and direct language, prefer active voice over passive voice, and use imperative mood for instructions.
Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
The project name can be written as either 'Promptfoo' (capitalized) or 'promptfoo' (lowercase) depending on context: use 'Promptfoo' at the beginning of sentences or in headings, and 'promptfoo' in code examples, terminal commands, or when referring to the package name; be consistent with the chosen capitalization within each document or section.
Each markdown documentation file must include required front matter fields: 'title' (the page title shown in search results and browser tabs) and 'description' (a concise summary of the page content, ideally 150-160 characters).
Only add a title attribute to code blocks that represent complete, runnable files; do not add titles to code fragments, partial examples, or snippets that aren't meant to be used as standalone files; this applies to all code blocks regardless of language.
Use special comment directives to highlight specific lines in code blocks: 'highlight-next-line' highlights the line immediately after the comment, 'highligh...

Files:

  • site/docs/providers/simulated-user.md
  • site/docs/providers/openai-agents.md
site/**

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

For feature changes, update relevant documentation under site/

Files:

  • site/docs/providers/simulated-user.md
  • site/docs/providers/openai-agents.md
examples/*/README.md

📄 CodeRabbit inference engine (.cursor/rules/examples.mdc)

examples/*/README.md: The README.md must begin with the folder name as an H1 heading
Every example README must include instructions on how to run it with 'npx promptfoo@latest init --example example-name'
Include a comprehensive README.md that explains the purpose, prerequisites, instructions, and expected outputs for the example
Document any model-specific capabilities or limitations in examples
Clearly list all required environment variables at the beginning of the README
For each environment variable, explain its purpose, how to obtain it, and any default values or constraints in the README
Include a sample .env file or instructions when multiple environment variables are needed in the README
Document any required API keys or credentials in the README
Provide instructions for cleaning up resources after running the example in the README
When creating examples for specific providers, explain any provider-specific configuration in the README
When creating examples for specific providers, document required environment variables in the README
When creating examples for specific providers, include information about pricing or usage limits in the README
When creating examples for specific providers, highlight unique features or capabilities in the README
When creating examples for specific providers, compare to similar providers where appropriate in the README

Files:

  • examples/openai-agents-basic/README.md
examples/*/{README.md,promptfooconfig.yaml}

📄 CodeRabbit inference engine (.cursor/rules/examples.mdc)

Include placeholder values for secrets/credentials in the README or configuration files

Files:

  • examples/openai-agents-basic/README.md
  • examples/openai-agents-basic/promptfooconfig.yaml
examples/**/README.md

📄 CodeRabbit inference engine (examples/CLAUDE.md)

examples/**/README.md: Each example must include a README.md that begins with a first-level heading: "# folder-name (Human Readable Name)"
README.md must include instructions showing: npx promptfoo@latest init --example <name>

Each example must include a clear README.md in its directory

Each example should include a clear README.md

Files:

  • examples/openai-agents-basic/README.md
examples/*/promptfooconfig.yaml

📄 CodeRabbit inference engine (.cursor/rules/examples.mdc)

examples/*/promptfooconfig.yaml: Include a working promptfooconfig.yaml (or equivalent) file in each example directory
Always include the YAML schema reference at the top of configuration files: '# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json'
Follow the specified field order in all configuration files: description, env (optional), prompts, providers, defaultTest (optional), scenarios (optional), tests
Ensure all configuration files pass YAML lint validation
When referencing external files in configuration, always use the 'file://' prefix
Always use the latest model versions available in 2025 in configuration files
For OpenAI, prefer models like 'openai:o3-mini' and 'openai:gpt-4o-mini' in configuration files
For Anthropic, prefer models like 'anthropic:claude-3-7-sonnet-20250219' in configuration files
For open-source models, use the latest versions available (e.g., latest Llama) in configuration files
Include a mix of providers when comparing model performance in configuration files
When demonstrating specialized capabilities (vision, audio, etc.), use models that support those features in configuration files
Format configuration files consistently
When creating examples for specific providers, always use the latest available model versions for that provider in configuration files
Update model versions when new ones become available in configuration files

Files:

  • examples/openai-agents-basic/promptfooconfig.yaml
examples/**/promptfooconfig.yaml

📄 CodeRabbit inference engine (examples/CLAUDE.md)

examples/**/promptfooconfig.yaml: Each example must include a promptfooconfig.yaml that contains a schema reference
promptfooconfig.yaml must follow the strict field order: 1) description, 2) env (optional), 3) prompts, 4) providers, 5) defaultTest (optional), 6) scenarios (optional), 7) tests
Use latest models in providers (e.g., openai:gpt-5, anthropic:claude-sonnet-4-5-20250929)
Use the file:// prefix for external file references in the configuration
Keep the description field short (3–10 words)

Files:

  • examples/openai-agents-basic/promptfooconfig.yaml
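The strict field order above can be verified once the top-level keys of a config have been read in document order. The sketch below assumes the keys are already available as an array (parsing the YAML is out of scope here) and that keys outside the listed seven are simply ignored; both the function name and that leniency are assumptions for illustration.

```typescript
// Field order quoted from the rule above.
const FIELD_ORDER: readonly string[] = [
  "description",
  "env",
  "prompts",
  "providers",
  "defaultTest",
  "scenarios",
  "tests",
];

// Hypothetical helper: returns true when the known keys appear in the
// prescribed relative order. Unknown keys are skipped (an assumption).
function hasExpectedFieldOrder(topLevelKeys: string[]): boolean {
  let previous = -1;
  for (const key of topLevelKeys) {
    const position = FIELD_ORDER.indexOf(key);
    if (position === -1) {
      continue;
    }
    if (position <= previous) {
      return false;
    }
    previous = position;
  }
  return true;
}
```

Because optional fields may be absent, the check compares relative positions rather than requiring all seven keys to be present.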
🧠 Learnings (17)
📚 Learning: 2025-10-05T16:54:57.986Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: examples/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:54:57.986Z
Learning: Applies to examples/**/README.md : README.md must include instructions showing: npx promptfoo@latest init --example <name>

Applied to files:

  • examples/openai-agents-basic/package.json
📚 Learning: 2025-07-18T17:25:38.444Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:38.444Z
Learning: Applies to examples/*/README.md : Every example README must include instructions on how to run it with 'npx promptfoo@latest init --example example-name'

Applied to files:

  • examples/openai-agents-basic/package.json
📚 Learning: 2025-10-24T22:41:09.485Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/changelog.mdc:0-0
Timestamp: 2025-10-24T22:41:09.485Z
Learning: Applies to CHANGELOG.md : Follow Keep a Changelog structure under [Unreleased] with sections: Added, Changed, Fixed, Dependencies, Documentation, Tests, Removed

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-24T22:42:38.674Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-24T22:42:38.674Z
Learning: Applies to CHANGELOG.md : Document all user-facing changes in CHANGELOG.md

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-24T22:41:44.088Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/gh-cli-workflow.mdc:0-0
Timestamp: 2025-10-24T22:41:44.088Z
Learning: Applies to CHANGELOG.md : Add entries under [Unreleased] in appropriate category (Added, Changed, Fixed, Dependencies, Documentation, Tests)

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-27T08:53:44.103Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-27T08:53:44.103Z
Learning: Applies to CHANGELOG.md : Keep entries concise, one line, user-focused (what changed, not how) under the correct category

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-27T08:53:44.103Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-27T08:53:44.103Z
Learning: Applies to CHANGELOG.md : Use standardized scopes: providers, webui, cli, assertions, api, config, deps, docs, tests, examples, redteam, site

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-24T22:42:38.674Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-24T22:42:38.674Z
Learning: Applies to CHANGELOG.md : Add new entries under the 'Unreleased' section

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-05T16:54:57.986Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: examples/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:54:57.986Z
Learning: Applies to examples/**/promptfooconfig.yaml : Use latest models in providers (e.g., openai:gpt-5, anthropic:claude-sonnet-4-5-20250929)

Applied to files:

  • CHANGELOG.md
  • site/docs/providers/simulated-user.md
  • site/docs/providers/openai-agents.md
📚 Learning: 2025-07-18T17:25:38.444Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:38.444Z
Learning: Applies to examples/*/promptfooconfig*.yaml : For OpenAI, prefer models like 'openai:o3-mini' and 'openai:gpt-4o-mini' in configuration files

Applied to files:

  • site/docs/providers/simulated-user.md
  • site/docs/providers/openai-agents.md
📚 Learning: 2025-07-18T17:25:46.665Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:46.665Z
Learning: Applies to examples/*/promptfooconfig.yaml : For OpenAI, prefer models like 'openai:o3-mini' and 'openai:gpt-4o-mini' in configuration files

Applied to files:

  • site/docs/providers/simulated-user.md
  • site/docs/providers/openai-agents.md
📚 Learning: 2025-07-18T17:25:46.665Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:46.665Z
Learning: Applies to examples/*/promptfooconfig.yaml : Include a mix of providers when comparing model performance in configuration files

Applied to files:

  • site/docs/providers/simulated-user.md
📚 Learning: 2025-07-18T17:25:38.445Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:38.445Z
Learning: Applies to examples/*/promptfooconfig*.yaml : Include a mix of providers when comparing model performance in configuration files

Applied to files:

  • site/docs/providers/simulated-user.md
📚 Learning: 2025-10-05T16:58:47.598Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/providers/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:58:47.598Z
Learning: Applies to src/providers/**/*.ts : For OpenAI-compatible providers, extend OpenAiChatCompletionProvider and configure apiBaseUrl and options via super(...)

Applied to files:

  • site/docs/providers/simulated-user.md
  • site/docs/providers/openai-agents.md
📚 Learning: 2025-07-18T17:25:46.665Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:46.665Z
Learning: Applies to examples/*/promptfooconfig.yaml : When demonstrating specialized capabilities (vision, audio, etc.), use models that support those features in configuration files

Applied to files:

  • site/docs/providers/openai-agents.md
📚 Learning: 2025-07-18T17:25:38.445Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:38.445Z
Learning: Applies to examples/*/promptfooconfig*.yaml : When demonstrating specialized capabilities (vision, audio, etc.), use models that support those features in configuration files

Applied to files:

  • site/docs/providers/openai-agents.md
📚 Learning: 2025-07-18T17:25:38.444Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:38.444Z
Learning: Applies to examples/*/promptfooconfig*.yaml : For trivial test cases in configuration, make them quirky and fun to increase engagement

Applied to files:

  • examples/openai-agents-basic/promptfooconfig.yaml
🪛 markdownlint-cli2 (0.18.1)
examples/openai-agents-basic/README.md

81-81: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


290-290: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


299-299: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


307-307: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


318-318: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: Build on Node 20.x
  • GitHub Check: Test on Node 22.x and windows-latest
  • GitHub Check: Test on Node 24.x and ubuntu-latest
  • GitHub Check: Test on Node 22.x and macOS-latest
  • GitHub Check: Test on Node 24.x and windows-latest
  • GitHub Check: Redteam (Staging API)
  • GitHub Check: Test on Node 22.x and ubuntu-latest
  • GitHub Check: Test on Node 20.x and macOS-latest
  • GitHub Check: Build on Node 22.x
  • GitHub Check: Test on Node 20.x and ubuntu-latest
  • GitHub Check: Test on Node 20.x and windows-latest
  • GitHub Check: Build on Node 24.x
  • GitHub Check: Share Test
  • GitHub Check: Redteam (Production API)
  • GitHub Check: webui tests
  • GitHub Check: Build Docs
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (8)
examples/openai-agents-basic/tools/game-tools.ts (1)

1-254: LGTM! Well-structured D&D game tools.

The four tools are well-implemented with proper Zod validation, comprehensive mock data, and clear functionality. The dice rolling logic correctly handles critical hits, and the character/inventory data provides realistic D&D 5e content.

site/docs/providers/simulated-user.md (2)

274-279: Model update looks good.

The change from openai:gpt-4.1-mini to openai:gpt-4o-mini aligns with the coding guidelines to use current model versions.


284-314: Helpful addition for OpenAI Agents testing.

The new section clearly demonstrates how to use the Simulated User Provider with OpenAI Agents, using the D&D dungeon master example with proper file-based configuration.

examples/openai-agents-basic/package.json (1)

5-5: Description accurately reflects the D&D theme.

The updated description clearly communicates the example's purpose.

examples/openai-agents-basic/promptfooconfig.yaml (2)

2-13: Configuration is well-structured.

The file-based approach with proper file:// prefixes and increased maxTurns: 20 appropriately supports complex multi-turn D&D scenarios.


16-105: Comprehensive test coverage for D&D scenarios.

The test suite covers a wide range of gameplay situations including combat, stats checks, inventory management, scene descriptions, and edge cases. The assertions properly validate both dice mechanics and narrative quality.

site/docs/providers/openai-agents.md (1)

140-177: Clear D&D example with proper file-based configuration.

The Dungeon Master example effectively demonstrates multi-turn agentic workflows with proper file references and comprehensive test scenarios.

examples/openai-agents-basic/README.md (1)

109-109: Model inconsistency between README and actual agent.

The README example shows model: 'gpt-4o-mini', but the actual agent file (agents/dungeon-master-agent.ts) uses model: 'gpt-5-nano'. Since gpt-5-nano is not a valid OpenAI model, both should be updated to use gpt-4o-mini.

⛔ Skipped due to learnings
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:38.444Z
Learning: Applies to examples/*/promptfooconfig*.yaml : For OpenAI, prefer models like 'openai:o3-mini' and 'openai:gpt-4o-mini' in configuration files
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/examples.mdc:0-0
Timestamp: 2025-07-18T17:25:46.665Z
Learning: Applies to examples/*/promptfooconfig.yaml : For OpenAI, prefer models like 'openai:o3-mini' and 'openai:gpt-4o-mini' in configuration files
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: examples/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:54:57.986Z
Learning: Applies to examples/**/promptfooconfig.yaml : Use latest models in providers (e.g., openai:gpt-5, anthropic:claude-sonnet-4-5-20250929)

CHANGELOG.md Outdated

### Changed

- chore(examples): update openai-agents-basic example from weather to D&D dungeon master with gpt-5-nano, comprehensive D&D 5e tools (dice rolling, character stats, inventory), and 9 test scenarios (#6114)
Contributor

⚠️ Potential issue | 🟠 Major

Add missing Documentation, Tests, and Fixed entries for #6114; keep entries concise.

The example update entry under Changed looks fine. Per changelog guidelines, please also add:

  • Tests entry for the 9 new scenarios.
  • Documentation entry for README + site docs updates.
  • Fixed entry for the Zod schema change.

Apply the following minimal additions:

@@
 ### Fixed
@@
+- fix(examples): change Zod schema from .optional() to .default('') for OpenAI Agents SDK compatibility (#6114)
@@
 ### Tests
@@
+- test(examples): add 9 D&D agent scenarios for openai-agents-basic (combat, stats, inventory, scenes, saves, crits, edge cases, short rest, magic item) (#6114)
@@
 ### Documentation
@@
+- docs(examples): refresh openai-agents-basic README (D&D theme, tracing, interactions); update site docs (providers/openai-agents.md) and add OpenAI Agents section to simulated-user docs (#6114)

If you prefer to shorten the Changed line for concision, consider:

-- chore(examples): update openai-agents-basic example from weather to D&D dungeon master with gpt-5-nano, comprehensive D&D 5e tools (dice rolling, character stats, inventory), and 9 test scenarios (#6114)
+- chore(examples): switch openai-agents-basic to D&D dungeon master (gpt-5-nano, maxTurns 20, tools: dice, stats, inventory, scene) (#6114)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In CHANGELOG.md around line 38, the "chore(examples)..." Changed entry is
missing accompanying concise entries required by changelog guidelines: add a
Tests entry noting "add 9 test scenarios for openai-agents-basic example", a
Documentation entry noting "update README and site docs for D&D dungeon master
example", and a Fixed entry noting "fix Zod schema change related to example"
directly under the Changed section; keep all entries one-line and concise (you
may also shorten the existing Changed line for brevity if desired).
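
The entry shape used in the suggestions above (`- type(scope): description (#PR)`) can be checked mechanically. The following is a minimal sketch, not part of the repository: the regular expression, the `parseChangelogEntry` and `sectionFor` helpers, and the type-to-section mapping are all assumptions inferred only from the four entries quoted in this comment.

```typescript
// Assumed entry shape, inferred from the lines quoted above:
//   - type(scope): description (#PR)
const ENTRY_PATTERN = /^- (\w+)\(([\w-]+)\): (.+) \(#(\d+)\)$/;

interface ChangelogEntry {
  type: string;
  scope: string;
  description: string;
  pr: number;
}

// Hypothetical mapping, covering only the types that appear in this thread.
const SECTION_BY_TYPE: Record<string, string> = {
  fix: "Fixed",
  test: "Tests",
  docs: "Documentation",
  chore: "Changed",
};

// Returns the parsed entry, or null when the line does not follow the assumed shape.
function parseChangelogEntry(line: string): ChangelogEntry | null {
  const match = ENTRY_PATTERN.exec(line.trim());
  if (!match) return null;
  const [, type, scope, description, pr] = match;
  return { type, scope, description, pr: Number(pr) };
}

// Returns the changelog section an entry would belong to, or null if the type is unknown.
function sectionFor(entry: ChangelogEntry): string | null {
  return SECTION_BY_TYPE[entry.type] ?? null;
}
```

A line that lacks the trailing `(#PR)` reference parses to `null`, which makes it easy to flag entries that do not link back to their pull request.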

Keep responses punchy but atmospheric. Always roll dice for uncertain outcomes.
Reference character abilities and inventory items when relevant. Make every moment memorable!`,
model: 'gpt-5-nano',

⚠️ Potential issue | 🔴 Critical

Invalid model identifier will cause runtime errors.

The model gpt-5-nano does not exist in OpenAI's model lineup. This will cause the agent to fail at runtime.

Apply this diff to use a valid model:

-  model: 'gpt-5-nano',
+  model: 'gpt-4o-mini',

Per the coding guidelines, prefer gpt-4o-mini or o3-mini for OpenAI examples.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-  model: 'gpt-5-nano',
+  model: 'gpt-4o-mini',
🤖 Prompt for AI Agents
In examples/openai-agents-basic/agents/dungeon-master-agent.ts around line 29,
the model field is set to the invalid identifier "gpt-5-nano"; replace it with a
valid model (prefer "gpt-4o-mini" per coding guidelines, or "o3-mini" as an
alternative) so the agent uses a supported OpenAI model and won't fail at
runtime.
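
One way to keep the example runnable regardless of which model identifier is settled on is to resolve the model from configuration, with a fallback. This is only a sketch: the `resolveModel` helper and the `AGENT_MODEL` variable name are hypothetical and do not appear in the PR.

```typescript
// Hypothetical helper: choose the agent's model from an environment-like map
// instead of hard-coding it. AGENT_MODEL is an assumed variable name.
function resolveModel(
  env: Record<string, string | undefined>,
  fallback: string,
): string {
  const candidate = env["AGENT_MODEL"]?.trim();
  // An unset, empty, or whitespace-only value falls back to the default.
  return candidate ? candidate : fallback;
}

// Usage in the agent definition might look like (assuming a Node.js runtime):
//   model: resolveModel(process.env, 'gpt-4o-mini'),
```

The helper takes the environment as a parameter rather than reading `process.env` directly, so it can be exercised in tests without touching global state.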

sides: z.number(),
count: z.number().default(1),
modifier: z.number().default(0),
purpose: z.string().optional(),

⚠️ Potential issue | 🟡 Minor

Update Zod schema example to match implementation.

The example shows purpose: z.string().optional(), but the actual implementation in game-tools.ts uses .default('') instead. The PR summary mentions this as a bug fix for OpenAI Agents SDK compatibility.

Apply this diff:

-    purpose: z.string().optional(),
+    purpose: z.string().default(''),
📝 Committable suggestion

Suggested change
-    purpose: z.string().optional(),
+    purpose: z.string().default(''),
🤖 Prompt for AI Agents
In examples/openai-agents-basic/README.md around line 128, the Zod schema
example incorrectly shows "purpose: z.string().optional()", but the
implementation in game-tools.ts uses ".default('')"; update the README example
to match the implementation by changing that line to use "purpose:
z.string().default('')" so the docs reflect the bug fix for OpenAI Agents SDK
compatibility.
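
To illustrate what the defaults buy the tool handler, here is a minimal, dependency-free sketch of a dice roller that applies the same defaults the schema declares (`count` 1, `modifier` 0, `purpose` `''`). It is not the code in game-tools.ts: the `rollDice` function, its result shape, the injectable `rng` parameter, and the rule that criticals apply only to a single d20 are assumptions made for illustration.

```typescript
// Hypothetical argument shape mirroring the schema fields quoted above.
interface RollDiceInput {
  sides: number;
  count?: number;
  modifier?: number;
  purpose?: string;
}

interface RollDiceResult {
  rolls: number[];
  total: number;
  purpose: string; // always a string once defaults are applied
  critical: "hit" | "fail" | null;
}

// rng returns a float in [0, 1); it is injectable so rolls can be made deterministic.
function rollDice(
  input: RollDiceInput,
  rng: () => number = Math.random,
): RollDiceResult {
  // Apply the defaults the schema declares: count 1, modifier 0, purpose ''.
  const { sides, count = 1, modifier = 0, purpose = "" } = input;
  if (!Number.isInteger(sides) || sides < 2) {
    throw new RangeError("sides must be an integer >= 2");
  }
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError("count must be an integer >= 1");
  }

  const rolls = Array.from({ length: count }, () => Math.floor(rng() * sides) + 1);
  const total = rolls.reduce((sum, roll) => sum + roll, 0) + modifier;

  // Assumption: natural 20 / natural 1 are judged on a single d20, before the modifier.
  let critical: RollDiceResult["critical"] = null;
  if (sides === 20 && count === 1) {
    if (rolls[0] === 20) critical = "hit";
    else if (rolls[0] === 1) critical = "fail";
  }
  return { rolls, total, purpose, critical };
}
```

Because `purpose` resolves to an empty string rather than `undefined`, downstream formatting code never needs a null check, which is the practical effect of declaring `.default('')` instead of `.optional()`.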

}
export default new Agent({
name: 'Support Agent',
model: 'gpt-5-mini',

⚠️ Potential issue | 🟠 Major

Invalid model identifier in documentation.

The model gpt-5-mini does not exist in OpenAI's lineup. Use a valid model like gpt-4o-mini or o3-mini instead.

Apply this diff:

-  model: 'gpt-5-mini',
+  model: 'gpt-4o-mini',

As per coding guidelines.

📝 Committable suggestion

Suggested change
-  model: 'gpt-5-mini',
+  model: 'gpt-4o-mini',
🤖 Prompt for AI Agents
In site/docs/providers/openai-agents.md around line 73, the example uses an
invalid model identifier "gpt-5-mini"; replace it with a valid OpenAI model (for
example "gpt-4o-mini" or "o3-mini") to correct the documentation. Update the
model field value accordingly and ensure consistency with surrounding examples
and any notes about model capabilities.

…ample in README, add complete changelog entries