Docs: add RAG failure-mode eval checklist and starter examples #9110

@SidharthKriplani

Hi maintainers, I’d like to propose a small docs/examples contribution for RAG evaluation.

Problem

Promptfoo already supports RAG evaluation patterns and assertions, but users new to RAG eval often struggle with a practical question:

“My RAG system is failing — which failure mode is this, and what Promptfoo test should I write for it?”

Proposed contribution

I’d like to add a compact RAG failure-mode checklist that maps common RAG failures to concrete Promptfoo eval scenarios, suggested assertions, and debugging hints.

Initial scope:

  1. Missing retrieved context
  2. Irrelevant retrieved context
  3. Retrieved context contains the answer but the model ignores it
  4. Answer overclaims beyond the provided context
  5. Fabricated citation/source
  6. Metadata/source not preserved
  7. Conflicting context not surfaced
  8. Refusal despite sufficient context
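
As a rough illustration of how two of these failure modes (4 and 8) might map onto existing assertions, here is a minimal `promptfooconfig.yaml` sketch. The prompt wording, the provider ID, and the test data are all assumptions made for the example, not part of any existing Promptfoo example. The model-graded `context-faithfulness` check also assumes a grading provider is configured.

```yaml
# Illustrative sketch only: prompt, provider, and test data are assumptions.
description: RAG failure-mode starter checks

prompts:
  - |
    Answer the question using only the context below.
    If the context is insufficient, say so.

    Context: {{context}}
    Question: {{query}}

providers:
  - openai:gpt-4o-mini # placeholder; substitute any configured provider

tests:
  # Failure mode 4: answer overclaims beyond the provided context
  - description: Answer stays within the retrieved context
    vars:
      query: What is the refund window for annual plans?
      context: Annual plans can be refunded within 30 days of purchase.
    assert:
      # Model-graded; reads the `query` and `context` vars
      - type: context-faithfulness
        threshold: 0.8

  # Failure mode 8: refusal despite sufficient context
  - description: Model answers when the context contains the answer
    vars:
      query: What is the refund window for annual plans?
      context: Annual plans can be refunded within 30 days of purchase.
    assert:
      # Any assertion type can be negated with the `not-` prefix
      - type: not-is-refusal
      - type: icontains
        value: 30 days
```

Here `context` is a static fixture so the cases stay reproducible. In a real pipeline the context would come from the retriever, and the threshold would need tuning per dataset.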

For each failure mode, I’d include:

  • what it looks like
  • why it matters
  • suggested Promptfoo assertion(s)
  • minimal YAML test case
  • short debugging/fix hint
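
A single checklist entry could look roughly like the fragment below, shown for failure mode 5 (fabricated citation/source). It is a sketch under stated assumptions: the fragment belongs under `tests:` in a `promptfooconfig.yaml`, the prompt is assumed to ask the model to cite sources as bracketed IDs such as `[doc-1]`, and the `source_ids` variable is a hypothetical name introduced for this example.

```yaml
# Fragment: place under `tests:` in a promptfooconfig.yaml.
# What it looks like: the answer cites a source ID that was never retrieved.
# Why it matters: users cannot verify the claim, and trust in citations erodes.
# Fix hint: pass source IDs alongside each chunk and tell the prompt to cite only those.
- description: Every cited source ID was actually provided
  vars:
    query: What is the refund window for annual plans?
    context: "[doc-1] Annual plans can be refunded within 30 days of purchase."
    # Hypothetical variable; a comma-separated string, not a list,
    # because list-valued vars expand into multiple test cases
    source_ids: doc-1
  assert:
    - type: javascript
      value: |
        // Collect every bracketed ID such as [doc-1] from the model output
        const cited = [...output.matchAll(/\[(doc-\d+)\]/g)].map((m) => m[1]);
        const allowed = context.vars.source_ids.split(',').map((s) => s.trim());
        // Pass only if at least one source is cited and none are invented
        return cited.length > 0 && cited.every((id) => allowed.includes(id));
```

A deterministic check like this is cheap to run. It only catches invented IDs, though, so a cited source that does not actually support the claim would still need a model-graded assertion.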

Suggested format

I can keep this as a docs/examples-only contribution.

Possible locations:

  • site/docs/guides/rag-failure-modes.md
  • or examples/rag-failure-modes/README.md + promptfooconfig.yaml

I’m happy to align with the maintainers’ preferred location and naming.

Non-goals

  • No core Promptfoo changes
  • No new assertion types
  • No new RAG framework
  • No changes to existing evaluation semantics

The goal is only to make existing Promptfoo RAG evaluation capabilities easier to apply.
