Production-minded reference workflow for AI coding and support-ticket automation. It is intentionally small, but it demonstrates the engineering surface employers usually look for after the first headline: typed contracts, local RAG, safety gates, confidence scoring, structured output, evaluation, traceability, CI, and documentation.
This is not a chatbot demo. The workflow is deterministic so it can be tested without live model access; in a production deployment the deterministic components sit behind explicit agent/model interfaces and can be swapped for LLM-backed implementations.
What's inside:

- src/ Python package layout with Pydantic contracts
- local Markdown knowledge base as the only authoritative source
- retrieval with exact error-code boosting
- classification, clarification, refusal, and resolution states
- prompt-injection and unsafe-request detection
- structured TicketResolution output (see the contract sketch after this list)
- confidence calculation based on evidence, metadata, validation, and safety
- JSONL OpenTelemetry-style trace events
- regression eval cases and CI quality gates
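A minimal sketch of what those typed contracts might look like, assuming Pydantic v2; the field names, enums, and defaults here are illustrative assumptions, not the exact models shipped in src/ai_coder_production_lab/:

```python
# Illustrative sketch only; field names, enums, and defaults are assumptions,
# not the exact contracts shipped in src/ai_coder_production_lab/.
from enum import Enum

from pydantic import BaseModel, Field


class TicketStatus(str, Enum):
    RESOLVED = "resolved"
    NEEDS_CLARIFICATION = "needs_clarification"
    REFUSED = "refused"


class SupportTicket(BaseModel):
    ticket_id: str
    subject: str
    body: str
    error_code: str | None = None  # exact matches boost retrieval


class TicketResolution(BaseModel):
    ticket_id: str
    category: str
    priority: str
    status: TicketStatus
    proposed_solution: str | None = None
    follow_up_questions: list[str] = Field(default_factory=list)
    sources: list[str] = Field(default_factory=list)       # KB documents cited
    safety_flags: list[str] = Field(default_factory=list)
    confidence: float = Field(ge=0.0, le=1.0)
```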
SupportTicket
-> intake and safety screen
-> local KB retrieval
-> deterministic classifier
-> confidence and policy gate
-> schema-validated TicketResolution
-> JSONL trace events
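Of those stages, the confidence and policy gate is the one that decides whether the workflow resolves the ticket or asks follow-up questions. A minimal sketch of how such a gate could fold the evidence, metadata, validation, and safety signals into one score; the signal names, weights, and threshold are assumptions, not the repository's actual values:

```python
# Minimal sketch of a confidence-and-policy gate; signal names, weights, and
# the threshold are assumptions, not the values used in the real workflow.
from dataclasses import dataclass


@dataclass
class Signals:
    evidence_hits: int             # KB documents retrieved for the ticket
    exact_error_code_match: bool   # retrieval found the ticket's error code
    metadata_complete: bool        # ticket carried the expected metadata
    schema_valid: bool             # draft resolution passed contract validation
    safety_flags: int              # prompt-injection / unsafe-request findings


def score_confidence(s: Signals) -> float:
    """Fold evidence, metadata, validation, and safety into one 0..1 score."""
    score = 0.15 * min(s.evidence_hits, 3)          # diminishing credit for evidence
    score += 0.25 if s.exact_error_code_match else 0.0
    score += 0.10 if s.metadata_complete else 0.0
    score += 0.20 if s.schema_valid else 0.0
    score -= 0.30 * s.safety_flags                  # safety findings cut confidence hard
    return max(0.0, min(1.0, score))


def gate(score: float, threshold: float = 0.6) -> str:
    """Policy gate: resolve only when confidence clears the threshold."""
    return "resolve" if score >= threshold else "ask_for_clarification"
```

Because the scoring is deterministic, the gate's behavior can be pinned down in unit tests rather than sampled.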
The same seams map cleanly to LLM-backed agents later: prompts live in prompts/, workflow state is explicit, and final outputs are validated before any downstream system consumes them.
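As one concrete example of that last boundary, a sketch of the validation step, reusing the hypothetical TicketResolution model from the contract sketch above; publish and send_downstream are placeholders, not the repository's API:

```python
# Sketch of the output boundary; TicketResolution is the hypothetical model
# from the contract sketch above, and send_downstream is a placeholder hook.
from pydantic import ValidationError


def publish(raw_output: str) -> None:
    """Validate the final payload before any downstream system consumes it."""
    try:
        resolution = TicketResolution.model_validate_json(raw_output)
    except ValidationError as exc:
        raise RuntimeError("resolution failed contract validation") from exc
    send_downstream(resolution.model_dump())  # placeholder for the real consumer
```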
Run the lab locally:

- uv sync --locked --extra dev
- uv run python scripts/run_demo.py
- uv run python scripts/run_evals.py

Example output is a structured ticket resolution with category, priority, status, a proposed solution or follow-up questions, retrieved sources, safety flags, and a reasoning trace.
Quality gates:

- uv run ruff check .
- uv run ruff format --check .
- uv run mypy src
- uv run pytest
- uv run coverage run -m pytest
- uv run coverage report
- uv run python scripts/run_evals.py

The GitHub Actions workflow runs the same gates on every push and pull request.
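Because the workflow is deterministic, the regression evals can be plain expected-versus-actual comparisons rather than model-graded scores. A rough sketch of what a case could look like; the field names and the sample ticket are invented for illustration, not the repository's actual eval format:

```python
# Rough sketch of a regression eval case; field names and the sample ticket
# are invented for illustration, not the repository's actual eval format.
from typing import Callable

CASES = [
    {
        "ticket": {"subject": "Login fails with error code E-1042",
                   "body": "Cannot sign in since the last update."},
        "expected": {"category": "authentication", "status": "resolved"},
    },
]


def run_case(case: dict, resolve: Callable[[dict], dict]) -> bool:
    """Run the deterministic workflow and compare only the fields the case pins down."""
    actual = resolve(case["ticket"])
    return all(actual.get(key) == value for key, value in case["expected"].items())


def pass_rate(resolve: Callable[[dict], dict]) -> float:
    """CI gate: the eval script can fail the build when this drops below a floor."""
    return sum(run_case(case, resolve) for case in CASES) / len(CASES)
```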
This repository is a compact example of the same production AI patterns used in larger systems:
- governance: AgentGate
- incident agent orchestration: Incident Command Mesh
- retrieval evaluation: Metivta Eval
Repository layout:

- src/ai_coder_production_lab/: workflow, contracts, retrieval, safety, tracing, evals
- knowledge_base/: local authoritative support knowledge
- prompts/: versioned prompt assets and lifecycle notes
- tests/: unit, workflow, tracing, and regression-eval tests
- docs/: architecture, RAG, observability, threat model, and trade-offs
- schemas/v1/: versioned response contract for downstream clients
- .github/workflows/ci.yml: lint, format, type, test, coverage, and eval gates
Production AI coding systems fail less because of model choice than because of weak boundaries: untyped outputs, hidden state, ad hoc prompts, no evals, no policy gate, and no traceability. This lab keeps those concerns visible in a runnable codebase.