Production-minded reference workflow for AI coding and support-ticket automation. It is intentionally small, but it demonstrates the engineering surface employers usually look for after the first headline: typed contracts, local RAG, safety gates, confidence scoring, structured output, evaluation, traceability, CI, and documentation.
This is not a chatbot demo. The workflow is deterministic so it can be tested without live model access; in a production deployment the deterministic components sit behind explicit agent/model interfaces and can be swapped for LLM-backed implementations.
What's inside:

- src/ Python package layout with Pydantic contracts
- local Markdown knowledge base as the only authoritative source
- retrieval with exact error-code boosting
- classification, clarification, refusal, and resolution states
- prompt-injection and unsafe-request detection
- structured TicketResolution output (see the contract sketch after this list)
- confidence calculation based on evidence, metadata, validation, and safety
- JSONL OpenTelemetry-style trace events
- regression eval cases and CI quality gates
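A minimal sketch of what those typed contracts might look like, assuming Pydantic v2; the field names, enums, and defaults here are illustrative assumptions, not the exact models shipped in src/ai_coder_production_lab/:

```python
# Illustrative sketch only; field names, enums, and defaults are assumptions,
# not the exact contracts shipped in src/ai_coder_production_lab/.
from enum import Enum

from pydantic import BaseModel, Field


class TicketStatus(str, Enum):
    RESOLVED = "resolved"
    NEEDS_CLARIFICATION = "needs_clarification"
    REFUSED = "refused"


class SupportTicket(BaseModel):
    ticket_id: str
    subject: str
    body: str
    error_code: str | None = None  # exact matches boost retrieval


class TicketResolution(BaseModel):
    ticket_id: str
    category: str
    priority: str
    status: TicketStatus
    proposed_solution: str | None = None
    follow_up_questions: list[str] = Field(default_factory=list)
    sources: list[str] = Field(default_factory=list)       # KB documents cited
    safety_flags: list[str] = Field(default_factory=list)
    confidence: float = Field(ge=0.0, le=1.0)
```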
SupportTicket
-> intake and safety screen
-> local KB retrieval
-> deterministic classifier
-> confidence and policy gate
-> schema-validated TicketResolution
-> JSONL trace events
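Of those stages, the confidence and policy gate is the one that decides whether the workflow resolves the ticket or asks follow-up questions. A minimal sketch of how such a gate could fold the evidence, metadata, validation, and safety signals into one score; the signal names, weights, and threshold are assumptions, not the repository's actual values:

```python
# Minimal sketch of a confidence-and-policy gate; signal names, weights, and
# the threshold are assumptions, not the values used in the real workflow.
from dataclasses import dataclass


@dataclass
class Signals:
    evidence_hits: int             # KB documents retrieved for the ticket
    exact_error_code_match: bool   # retrieval found the ticket's error code
    metadata_complete: bool        # ticket carried the expected metadata
    schema_valid: bool             # draft resolution passed contract validation
    safety_flags: int              # prompt-injection / unsafe-request findings


def score_confidence(s: Signals) -> float:
    """Fold evidence, metadata, validation, and safety into one 0..1 score."""
    score = 0.15 * min(s.evidence_hits, 3)          # diminishing credit for evidence
    score += 0.25 if s.exact_error_code_match else 0.0
    score += 0.10 if s.metadata_complete else 0.0
    score += 0.20 if s.schema_valid else 0.0
    score -= 0.30 * s.safety_flags                  # safety findings cut confidence hard
    return max(0.0, min(1.0, score))


def gate(score: float, threshold: float = 0.6) -> str:
    """Policy gate: resolve only when confidence clears the threshold."""
    return "resolve" if score >= threshold else "ask_for_clarification"
```

Because the scoring is deterministic, the gate's behavior can be pinned down in unit tests rather than sampled.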
The same seams map cleanly to LLM-backed agents later: prompts live in prompts/, workflow state is explicit, and final outputs are validated before any downstream system consumes them.
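As one concrete example of that last boundary, a sketch of the validation step, reusing the hypothetical TicketResolution model from the contract sketch above; publish and send_downstream are placeholders, not the repository's API:

```python
# Sketch of the output boundary; TicketResolution is the hypothetical model
# from the contract sketch above, and send_downstream is a placeholder hook.
from pydantic import ValidationError


def publish(raw_output: str) -> None:
    """Validate the final payload before any downstream system consumes it."""
    try:
        resolution = TicketResolution.model_validate_json(raw_output)
    except ValidationError as exc:
        raise RuntimeError("resolution failed contract validation") from exc
    send_downstream(resolution.model_dump())  # placeholder for the real consumer
```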
Run the lab locally:

- uv sync --locked --extra dev
- uv run python scripts/run_demo.py
- uv run python scripts/run_evals.py

Example output is a structured ticket resolution with category, priority, status, a proposed solution or follow-up questions, retrieved sources, safety flags, and a reasoning trace.
Quality gates:

- uv run ruff check .
- uv run ruff format --check .
- uv run mypy src
- uv run pytest
- uv run coverage run -m pytest
- uv run coverage report
- uv run python scripts/run_evals.py

The GitHub Actions workflow runs the same gates on every push and pull request.
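Because the workflow is deterministic, the regression evals can be plain expected-versus-actual comparisons rather than model-graded scores. A rough sketch of what a case could look like; the field names and the sample ticket are invented for illustration, not the repository's actual eval format:

```python
# Rough sketch of a regression eval case; field names and the sample ticket
# are invented for illustration, not the repository's actual eval format.
from typing import Callable

CASES = [
    {
        "ticket": {"subject": "Login fails with error code E-1042",
                   "body": "Cannot sign in since the last update."},
        "expected": {"category": "authentication", "status": "resolved"},
    },
]


def run_case(case: dict, resolve: Callable[[dict], dict]) -> bool:
    """Run the deterministic workflow and compare only the fields the case pins down."""
    actual = resolve(case["ticket"])
    return all(actual.get(key) == value for key, value in case["expected"].items())


def pass_rate(resolve: Callable[[dict], dict]) -> float:
    """CI gate: the eval script can fail the build when this drops below a floor."""
    return sum(run_case(case, resolve) for case in CASES) / len(CASES)
```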
This repository is a compact example of the same production AI patterns used in larger systems:
- governance: AgentGate
- incident agent orchestration: Incident Command Mesh
- retrieval evaluation: Metivta Eval
Repository layout:

- src/ai_coder_production_lab/: workflow, contracts, retrieval, safety, tracing, evals
- knowledge_base/: local authoritative support knowledge
- prompts/: versioned prompt assets and lifecycle notes
- tests/: unit, workflow, tracing, and regression-eval tests
- docs/: architecture, RAG, observability, threat model, and trade-offs
- schemas/v1/: versioned response contract for downstream clients
- .github/workflows/ci.yml: lint, format, type, test, coverage, and eval gates
Production AI coding systems fail less because of model choice than because of weak boundaries: untyped outputs, hidden state, ad hoc prompts, no evals, no policy gate, and no traceability. This lab keeps those concerns visible in a runnable codebase.