Eliminating hallucinations. Enforcing consistency. Ensuring understanding.
A system that applies multi-layered verification to AI-generated documents — combining atomic fact decomposition, multi-perspective verification, evidence grounding, contradiction detection, and human comprehension testing into a unified confidence index.
Mohammad Asadolahi — Senior Agentic AI Engineer
- GitHub: https://github.com/MohammadAsadolahi
- Focus: Agentic AI Architectures In The Wild
Large Language Models generate fluent, convincing text — but they hallucinate. They fabricate citations, invent statistics, and produce internally contradictory statements that pass casual review. Worse, users often accept AI-generated documents without truly understanding their content, creating a comprehension gap that compounds the risk.
GenDoc Confirm addresses all three failure modes through a unified verification pipeline:
| Failure Mode | Detection Method | Research Basis |
|---|---|---|
| Hallucinated Facts | Atomic fact decomposition + multi-perspective verification | FActScore, SAFE |
| Internal Contradictions | Cross-reference analysis + cross-examination | CoVe |
| Blind Acceptance | Multi-level comprehension quiz with trap questions | Human-AI Collaboration |
Docker Compose is provided with PostgreSQL 16, Redis 7, and ChromaDB services for future production use, but the current application implementation uses in-memory storage.
GenDoc Confirm implements a 9-node LangGraph pipeline with multi-perspective verification and selective cross-examination, informed by research in LLM verification.
+----------------------------------------------------------------------------------+
| DOCUMENT INGESTION |
| [1] Document Generation -- CoT-prompted generation from user specifications |
| [2] Fact Decomposition -- FActScore-inspired atomic claim extraction |
+----------------------------------------------------------------------------------+
| MULTI-LAYER VERIFICATION |
| [3] Evidence Grounding -- Batched grounding against user-provided refs |
| [4] Execute Verification -- Multi-perspective verification (1 call per fact) |
| [5] Cross-Examination -- Selective CoVe-lite for low-confidence facts |
| [6] Cross-Reference/Revise -- Contradiction detection + document revision |
+----------------------------------------------------------------------------------+
| COMPREHENSION ASSURANCE |
| [7] Quiz Generation -- Multi-level questions: recall, analysis, trap, |
| scenario |
| [8] Answer Evaluation -- Rule-based grading of user answers |
| [9] Final Report -- Composite scoring with risk areas & |
| recommendations |
+----------------------------------------------------------------------------------+
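The nine stages above run in a fixed order over shared state. A minimal sketch of that control flow in plain Python, without LangGraph, assuming each node is a function that takes and returns a state dict. The node names, the `make_stub` and `run_pipeline` helpers, and the `on_event` callback are illustrative and not taken from the project's `graph.py`:

```python
from typing import Callable

State = dict
Node = Callable[[State], State]

# Hypothetical node names mirroring the nine stages in the diagram.
STAGES = [
    "document_generation",
    "fact_decomposition",
    "evidence_grounding",
    "execute_verification",
    "cross_examination",
    "cross_reference_revise",
    "quiz_generation",
    "answer_evaluation",
    "final_report",
]


def make_stub(name: str) -> Node:
    """Return a placeholder node that only records that it ran."""
    def node(state: State) -> State:
        return {**state, "trace": state.get("trace", []) + [name]}
    return node


def run_pipeline(
    state: State,
    nodes: list[tuple[str, Node]],
    on_event: Callable[[str, dict], None] = lambda event, payload: None,
) -> State:
    """Run nodes in order, emitting step_start/step_complete around each one."""
    for step, (label, node) in enumerate(nodes, start=1):
        on_event("step_start", {"step": step, "label": label})
        state = node(state)
        on_event("step_complete", {"step": step, "label": label})
    return state


events: list[tuple[str, dict]] = []
final = run_pipeline(
    {"document": "draft text"},
    [(name, make_stub(name)) for name in STAGES],
    on_event=lambda event, payload: events.append((event, payload)),
)
```

Each stub would be replaced by a real node (an LLM call, a grader, and so on). The callback is one possible hook for streaming progress to a client.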
The pipeline applies several research-backed optimizations to reduce LLM calls while preserving verification quality:
- Batched evidence grounding: N calls reduced to 1 (RAGAS NLIStatementPrompt pattern)
- Unified multi-perspective verification: 3 perspectives in 1 call per fact (Self-Contrast, SPP research)
- Selective cross-examination: Only facts with confidence < 0.6 are cross-examined (CoVe Factored Lite)
- Merged cross-reference + revision: Single step combining contradiction detection and document correction
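Two of the optimizations above can be sketched in a few lines: batching all claims into a single grounding prompt, and selecting only low-confidence facts for cross-examination. This is an illustration under assumptions, not the project's code. The `Fact` class, both function names, the prompt wording, and the `SUPPORTED`/`CONTRADICTED`/`NOT_FOUND` labels are hypothetical; only the 0.6 threshold comes from the list above.

```python
from dataclasses import dataclass

CROSS_EXAM_THRESHOLD = 0.6  # from the list above: only confidence < 0.6 is re-examined


@dataclass
class Fact:
    fact_id: str
    text: str
    confidence: float


def select_for_cross_examination(
    facts: list[Fact], threshold: float = CROSS_EXAM_THRESHOLD
) -> list[Fact]:
    """Keep only the facts whose confidence is strictly below the threshold."""
    return [f for f in facts if f.confidence < threshold]


def build_batched_grounding_prompt(facts: list[Fact], reference: str) -> str:
    """Put every claim into one prompt so grounding costs one LLM call, not N."""
    numbered = "\n".join(f"{i}. {f.text}" for i, f in enumerate(facts, start=1))
    return (
        "Reference material:\n"
        f"{reference}\n\n"
        "For each numbered claim, answer SUPPORTED, CONTRADICTED, or NOT_FOUND:\n"
        f"{numbered}"
    )


facts = [
    Fact("f1", "The service listens on port 8000.", 0.92),
    Fact("f2", "The default model is gpt-4o.", 0.60),
    Fact("f3", "Results are stored in PostgreSQL.", 0.35),
]
low_confidence = select_for_cross_examination(facts)
prompt = build_batched_grounding_prompt(facts, "uvicorn app.main:app --port 8000")
```

Because the comparison is strict, a fact at exactly 0.6 is not cross-examined in this sketch; only `f3` is selected.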
The final Confidence Index is a weighted composite of four independently measured dimensions.
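As an illustration of a weighted composite, here is a minimal sketch. The four dimension names and the weights below are placeholders chosen for the example, not the project's actual values, and `confidence_index` is a hypothetical helper:

```python
# Hypothetical dimension names and weights; the project's actual values may differ.
WEIGHTS = {
    "factual_accuracy": 0.35,
    "evidence_grounding": 0.25,
    "internal_consistency": 0.20,
    "comprehension": 0.20,
}


def confidence_index(
    scores: dict[str, float], weights: dict[str, float] = WEIGHTS
) -> float:
    """Weighted average of per-dimension scores, each expected in [0, 1]."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside [0, 1]")
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total_weight


example = confidence_index({
    "factual_accuracy": 0.9,
    "evidence_grounding": 0.8,
    "internal_consistency": 1.0,
    "comprehension": 0.5,
})
```

Dividing by the total weight keeps the result in [0, 1] even if the weights do not sum exactly to 1.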
- Python 3.11+ and Node.js 18+
- OpenAI API key (GPT-4o recommended; any OpenAI-compatible endpoint supported)
# Backend
cd backend
python -m venv venv && source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000
# Frontend (separate terminal)
cd frontend
npm install && npm run dev

All settings are configurable via environment variables or .env:
| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | (none) | LLM API key (required) |
| `OPENAI_MODEL` | `gpt-4o` | Model identifier |
| `OPENAI_BASE_URL` | (none) | Custom endpoint (Azure, local, etc.) |
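The project tree lists `config.py` as Pydantic settings. The sketch below shows the same three variables and defaults using only the standard library, so it runs without extra dependencies. The `Settings` class and `load_settings` function are hypothetical names, and `.env` file parsing is deliberately left out:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    openai_model: str = "gpt-4o"
    openai_base_url: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying the table's defaults."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        openai_model=env.get("OPENAI_MODEL", "gpt-4o"),
        # Treat an empty string the same as "not set".
        openai_base_url=env.get("OPENAI_BASE_URL") or None,
    )


# Example with an explicit mapping instead of the real process environment.
settings = load_settings({
    "OPENAI_API_KEY": "sk-example",
    "OPENAI_BASE_URL": "http://localhost:11434/v1",
})
```

Passing the mapping as a parameter makes the loader easy to exercise in tests without touching the real environment.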
+------------+ +------------+ +------------+ +------------+
| INPUT |---->| VERIFY |---->| QUIZ |---->| REPORT |
| | | | | | | |
| Upload | | 9-step | | Multi- | | Overall |
| Paste | | pipeline | | level Qs | | CI score |
| Generate | | Real-time | | Graded | | Risk map |
| | | streaming | | answers | | Actions |
+------------+ +------------+ +------------+ +------------+
- Document Input — Upload a file, paste text, or generate via AI prompt
- Verification — Watch the 9-node pipeline process facts in real-time via SSE streaming
- Comprehension Quiz — Answer multi-level questions (recall, analysis, trap, scenario)
- Confidence Report — Review composite CI score, risk areas, and actionable recommendations
GenDoc Confirm is informed by research in LLM verification and self-improvement. The following papers shaped the design, though the final pipeline optimizes and selectively applies their techniques:
| Paper | Year | Influence on Design |
|---|---|---|
| FActScore — Min et al. | 2023 | Atomic fact decomposition for fine-grained evaluation (directly used) |
| SAFE — Wei et al., Google DeepMind | 2024 | Search-augmented factual evaluation with LLM agents (directly used) |
| Chain-of-Verification (CoVe) — Dhuliawala et al. | 2023 | Factored verification questions — applied in cross-examination (CoVe Lite) |
| RARR — Gao et al. | 2023 | Retrofit attribution via revision — merged into cross-reference step |
| SelfCheckGPT — Manakul et al. | 2023 | Multi-sample consistency concept — informed design, replaced by multi-perspective verification |
| Reflexion — Shinn et al. | 2023 | Verbal self-assessment concept — informed design, replaced by selective cross-examination |
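To make the FActScore idea concrete: once a text is split into atomic facts and each fact is judged, the score is the share of facts that are supported. A minimal sketch, assuming each fact already carries a status label. The label strings and the `supported_fraction` helper are hypothetical, and returning 0.0 for an empty list is a choice made for this example:

```python
def supported_fraction(statuses: list[str]) -> float:
    """Share of atomic facts labelled 'supported'; 0.0 for an empty list."""
    if not statuses:
        return 0.0
    return sum(1 for status in statuses if status == "supported") / len(statuses)


# Four atomic facts extracted from one paragraph, three of them supported.
score = supported_fraction(["supported", "supported", "unsupported", "supported"])
```

Scoring at the level of atomic facts means one wrong claim lowers the score proportionally instead of failing the whole paragraph.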
POST /api/v1/projects/ # Create verification project
GET /api/v1/projects/ # List all projects
GET /api/v1/projects/{id} # Get project details
POST /api/v1/projects/{id}/document # Set document text or prompt
POST /api/v1/projects/{id}/upload # Upload document file
GET /api/v1/projects/{id}/document # Retrieve document
POST /api/v1/projects/{id}/verify # Start verification (SSE stream)
GET /api/v1/projects/{id}/verify/results # Get verification results
GET /api/v1/projects/{id}/quiz # Get quiz questions
POST /api/v1/projects/{id}/quiz/submit # Submit quiz answers
GET /api/v1/projects/{id}/report # Get full confidence report

| Event | Payload | Description |
|---|---|---|
| `step_start` | `{step, label}` | Pipeline node begins execution |
| `fact_verified` | `{fact_id, status, confidence}` | Individual fact verification result |
| `step_complete` | `{step, label}` | Pipeline node finishes |
| `verification_complete` | `{message}` | Full pipeline complete |
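A client consuming the verification stream has to turn SSE lines into the events listed above. The sketch below parses the standard `event:` / `data:` framing with the standard library only. It assumes the server sends named events with a JSON object as data, which the source does not spell out. The `parse_sse` helper and the sample values are invented for illustration:

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event, payload) pairs from SSE lines; a blank line ends an event."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip(" "))


# Sample stream with made-up values, shaped like the payloads in the table.
sample = [
    "event: step_start",
    'data: {"step": 4, "label": "Execute Verification"}',
    "",
    "event: fact_verified",
    'data: {"fact_id": "f3", "status": "verified", "confidence": 0.82}',
    "",
    "event: verification_complete",
    'data: {"message": "done"}',
    "",
]
received = list(parse_sse(sample))
```

In a real client the `lines` argument would come from the streamed HTTP response body of the `verify` endpoint rather than a list.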
# Run end-to-end tests (requires running backend)
cd tests
pytest test_e2e.py -v --timeout=300

The E2E suite covers the complete lifecycle: project creation, document ingestion, SSE streaming verification, quiz generation, answer grading, and confidence report with all 4 scoring dimensions.
Doc-Confirm/
├── backend/
│   ├── .env.example
│   ├── requirements.txt
│   └── app/
│       ├── main.py                       # FastAPI application entry point
│       ├── config.py                     # Pydantic settings
│       ├── store.py                      # In-memory project store
│       ├── agents/
│       │   └── graph.py                  # 9-node LangGraph pipeline
│       ├── api/
│       │   ├── documents.py              # Document CRUD + file upload
│       │   ├── projects.py               # Project management
│       │   ├── verification.py           # SSE streaming verification
│       │   ├── quiz.py                   # Comprehension quiz
│       │   └── reports.py                # Confidence reports
│       ├── models/
│       │   └── schemas.py                # Pydantic data models
│       ├── services/
│       │   ├── llm.py                    # LLM provider abstraction
│       │   └── parallel.py               # Parallel execution utilities
│       └── tools/
│           ├── fact_decomposer.py        # FActScore decomposition
│           ├── web_search.py             # Claim verification
│           ├── contradiction_detector.py
│           ├── cross_examiner.py         # CoVe-lite cross-examination
│           ├── evidence_checker.py       # Evidence grounding
│           └── quiz_generator.py         # Multi-level quiz generation
├── frontend/
│   ├── package.json
│   ├── index.html
│   ├── tailwind.config.ts
│   └── src/
│       ├── App.tsx
│       ├── main.tsx
│       ├── components/layout/
│       │   └── MainLayout.tsx
│       ├── lib/
│       │   ├── api.ts                    # API client
│       │   └── types.ts                  # TypeScript types
│       └── pages/
│           ├── Dashboard.tsx
│           ├── DocumentInput.tsx
│           ├── Verification.tsx
│           ├── Quiz.tsx
│           └── Report.tsx
├── tests/
│   └── test_e2e.py                       # End-to-end test suite
├── docker-compose.yml                    # Service orchestration
└── README.md
- Multi-model ensemble — Cross-verify facts across GPT-4o, Claude, and Gemini
- Domain-specific verification — Specialized rules for medical, legal, and financial documents
- Knowledge graph integration — Structured fact storage with Neo4j
- Collaborative review — Multi-user verification workflows with role-based access
- CI/CD integration — Verify documentation in pull request pipelines
- Plugin ecosystem — Notion, Confluence, and Google Docs integrations
- Database integration — Connect PostgreSQL, Redis, and ChromaDB services to the application
Built with conviction that AI-generated content demands the same rigor we apply to human-authored work.
GenDoc Confirm — Trust, but verify.
This README was generated with AI assistance, so check it for mistakes.