Knowledge Network

A federated knowledge system served through an MCP (Model Context Protocol) server. Multiple GitHub repos contribute structured markdown knowledge files. A central MCP server indexes all sources into PostgreSQL + pgvector, and exposes the merged knowledge to any LLM tool that supports MCP.

Beyond search and retrieval, the Knowledge Network includes a knowledge lifecycle system — tools for onboarding, capturing knowledge from any source, automated quality review, and self-improvement loops that let knowledge grow organically from daily work.

Architecture

MCP Clients (claude.ai, Claude Desktop, Claude Code, Cursor, etc.)
        |
        | Streamable HTTP + OAuth 2.1 / PAT auth
        v
  MCP Server (Node.js / TypeScript)
  ├── get_knowledge(query)        — hybrid semantic + keyword search
  ├── get_alignment_context(text) — find relevant company values/stances
  ├── get_preferences(context)    — user preferences from GitHub
  ├── list_sources()              — available knowledge repos
  ├── onboard()                   — first-time setup + system instructions (planned)
  └── capture_knowledge(content)  — format and route knowledge contributions (planned)
        |                    |
  PostgreSQL + pgvector    GitHub API + OAuth
  (chunks, embeddings,     (identity, file fetch,
   sessions, analytics)     access control)

How indexing works

  1. A PR merges to main in a knowledge repo
  2. A GitHub Action detects changed .md files
  3. The Action calls POST /api/index with the file list
  4. The server fetches each file, extracts frontmatter, splits by ## sections
  5. Each section is embedded via Voyage AI (1024-dim vectors)
  6. Chunks are upserted into pgvector with deterministic IDs
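Step 6's deterministic IDs are what make re-indexing idempotent: the same section always maps to the same row, so a re-index replaces chunks instead of duplicating them. A minimal sketch of one possible ID scheme (the server's actual recipe is not specified in this README — this hash of repo, path, and heading is an assumption):

```typescript
import { createHash } from "node:crypto";

// Hypothetical ID scheme: sha256 over repo, file path, and section heading.
// Re-running the indexer on unchanged input yields the same ID, so the
// pgvector upsert overwrites the existing chunk rather than inserting a copy.
function chunkId(repo: string, path: string, heading: string): string {
  return createHash("sha256")
    .update(`${repo}:${path}:${heading}`)
    .digest("hex");
}

const a = chunkId("company", "compliance/ferpa.md", "Directory Information");
const b = chunkId("company", "compliance/ferpa.md", "Directory Information");
console.log(a === b); // same inputs, same ID
```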

How search works

  1. An MCP client calls get_knowledge("FERPA compliance")
  2. The query is embedded via Voyage AI
  3. Two searches run in parallel: pgvector cosine similarity + PostgreSQL full-text search
  4. Results are merged using Reciprocal Rank Fusion (RRF)
  5. Top results are returned with full content and metadata
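The RRF merge in step 4 can be sketched as follows. Each input list is an array of chunk IDs ordered by rank; k = 60 is the conventional smoothing constant (the server's actual constant and tie-breaking behavior are assumptions):

```typescript
// Minimal Reciprocal Rank Fusion: each document scores 1 / (k + rank + 1)
// per list it appears in, and the fused ranking sorts by total score.
function rrfMerge(resultLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of resultLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// A chunk ranked highly by both searches rises to the top of the fused list:
const merged = rrfMerge([
  ["ferpa-overview", "privacy-policy", "accessibility"], // vector results
  ["ferpa-overview", "mission", "privacy-policy"],       // full-text results
]);
console.log(merged[0]); // "ferpa-overview"
```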

Tech stack

| Layer | Technology |
| --- | --- |
| Language | TypeScript (strict mode, ESM) |
| Runtime | Node.js 22 |
| Package manager | pnpm |
| MCP SDK | @modelcontextprotocol/sdk |
| HTTP framework | Express 5 |
| Database | PostgreSQL 17 + pgvector |
| Embeddings | Voyage AI voyage-3 (1024 dimensions) |
| DB client | pg (no ORM) |
| Caching | lru-cache (response + preferences) |
| Testing | Vitest |
| Validation | Zod |
| Markdown parsing | gray-matter + remark |

Prerequisites

  • Docker Desktop or OrbStack
  • pnpm (corepack enable && corepack prepare pnpm@latest --activate)
  • Caddy reverse proxy running on the host (see Caddy setup below)

Getting started

1. Clone and install

git clone <repo-url>
cd knowledge-network
pnpm install

2. Configure environment

cp .env.example .env

Edit .env and fill in your API keys:

| Variable | Where to get it |
| --- | --- |
| VOYAGE_API_KEY | dash.voyageai.com — sign up and create an API key |
| GITHUB_TOKEN | github.com/settings/tokens — classic token with repo (read) and read:org scopes |
| WEBHOOK_SECRET | Any random string (secures the indexing webhook) |
| GITHUB_CLIENT_ID | GitHub OAuth App — see docs/guides/oauth-setup.md |
| GITHUB_CLIENT_SECRET | GitHub OAuth App — see docs/guides/oauth-setup.md |
| TOKEN_ENCRYPTION_KEY | Run: node -e "console.log(require('crypto').randomBytes(32).toString('hex'))" |

The other variables (DATABASE_URL, MANIFEST_PATH, PORT, NODE_ENV, OAUTH_ISSUER, etc.) have working defaults for local development. See .env.example for the full list.

3. Start the dev environment

docker compose up -d

This starts two containers:

  • knowledge-network-dev — Node.js 22 app server (auto-installs dependencies, runs migrations on startup)
  • knowledge-network-db — PostgreSQL 17 with pgvector

The app is available at https://knowledge-network.localhost (via Caddy).

First startup takes a minute or two while it installs dependencies inside the container. Check progress with:

docker compose logs app --tail 20

You should see Knowledge Network MCP server listening on port 4000 when it's ready.

4. Seed the knowledge base

docker compose exec app pnpm seed

This indexes the 6 seed documents in seed/company/ into the database. Requires a valid VOYAGE_API_KEY.

5. Verify everything works

# Health check
curl -sk https://knowledge-network.localhost/health

# Run unit tests
docker compose exec app pnpm test

Project structure

src/
├── index.ts                  # MCP server + Express entry point
├── types.ts                  # Shared TypeScript types (Chunk, Frontmatter, AuthContext, etc.)
├── manifest.ts               # Repo manifest loader (repos.yaml)
├── seed.ts                   # Seeds the database from seed/ files
├── db/
│   ├── schema.sql            # PostgreSQL schema (pgvector, chunks, repos, sessions, etc.)
│   ├── connection.ts         # Database connection pool
│   ├── migrate.ts            # Schema migration runner
│   └── migrations/           # Incremental SQL migrations
├── indexing/
│   ├── parser.ts             # Markdown frontmatter extraction (gray-matter + Zod)
│   ├── chunker.ts            # Splits documents by ## sections
│   ├── embedder.ts           # Voyage AI embedding client
│   └── indexer.ts            # Orchestrator: parse -> chunk -> embed -> upsert
├── search/
│   ├── hybrid-search.ts      # pgvector + tsvector with RRF merging
│   └── cache.ts              # LRU response cache (1hr TTL)
├── tools/
│   ├── get-knowledge.ts      # Search the knowledge base
│   ├── get-alignment-context.ts  # Find relevant company values
│   ├── get-preferences.ts    # Fetch user preferences from GitHub
│   └── list-sources.ts       # List available knowledge repos
├── auth/
│   ├── github.ts             # GitHub token verification + access control
│   ├── middleware.ts          # Unified auth middleware (OAuth + PAT)
│   └── crypto.ts             # Token hashing + encryption helpers
├── oauth/
│   ├── metadata.ts           # .well-known discovery endpoints
│   ├── register.ts           # Dynamic client registration
│   ├── authorize.ts          # OAuth authorize (redirects to GitHub)
│   ├── callback.ts           # GitHub OAuth callback handler
│   ├── token.ts              # Token exchange and refresh
│   └── revoke.ts             # Token revocation
├── admin/
│   ├── repos.ts              # Repo listing and re-indexing
│   ├── cache.ts              # Cache management
│   ├── sessions.ts           # Session listing and revocation
│   └── analytics.ts          # Usage analytics and query log
├── webhook/
│   └── index-handler.ts      # POST /api/index — HMAC-secured webhook
├── telemetry/
│   ├── logger.ts             # Query telemetry logging
│   └── structured-logger.ts  # Structured JSON logging (pino)
├── validation/
│   └── validate-frontmatter.ts   # Frontmatter validation script
└── __tests__/                # Unit tests (~89 tests)
    ├── parser.test.ts
    ├── chunker.test.ts
    ├── embedder.test.ts
    ├── manifest.test.ts
    ├── cache.test.ts
    ├── webhook.test.ts
    ├── tool-formatting.test.ts
    ├── validation.test.ts
    ├── middleware.test.ts
    ├── crypto.test.ts
    ├── oauth-register.test.ts
    ├── oauth-token.test.ts
    ├── oauth-revoke.test.ts
    ├── admin.test.ts
    ├── analytics.test.ts
    ├── query-preprocessing.test.ts
    ├── recursive-chunking.test.ts
    └── fixtures/             # Test fixtures (sample markdown files)

docs/
├── strategy/
│   ├── vision-and-problem-statement.md
│   ├── user-research-and-personas.md
│   ├── jobs-to-be-done.md
│   ├── business-case.md
│   └── product-strategy-and-roadmap.md
├── requirements/
│   ├── PRD.md
│   └── technical-review-2026-03-31.md
├── guides/
│   ├── oauth-setup.md
│   ├── admin-guide.md
│   └── content-schema.md
└── implementation-plan/

seed/                         # Seed knowledge documents
├── repos.yaml                # Repo manifest
└── company/                  # Company knowledge files
    ├── mission-values.md
    ├── positioning.md
    ├── ethical-stances.md
    ├── higher-ed-landscape.md
    └── compliance/
        ├── ferpa.md
        └── accessibility.md

Common commands

All commands run inside Docker unless noted otherwise.

| Command | What it does |
| --- | --- |
| docker compose up -d | Start the dev environment |
| docker compose down | Stop all containers |
| docker compose logs app --tail 30 | View app logs |
| docker compose exec app pnpm test | Run unit tests (~89 tests) |
| docker compose exec app pnpm seed | Index seed documents into the database |
| docker compose exec app pnpm migrate | Run database migrations |
| docker compose exec app pnpm validate seed | Validate frontmatter in seed files |
| docker compose exec app pnpm typecheck | Run TypeScript type checking |

Do not run pnpm dev directly on the host. The app requires Node 22 and a PostgreSQL database, both of which are provided by Docker.

Knowledge lifecycle

The Knowledge Network isn't just a place to look things up — it's a living system where knowledge flows in from many sources, passes through quality gates, and becomes shared organizational intelligence.

How knowledge flows in

People on the team build their own creative systems for discovering useful knowledge — monitoring CI/CD pipelines, scanning meeting transcripts, comparing AI-drafted emails to what they actually send, running retrospectives on development decisions. The Knowledge Network doesn't standardize how people discover insights. It standardizes the last mile: once you have something worth capturing, the capture_knowledge MCP tool formats it, applies company terminology, and routes it to the right place.

Quality gates

Captured knowledge passes through two review layers:

  1. Automated critic — Checks length (optimized for token efficiency when served via MCP), terminology consistency (e.g., "customers" not "clients", "faculty members" not "teachers"), format compliance, and contradiction detection against existing knowledge.
  2. Human review — Domain experts review and approve contributions before they merge into the shared knowledge base.

Onboarding

The onboard MCP tool introduces new users to the Knowledge Network and writes system instructions to their AI assistant's config file (CLAUDE.md, .cursorrules, etc.) so that Digication-related questions are automatically routed to the MCP instead of relying on the LLM's general training data.

Multi-repo architecture

Knowledge is organized into multiple repos by function — company-wide values, sales, engineering, HR, etc. — each maintained by the team that knows the domain best. Personal preference repos give individuals a private space for communication style and working preferences, with clear boundaries between personal content and company IP.

Self-improvement loops

The system is designed to support self-improvement loops where AI agents learn from their own output and from human corrections. Examples include comparing draft emails to sent versions, extracting patterns from meeting transcripts, and capturing engineering insights from CI/CD results. A patterns library documents reusable setups that people can adapt.

For the full lifecycle design, see docs/implementation-plan/knowledge-network-lifecycle/.

Authentication

The MCP server supports two authentication methods:

  1. OAuth 2.1 (primary) — Used by claude.ai, Claude Desktop, CoWork, and other browser-based MCP clients. Users log in via GitHub and are connected automatically. No tokens to copy.
  2. GitHub PAT (backward compatible) — Used by Claude Code, Cursor, and CLI-based tools. Requires a personal access token passed in the Authorization header.

Both methods verify the user's identity through GitHub and determine which knowledge repos they can access based on their GitHub permissions.
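As a rough illustration of how the unified middleware in src/auth/middleware.ts might tell the two methods apart, a classifier can use GitHub's documented token prefixes (ghp_ for classic PATs, github_pat_ for fine-grained PATs, gho_ for OAuth user tokens). The server's actual dispatch logic is not shown in this README — this heuristic is an assumption:

```typescript
// Sketch: classify an incoming Authorization header by token prefix.
// Anything that is a well-formed bearer token but not a GitHub-prefixed
// token is treated here as a server-issued OAuth token.
type TokenKind = "github-pat" | "github-oauth" | "server-oauth" | "invalid";

function classifyAuthHeader(header: string | undefined): TokenKind {
  const match = header?.match(/^Bearer\s+(\S+)$/);
  if (!match) return "invalid";
  const token = match[1];
  if (token.startsWith("ghp_") || token.startsWith("github_pat_")) {
    return "github-pat";
  }
  if (token.startsWith("gho_")) return "github-oauth";
  return "server-oauth";
}

console.log(classifyAuthHeader("Bearer ghp_abc123")); // "github-pat"
console.log(classifyAuthHeader(undefined));           // "invalid"
```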

Connecting from claude.ai and CoWork

Claude.ai and CoWork use the OAuth flow. Users don't need to manage tokens: they authorize once through GitHub and stay connected.

  1. Go to claude.ai > Settings > Connectors > Add custom connector
  2. Enter the server URL: https://knowledge-network.up.railway.app/mcp
  3. Claude discovers the OAuth endpoints automatically and prompts you to authorize
  4. Click "Authorize" and log in with your GitHub account
  5. You're connected! Try asking: "What is Digication's mission?"

Claude Desktop and CoWork work the same way: add the server URL in MCP settings, and OAuth handles the rest.

For detailed setup instructions (creating OAuth Apps, environment variables, troubleshooting), see docs/guides/oauth-setup.md.

Connecting Claude Code

Claude Code uses a GitHub PAT (personal access token) for authentication:

  1. Create a token at github.com/settings/tokens with repo (read) and read:org scopes
  2. Add the MCP server:
claude mcp add knowledge-network https://knowledge-network.up.railway.app/mcp --transport http --header "Authorization: Bearer YOUR_GITHUB_TOKEN"

Or add it manually to ~/.claude.json:

{
  "mcpServers": {
    "knowledge-network": {
      "type": "http",
      "url": "https://knowledge-network.up.railway.app/mcp",
      "headers": {
        "Authorization": "Bearer ${GITHUB_TOKEN}"
      }
    }
  }
}

If using the ${GITHUB_TOKEN} environment variable approach, add this to your ~/.zshrc (or ~/.bashrc):

export GITHUB_TOKEN=your-token-here

After setup, restart Claude Code. You can test by asking: "What is Digication's FERPA compliance policy?"

Connecting other MCP clients (Cursor, etc.)

Any MCP client that supports Streamable HTTP can connect. Point it at:

  • URL: https://knowledge-network.up.railway.app/mcp
  • Auth header: Authorization: Bearer <your-github-pat>

Admin access

Admin endpoints (analytics, session management, re-indexing) require membership in a GitHub team. See docs/guides/admin-guide.md for setup and usage.

Endpoints

| Endpoint | Method | Auth | Description |
| --- | --- | --- | --- |
| /health | GET | None | Health check — returns status, version, repo count |
| /.well-known/oauth-protected-resource | GET | None | OAuth resource metadata (discovery) |
| /.well-known/oauth-authorization-server | GET | None | OAuth server metadata (discovery) |
| /oauth/register | POST | None | Dynamic client registration |
| /oauth/authorize | GET | None | Start OAuth flow (redirects to GitHub) |
| /oauth/callback | GET | None | GitHub OAuth callback |
| /oauth/token | POST | None | Token exchange and refresh |
| /oauth/revoke | POST | Bearer | Revoke a token |
| /mcp | POST | Bearer | MCP protocol endpoint (OAuth or PAT) |
| /api/index | POST | HMAC | Webhook for GitHub Actions to trigger re-indexing |
| /admin/repos | GET | Admin | List repos with indexing status |
| /admin/repos/:name/reindex | POST | Admin | Trigger repo re-index |
| /admin/cache/clear | POST | Admin | Clear search response cache |
| /admin/analytics | GET | Admin | Usage analytics summary |
| /admin/analytics/queries | GET | Admin | Recent query log |
| /admin/sessions | GET | Admin | List active OAuth sessions |
| /admin/sessions/:id | DELETE | Admin | Revoke an OAuth session |
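The HMAC check on /api/index can be sketched as follows, in the style of GitHub's X-Hub-Signature-256 scheme. The exact header name and payload shape the server expects are assumptions here; the constant-time comparison is the important part:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a webhook body against a "sha256=<hex>" signature, comparing in
// constant time to avoid leaking how many leading bytes matched.
function verifySignature(secret: string, body: string, signature: string): boolean {
  const expected =
    "sha256=" + createHmac("sha256", secret).update(body).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so check length first.
  return a.length === b.length && timingSafeEqual(a, b);
}

const secret = "example-webhook-secret";
const body = JSON.stringify({ files: ["company/mission-values.md"] });
const sig = "sha256=" + createHmac("sha256", secret).update(body).digest("hex");
console.log(verifySignature(secret, body, sig));         // true
console.log(verifySignature(secret, body, "sha256=00")); // false
```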

Knowledge file format

Knowledge files are markdown with YAML frontmatter. Only domain and owner are required:

---
domain: compliance          # Required: knowledge area
owner: legal-team           # Required: who maintains this
tags: [ferpa, privacy]      # Optional: searchable tags
audience: [all]             # Optional: who this is for (default: all)
classification: internal    # Optional: public | internal | restricted (default: internal)
confidence: current         # Optional: established | current | evolving | exploratory
---

# Document Title

## Section One

Content here becomes a searchable chunk.

## Section Two

Each ## section is indexed separately with its own embedding.

See docs/guides/content-schema.md for the full specification.
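The section-based chunking described above can be sketched as follows. This is a simplified model: the real chunker in src/indexing/chunker.ts also handles frontmatter and (per the test suite) recursive splitting of oversized sections, which this sketch omits:

```typescript
// Split a markdown body on level-2 headings so each "## Section" becomes
// one chunk; anything before the first ## (title, preamble) is dropped.
interface Chunk {
  heading: string;
  content: string;
}

function chunkBySections(markdown: string): Chunk[] {
  const parts = markdown.split(/^## /m).slice(1);
  return parts.map((part) => {
    const [headingLine, ...rest] = part.split("\n");
    return { heading: headingLine.trim(), content: rest.join("\n").trim() };
  });
}

const doc = `# Document Title

## Section One

Content here becomes a searchable chunk.

## Section Two

Each section is indexed separately.`;

console.log(chunkBySections(doc).map((c) => c.heading)); // ["Section One", "Section Two"]
```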

Caddy setup

The dev environment uses a shared Caddy reverse proxy for HTTPS routing at *.localhost domains. If Caddy is not already set up on your machine:

  1. Create ~/caddy/docker-compose.yml:
services:
  caddy:
    image: lucaslorentz/caddy-docker-proxy:ci-alpine
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - caddy_data:/data
    networks:
      - web

networks:
  web:
    external: true

volumes:
  caddy_data:
  2. Create the shared network and start Caddy:
docker network create web 2>/dev/null || true
cd ~/caddy && docker compose up -d

After this, any Docker service with the right Caddy labels will be automatically accessible at https://<name>.localhost.
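For reference, a service opts in by joining the web network and carrying caddy-docker-proxy labels. The fragment below is illustrative only; the authoritative labels live in this repo's docker-compose.yml:

```yaml
services:
  app:
    # ...
    networks:
      - web
    labels:
      caddy: knowledge-network.localhost
      caddy.reverse_proxy: "{{upstreams 4000}}"

networks:
  web:
    external: true
```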

Production deployment (Railway)

The app is deployed to Railway at https://knowledge-network.up.railway.app.

Services

The Railway project ("knowledge-network") has two services:

  • knowledge-network — the Node.js app, built with Nixpacks from this repo
  • Postgres — managed PostgreSQL with pgvector extension

Environment variables on Railway

| Variable | Value | Notes |
| --- | --- | --- |
| DATABASE_URL | (auto-injected by Railway) | Points to the managed Postgres instance |
| PORT | (auto-injected by Railway) | Do not set manually |
| VOYAGE_API_KEY | (secret) | Voyage AI API key |
| GITHUB_TOKEN | (secret) | GitHub PAT with repo read access |
| WEBHOOK_SECRET | (secret) | Shared secret for webhook HMAC verification |
| MANIFEST_PATH | seed/repos.yaml | Path to the repo manifest |
| NODE_ENV | production | Enables SSL for database connections |
| GITHUB_CLIENT_ID | (secret) | GitHub OAuth App client ID (Phase 2) |
| GITHUB_CLIENT_SECRET | (secret) | GitHub OAuth App client secret (Phase 2) |
| TOKEN_ENCRYPTION_KEY | (secret) | 64-char hex key for token encryption (Phase 2) |
| OAUTH_ISSUER | https://knowledge-network.up.railway.app | Must match the public URL (Phase 2) |
| GITHUB_ORG | Digication | GitHub org for admin team checks (Phase 2) |
| ADMIN_GITHUB_TEAM | knowledge-network-admins | GitHub team slug for admin access (Phase 2) |
| LOG_LEVEL | info | Options: debug, info, warn, error (Phase 2) |

Deploying

Railway is configured to deploy from the main branch. To deploy manually:

railway up

Or push to main if auto-deploy is connected via the Railway dashboard.

Railway configuration

Deployment is configured in railway.json:

  • Builder: Nixpacks (auto-detects Node.js/pnpm)
  • Build: runs pnpm build (TypeScript compile + copy schema.sql)
  • Start: node dist/index.js
  • Health check: polls /health
  • Restart policy: restarts on failure (up to 3 times)

Setting up a new Railway environment

  1. Install the Railway CLI: npm install -g @railway/cli
  2. Log in: railway login
  3. Initialize: railway init (from the project directory)
  4. Add PostgreSQL: railway add --database postgres
  5. Set environment variables:
    railway variables set VOYAGE_API_KEY=<key>
    railway variables set GITHUB_TOKEN=<token>
    railway variables set WEBHOOK_SECRET=<secret>
    railway variables set MANIFEST_PATH=seed/repos.yaml
    railway variables set NODE_ENV=production
    railway variables set GITHUB_CLIENT_ID=<oauth-client-id>
    railway variables set GITHUB_CLIENT_SECRET=<oauth-client-secret>
    railway variables set TOKEN_ENCRYPTION_KEY=<64-char-hex>
    railway variables set OAUTH_ISSUER=https://your-app.up.railway.app
    railway variables set GITHUB_ORG=<your-github-org>
    railway variables set ADMIN_GITHUB_TEAM=<team-slug>
    railway variables set LOG_LEVEL=info
  6. Set DATABASE_URL to the PostgreSQL connection string from Railway
  7. Deploy: railway up
  8. Generate a public domain: railway domain

CI/CD

Two GitHub Actions workflows are included:

| Workflow | Trigger | What it does |
| --- | --- | --- |
| .github/workflows/validate-content.yml | PRs touching .md files | Validates frontmatter against the schema |
| .github/workflows/index-on-merge.yml | Called by knowledge repos on merge | Sends changed files to the server for re-indexing |

Setting up a knowledge repo for auto-indexing

In each knowledge repo that should be indexed:

  1. Add a workflow that calls index-on-merge.yml
  2. Set these GitHub Actions secrets:
    • KN_WEBHOOK_SECRET — same value as the server's WEBHOOK_SECRET
    • KN_MCP_SERVER_URL — the server URL (e.g., https://knowledge-network.up.railway.app)
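A caller workflow in a knowledge repo might look like the sketch below. Check index-on-merge.yml for the inputs and secrets it actually expects; the secret pass-through shown here is an assumption:

```yaml
# .github/workflows/index.yml in a knowledge repo (illustrative)
name: Index on merge
on:
  push:
    branches: [main]

jobs:
  index:
    uses: Digication/knowledge-network/.github/workflows/index-on-merge.yml@main
    secrets:
      KN_WEBHOOK_SECRET: ${{ secrets.KN_WEBHOOK_SECRET }}
      KN_MCP_SERVER_URL: ${{ secrets.KN_MCP_SERVER_URL }}
```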
