Capture, organize, and reuse knowledge from your data science experiments.
KMDS is an ontology-backed ecosystem for systematic knowledge management in data science and analytics workflows. It documents the incremental process of experimentation, data exploration, and model selection—capturing decisions, rationale, and repository schemas so that valuable insights are never lost over time.
Experimental work generates a fragmented stream of insights, local documentation, and Jupyter notebooks. This context is typically lost when a research trail goes cold. The KMDS ecosystem fixes this by providing a unified, structured approach to log, map, search, and visually audit your data engineering artifacts.
| User | How they interact with KMDS |
|---|---|
| Data scientist | Python API, local LLM integrations, notebooks, and CLI framework |
| Software developer | Automated repository-mapping utilities and pipeline logging hooks |
| Business analyst | Interactive UI Workbench Dashboard and plain-English natural language ingestion |
🎥 Watch a quick overview of KMDS: YouTube Video
- Interactive UI Workbench (`kmds-ui`): View, edit, and safely serialize knowledge graphs with special handling for long text notes and file context preservation.
- Automated Repository Scanning (`kmds-data-helper`): Parse local codebases using a multi-persona engine (Data Scientist, Tech Lead, Architect) to synthesize documentation and code into structured knowledge graphs.
- Natural Language Ingestion: Describe insights in plain English for automatic logging to your ontology graph.
- Semantic Vector Search: Build high-performance local vector indices for querying analytical findings.
- LLM Search Orchestration: Use Ollama-powered, intelligent routing for complex knowledge queries.
- Enterprise Ready: KMDS is designed to run inside a Git repository and inherits the security context of that repository. Please see this document.
The kmds-data-helper package introduces a multi-persona analysis framework for existing data science repositories. Using local LLMs (via Ollama), it scans documentation, schemas, and notebooks to output complete KMDS knowledge graphs.
- Toggleable Role Personas: Switch between Data Scientist, Tech Lead, and Architect behaviors via a `kmds_config.yaml` file.
- Automated Artifact Synthesis: Scans directories to auto-generate structured diagnostic files (`full_service_report.json`, `kmds_summary.json`).
- Direct Graph Production: Compiles generated report structures directly into a standardized `project_knowledge_graph.xml`.
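Because the schema of the generated report files is not described here, a quick way to get oriented is to print the shape of a report rather than its contents. The sketch below is a generic JSON helper, not part of KMDS; the function names `outline` and `outline_file` are hypothetical, and it assumes only that the report is valid JSON.

```python
import json
from pathlib import Path


def outline(value, max_depth: int = 2):
    """Replace leaf values with their type names so the shape is visible without the data."""
    if max_depth == 0:
        return type(value).__name__
    if isinstance(value, dict):
        return {key: outline(item, max_depth - 1) for key, item in value.items()}
    if isinstance(value, list):
        # Describe the first element only; this assumes lists are homogeneous.
        return [outline(value[0], max_depth - 1)] if value else []
    return type(value).__name__


def outline_file(path: str, max_depth: int = 2):
    """Load a JSON file (hypothetical helper) and return its outline."""
    return outline(json.loads(Path(path).read_text(encoding="utf-8")), max_depth)
```

For example, `outline_file("output/full_service_report.json")` would return a nested dictionary of key names and type names, assuming the report was written to that path.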
The kmds-ui extension package provides a specialized web dashboard custom-engineered to view, audit, and modify knowledge graph files generated by the KMDS ecosystem. It prevents namespace prefix corruption or structural layout degradation common in general-purpose ontology utilities (like Protégé).
- Prefix-Agnostic Processing: Splits and parses XML fragments dynamically at runtime to handle KMDS files lacking explicit namespace declarations without crashing.
- Proportional Narrative Isolation: Uses a 75% proportional grid with dynamic word-wrapping to cleanly display long text fields without text truncation.
- Preserved File Context: Automatically tracks the original file name during ingestion, ensuring the updated graph downloads with matching name signatures.
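To illustrate why prefix-agnostic processing matters, the sketch below shows one way to parse an XML fragment whose prefixes are never declared, which a strict parser rejects with an "unbound prefix" error. This is not the `kmds-ui` implementation: the function `parse_lenient`, the placeholder namespace URIs, and the element names in the usage note are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Prefix on an element name ("<kmds:Note", "</kmds:Note") or an attribute name (" rdf:about=").
_TAG_PREFIX = re.compile(r"</?([A-Za-z_][\w.-]*):")
_ATTR_PREFIX = re.compile(r"\s([A-Za-z_][\w.-]*):[\w.-]+\s*=")
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def parse_lenient(xml_text: str) -> ET.Element:
    """Parse XML, binding any undeclared prefixes to placeholder namespaces."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        pass  # Fall through and retry with synthetic namespace declarations.

    # An XML declaration is only legal at the very start, so drop it before wrapping.
    body = _XML_DECL.sub("", xml_text, count=1)
    prefixes = set(_TAG_PREFIX.findall(body)) | set(_ATTR_PREFIX.findall(body))
    prefixes -= {"xml", "xmlns"}  # Reserved prefixes must not be redeclared.

    # Over-declaring is harmless; declarations inside the document still take precedence.
    decls = " ".join(f'xmlns:{p}="urn:placeholder:{p}"' for p in sorted(prefixes))
    wrapped = f"<wrapper {decls}>{body}</wrapper>"
    return ET.fromstring(wrapped)[0]
```

With a hypothetical fragment such as `<kmds:Graph><kmds:Note rdf:about="n1">long text</kmds:Note></kmds:Graph>`, the returned root carries the tag `{urn:placeholder:kmds}Graph`. Wrapping the document keeps the original text untouched, which avoids the risk of a regex rewriting note content that merely looks like a prefixed name.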
Install the entire modular framework directly from PyPI:

    # Install core logging, UI, and data helper
    pip install kmds kmds-ui kmds-data-helper

Launch your workbench application from the terminal:

    kmds-workbench

Open http://127.0.0 in your browser.
Set up a project directory containing documents/, notebooks/, and data_dictionary/.
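The directory layout can be created by hand, or scripted. The following is a minimal sketch using only the Python standard library; the helper name `ensure_workspace` is hypothetical and not part of any KMDS package, and only the three folder names come from the step above.

```python
from pathlib import Path

# Folder names taken from the setup step above.
REQUIRED_DIRS = ("documents", "notebooks", "data_dictionary")


def ensure_workspace(root: str) -> list[str]:
    """Create any missing workspace folders and return the names that were created."""
    root_path = Path(root)
    created = []
    for name in REQUIRED_DIRS:
        target = root_path / name
        if not target.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            created.append(name)
    return created
```

Calling `ensure_workspace(".")` from the project root creates whichever folders are missing and leaves existing ones untouched, so it is safe to run repeatedly.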
Run the automatic aggregator tool:

    kmds-kb --workspace . --project-file project_knowledge_graph.xml --mode auto

To parse individual report paths directly, use the adapter interface command:

    kmds-analyze --input output/full_service_report.json --project-file project_knowledge_graph.xml --create-project --workflow-name kmds_project_workflow --mode auto

To log a plain-English workflow summary:

    kmds-summary-log \
      --summary "Daily reporting workflow for support operations." \
      --workflow-name "support_reporting" \
      --workflow-type application \
      --project-file ./support_reporting.xml \
      --create-project --no-prompt

To search a knowledge graph:

    kmds-search \
      --kb ./project_knowledge_graph.xml \
      --query "What data quality issues were identified?" \
      --n-results 3

The full documentation covers custom LLM functions, available routing templates, and output formats.
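If you want to call the search command from a notebook or script, one option is a thin `subprocess` wrapper. The sketch below is an assumption-laden illustration rather than a KMDS API: the function names are hypothetical, only the `--kb`, `--query`, and `--n-results` flags shown in this README are used, and the output is returned as raw text because its format is not specified here.

```python
import shutil
import subprocess


def build_search_command(kb_path: str, query: str, n_results: int = 3) -> list[str]:
    """Assemble the kmds-search argument list from the flags shown in this README."""
    return [
        "kmds-search",
        "--kb", kb_path,
        "--query", query,
        "--n-results", str(n_results),
    ]


def run_search(kb_path: str, query: str, n_results: int = 3) -> str:
    """Run kmds-search and return its standard output as unparsed text."""
    if shutil.which("kmds-search") is None:
        raise RuntimeError("kmds-search is not on PATH; install the KMDS packages first")
    completed = subprocess.run(
        build_search_command(kb_path, query, n_results),
        capture_output=True,
        text=True,
        check=True,  # Raise CalledProcessError on a non-zero exit code.
    )
    return completed.stdout
```

Passing the arguments as a list, rather than a single shell string, means a query containing quotes or spaces needs no extra escaping.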
This repository includes two detailed examples:
- Analytics Example: Evaluates the effectiveness of a ticket resolution help desk.
- Machine Learning Example: Uses Principal Component Analysis (PCA) to summarize online store sales activity.