
fix: Sanitize graph props and strip /workspace/ paths for remote MCP use#656

Open
bchrao wants to merge 4 commits into CodeGraphContext:main from bchrao:fix/sanitize-graph-props-and-strip-workspace-paths

@bchrao (Contributor) commented Feb 24, 2026

Summary

  • Strip /workspace/ prefix from tool result paths: when CGC runs as a remote MCP server (via mcp-proxy SSE in K8s), index_repository clones repos into /workspace/<repo> inside the container. All tool results now have this prefix stripped automatically, so LLMs see repo-relative paths (e.g. myrepo/src/app.py). Both direct path keys and Cypher-aliased keys such as f.path are handled.

  • Sanitize non-primitive FalkorDB properties: language parsers produce tuples (context in call nodes) and lists of dicts (detailed_args in C functions), which FalkorDB rejects. These values are now serialized to JSON strings before SET n += $props. This fixes the "Property values can only be of primitive types" error that caused C/C++ repo indexing to fail after a few files.

  • Update LLM system prompt for remote architecture: rewrite the system prompt in prompts.py to reflect that the MCP server runs as a remote container service rather than locally. Add a "Remote Server Architecture" section explaining that index_repository clones into the server container, that add_code_to_graph and watch_directory operate on the server's filesystem, and that returned paths are repo-relative. Add all 12 previously missing tools to the Tool Manifest table (e.g. index_repository, list_indexed_repositories, delete_repository, find_dead_code, load_bundle, search_registry_bundles, visualize_graph_query, etc.).

Files Changed

  • src/codegraphcontext/server.py: added the recursive _strip_workspace_prefix helper, applied in the call_tool handler
  • src/codegraphcontext/tools/graph_builder.py: added the _sanitize_props helper, applied before session.run
  • src/codegraphcontext/prompts.py: rewrote the Role/Goal section for the remote architecture and added all missing tools to the Tool Manifest

Test plan

  • ./tests/run_tests.sh fast — all 32 tests pass
  • Deployed to K8s cluster as v0.2.8, verified via SSE:
    • list_indexed_repositories returns "path": "gokart-c" (not /workspace/gokart-c)
    • find_code returns repo-relative paths
    • execute_cypher_query with MATCH (f:File) RETURN f.path returns stripped paths
    • index_repository on a 1620-file C repo completes with zero errors (previously failed after 3 files)
  • Verified updated prompt is served via MCP instructions field

🤖 Generated with Claude Code

vercel bot commented Feb 24, 2026

@bchrao is attempting to deploy a commit to the shashankss1205's projects Team on Vercel.

A member of the Team first needs to authorize it.

@bchrao bchrao force-pushed the fix/sanitize-graph-props-and-strip-workspace-paths branch from 4d729b5 to b32a783 on March 2, 2026 16:30

Two fixes for running CGC as a remote MCP server in Kubernetes:

1. Strip /workspace/ prefix from all path values in tool results
   (server.py call_tool handler) so LLMs see repo-relative paths
   like `myrepo/src/app.py` instead of container-absolute paths.
   Handles both direct path keys and Cypher-aliased keys (e.g. f.path).

2. Sanitize non-primitive property values before storing in FalkorDB
   (graph_builder.py). Parser-produced tuples (context in call nodes
   across all language parsers) and lists-of-dicts (detailed_args in
   C parser) are serialized to JSON strings. Fixes "Property values
   can only be of primitive types" error that caused C/C++ repo
   indexing to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bchrao bchrao force-pushed the fix/sanitize-graph-props-and-strip-workspace-paths branch from b32a783 to 9d51b94 on March 24, 2026 15:46

@bchrao (Contributor, Author) commented Mar 24, 2026

The E2E test failure is unrelated to this PR's changes. The error is:

ModuleNotFoundError: No module named 'tree_sitter_c_sharp'

This occurs during cgc index in the CI runner's tree-sitter manager (tree_sitter_manager.py:133), not in any code touched by this PR. This PR only modifies server.py (workspace path stripping) and graph_builder.py (property sanitization).

The E2E workflow passes on push events to main, so this appears to be a dependency installation difference in the fork/PR CI environment.

bchrao and others added 3 commits March 25, 2026 13:50
The **params spread passes tuples (e.g. context) from language parsers
directly to Cypher, which FalkorDB rejects. Wrap with _sanitize_props().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
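A minimal sketch of why wrapping the **params spread matters. RecordingSession is a hypothetical stand-in that only records bound parameters (it is not the FalkorDB or CGC session API), and the query and call dict are illustrative; the sanitizer is the same whitelist idea as above.

```python
import json
from typing import Any, Dict, List, Tuple


def _sanitize_props(props: Dict[str, Any]) -> Dict[str, Any]:
    """JSON-encode every value that is not str, int, float, bool or None."""
    return {
        k: v if v is None or isinstance(v, (str, int, float, bool)) else json.dumps(v)
        for k, v in props.items()
    }


class RecordingSession:
    """Hypothetical stand-in for a graph session that records bound parameters."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def run(self, query: str, **params: Any) -> None:
        self.calls.append((query, params))


# Hypothetical query and parser output; only the parameter binding matters here.
query = "MERGE (c:Call {name: $name, line_number: $line_number}) SET c.context = $context"
call = {"name": "helper", "line_number": 12, "context": ("main", "function", 3)}

session = RecordingSession()
session.run(query, **call)                   # binds the raw tuple to $context
session.run(query, **_sanitize_props(call))  # binds a JSON string instead
print(session.calls[0][1]["context"], session.calls[1][1]["context"])
```

Sanitizing at the call site covers every key the spread forwards, including ones a parser adds later.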
- Rewrite sanitizer to whitelist primitives instead of blacklisting
  known bad types, catching all non-primitive values
- Sanitize call_params dicts passed to _safe_run_create for CALLS edges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
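A minimal sketch contrasting the two sanitizer strategies. sanitize_blacklist is an illustrative reconstruction that encodes only tuples and lists (the types the earlier commits call out), not the exact code that was replaced; the dict-valued meta key is a hypothetical new parser output.

```python
import json
from typing import Any, Dict


def sanitize_blacklist(props: Dict[str, Any]) -> Dict[str, Any]:
    """Illustrative blacklist: encode only the types already known to break."""
    return {k: json.dumps(v) if isinstance(v, (tuple, list)) else v for k, v in props.items()}


def sanitize_whitelist(props: Dict[str, Any]) -> Dict[str, Any]:
    """Whitelist: keep primitives (and None), encode everything else."""
    allowed = (str, int, float, bool)
    return {
        k: v if v is None or isinstance(v, allowed) else json.dumps(v)
        for k, v in props.items()
    }


# A hypothetical parser that starts emitting a dict slips past the blacklist.
props = {"name": "f", "context": ("g", 1), "meta": {"static": True}}
print(sanitize_blacklist(props)["meta"])  # still a dict: not a primitive property value
print(sanitize_whitelist(props)["meta"])  # a JSON string
```

The whitelist fails safe: an unexpected type gets stored as a string instead of aborting the write.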
The _sanitize_props helper was serializing rel_props (a dict) to a JSON
string, but FalkorDB's SET r += $rel_props requires a map parameter.
Exclude rel_props from sanitization and pass it as a raw dict.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
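A minimal sketch of excluding rel_props from sanitization. The exclude parameter and the CALLS-edge parameter names are hypothetical; the commit only states that rel_props is skipped and passed as a raw dict because SET r += $rel_props needs a map.

```python
import json
from typing import Any, Dict, Iterable


def _sanitize_props(props: Dict[str, Any], exclude: Iterable[str] = ()) -> Dict[str, Any]:
    """JSON-encode non-primitive values, passing keys listed in `exclude` through unchanged."""
    skip = set(exclude)
    clean: Dict[str, Any] = {}
    for key, value in props.items():
        if key in skip or value is None or isinstance(value, (str, int, float, bool)):
            clean[key] = value
        else:
            clean[key] = json.dumps(value)
    return clean


# Hypothetical CALLS-edge parameters. The query does `SET r += $rel_props`,
# so rel_props has to reach the database as a map, not a JSON string.
params = {
    "caller_name": "main",
    "callee_name": "helper",
    "rel_props": {"line_number": 12, "full_call_name": "helper"},
}

wrong = _sanitize_props(params)                          # rel_props becomes a str
right = _sanitize_props(params, exclude=("rel_props",))  # rel_props stays a dict
print(type(wrong["rel_props"]).__name__, type(right["rel_props"]).__name__)
```

If the edge properties can themselves hold non-primitive values, one option is to sanitize the inner map separately, e.g. _sanitize_props(params["rel_props"]), so it stays a map whose entries are primitives.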