Specialized documentation for AI assistants working on PyGraphistry. These guides supplement the main CLAUDE.md with detailed, topic-specific information.
Note: This directory is for developing PyGraphistry. For AI assistants using PyGraphistry, see graphistry-skills, which provides skills for Claude Code, Cursor, Codex, etc.
- Functional Programming: Always return new objects, never modify in-place
- No `copy()` on DataFrames: Operations already return new objects
- Use `df.assign()`: Never use `df[col] = val` syntax
- Preserve Git History: Avoid unnecessary rewrites
- No Claude Comments: Remove explanatory comments before committing
```shell
# Before any work - establish baseline (containerized)
cd docker && WITH_BUILD=0 WITH_TEST=0 ./test-cpu-local.sh

# Quick Docker test (from docker/ directory)
WITH_BUILD=0 ./test-cpu-local-minimal.sh

# Run specific tests fast
WITH_LINT=0 WITH_TYPECHECK=0 WITH_BUILD=0 ./test-cpu-local.sh graphistry/tests/test_file.py

# GPU tests - FAST (reuse base image, no rebuild)
IMAGE="graphistry/graphistry-nvidia:${APP_BUILD_TAG:-latest}-${CUDA_SHORT_VERSION:-12.8}"
docker run --rm --gpus all -v "$(pwd)/graphistry:/opt/pygraphistry/graphistry:ro" \
  $IMAGE pytest /opt/pygraphistry/graphistry/tests/test_file.py -v

# GPU tests - SLOW (full rebuild, use before merge)
cd docker && ./test-gpu-local.sh

# Validate RST documentation syntax
./docs/validate-docs.sh                                               # All docs
./docs/validate-docs.sh docs/source/gfql/*.rst                        # Specific files
git diff --name-only HEAD -- '*.rst' | xargs ./docs/validate-docs.sh  # Changed files

# Note: Direct script execution requires local environment setup
# ./bin/lint.sh && ./bin/mypy.sh && ./bin/pytest.sh
```

- Never call `str()` repeatedly on the same value
- Use vectorized operations, not loops
- Select only needed columns: `df[['col1', 'col2']]`
- Use `logger.debug('msg %s', var)`, not f-strings, in loggers
- Respect engine abstractions (`df_concat`, `resolve_engine`)
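The logger rule exists because `%`-style arguments are only interpolated if the record is actually emitted, while an f-string always pays the formatting cost up front. A minimal stdlib-only sketch (buffer capture is just for demonstration):

```python
import io
import logging

# Capture log output in a string buffer so the example is self-contained
logger = logging.getLogger("demo")
handler = logging.StreamHandler(io.StringIO())
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

# Lazy %-style formatting: interpolation is deferred until the record is
# emitted, so disabled log levels never pay the string-building cost
n_rows = 1000
logger.debug("processed %s rows", n_rows)

out = handler.stream.getvalue()
print(out.strip())  # → processed 1000 rows
```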
```text
CLAUDE.md                # General guide (< 500 lines)
├── ai/                  # Specialized guides
│   ├── README.md        # This file - overview & quick ref
│   ├── docs/            # Documentation guides
│   │   ├── gfql/        # GFQL patterns & optimization
│   │   ├── gpu/         # GPU/RAPIDS best practices
│   │   └── connectors/  # Database-specific patterns
│   └── prompts/         # Reusable workflow templates
└── plans/               # Task tracking (gitignored)
```
- CLAUDE.md: Start here for general PyGraphistry development
- ai/: Load specific guides only when working on that topic
- plans/: Track multi-session work and complex implementations
- P0 🚨: Critical - Breaking functionality, must fix immediately
- P1 🔴: High - Type safety, imports, security issues
- P2 🟡: Medium - Code style consistency, best practices
- P3 🟢: Low - Minor improvements, nice-to-haves
- P4 ⚪: Minimal - Cosmetic, already suppressed
- P5 ⬜: Skip - Won't fix, intentional patterns
- ✅ Complete
- 🔄 In Progress
- 📝 Planned
- ❌ Blocked
- ⏭️ Skipped
```text
ai/
├── docs/                             # Documentation guides
│   ├── gfql/                         # GFQL-specific patterns and guidelines
│   ├── gpu/                          # GPU/CUDA development notes
│   └── connectors/                   # Database connector patterns
└── prompts/                          # Reusable workflow templates
    ├── PLAN.md                       # Task planning template with strict execution protocol
    ├── LINT_TYPES_CHECK.md           # Code quality enforcement (with P0-P5)
    ├── CONVENTIONAL_COMMITS.md       # Git commit workflow with PyGraphistry conventions
    ├── PYRE_ANALYSIS.md              # Advanced code analysis with pyre-check
    ├── GFQL_LLM_GUIDE_MAINTENANCE.md # Process for maintaining GFQL JSON generation guide
    ├── HOISTIMPORTS.md               # Import hoisting and organization patterns
    ├── DECOMMENT.md                  # Comment removal and cleanup guidance
    ├── IMPLEMENTATION_PLAN.md        # [TODO] Feature implementation tracking
    └── USER_TESTING_PLAYBOOK.md      # [TODO] AI-driven testing workflows
```
- Start with CLAUDE.md for general PyGraphistry work
- Load specific guides only when working on that topic
- Don't load everything - each guide is self-contained
- Check file size - guides should be < 500 lines
- Documentation: Max 500 lines per file
- Code files: Max 300 lines ideal, 500 lines acceptable
- Functions: Max 50 lines per function
- Classes: Split large classes into mixins
When adding a new guide:
- Place in appropriate subdirectory
- Use descriptive names (e.g., `neo4j_patterns.md`)
- Add a header explaining when to use it
- Focus on patterns, not API details
- Include practical examples
- Add to directory structure above
- Query patterns and optimization
- Column naming conventions
- Performance considerations
- Engine abstraction patterns
- Load when: Working on graph queries, chain/hop operations
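Not the GFQL API itself, but the hop semantics that these guides cover can be sketched in plain pandas: a forward hop is a join of the current node frontier against edge sources (illustrative column names; see the gfql/ guides for real chain/hop usage):

```python
import pandas as pd

# Tiny edge list and a seed frontier (hypothetical data)
edges = pd.DataFrame({"src": ["a", "a", "b"], "dst": ["b", "c", "c"]})
frontier = pd.DataFrame({"node": ["a"]})

# One forward hop: join frontier against edge sources, collect destinations
hop1 = (
    edges.merge(frontier, left_on="src", right_on="node")["dst"]
    .drop_duplicates()
    .sort_values()
    .tolist()
)
print(hop1)  # → ['b', 'c']
```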
- RAPIDS integration patterns
- Memory management strategies
- CPU/GPU fallback handling
- cuDF vs pandas compatibility
- Load when: Implementing GPU features, optimizing performance
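One common CPU/GPU fallback shape (a sketch, assuming cuDF mirrors the pandas API for the calls used, which holds for basic column ops):

```python
# Try the GPU path first, fall back to pandas on CPU-only machines
try:
    import cudf as xdf  # GPU DataFrames (RAPIDS)
except ImportError:
    import pandas as xdf  # CPU fallback; cuDF mirrors this API

df = xdf.DataFrame({"weight": [1, 2, 3]})
total = int(df["weight"].sum())  # same call works on either engine
print(total)  # → 6
```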
- `dgx-spark-testing.md`: How to sync code and run GFQL/cuDF tests on the shared DGX-Spark GPU machine
- Database-specific patterns
- Connection management
- Error handling strategies
- Testing with databases
- Load when: Adding/fixing database integrations
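A generic shape for connector connection management (the driver below is a hypothetical stand-in; real connectors wrap Neo4j, Gremlin, etc.):

```python
from contextlib import contextmanager

class FakeDriver:
    """Hypothetical stand-in for a real database driver."""
    def __init__(self):
        self.closed = False
    def query(self, q):
        return [{"n": 1}]
    def close(self):
        self.closed = True

@contextmanager
def managed_connection(driver_factory):
    # Context manager guarantees the connection is released
    # even if the query raises
    driver = driver_factory()
    try:
        yield driver
    finally:
        driver.close()

with managed_connection(FakeDriver) as conn:
    rows = conn.query("MATCH (n) RETURN n LIMIT 1")
```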
- PLAN.md: Task planning template with strict execution protocol for multi-step work
- LINT_TYPES_CHECK.md: Code quality enforcement with P0-P5 priorities
- CONVENTIONAL_COMMITS.md: Git commit workflow following PyGraphistry conventions
- PYRE_ANALYSIS.md: Advanced code analysis with pyre-check for refactoring and type-aware searching
- IMPLEMENTATION_PLAN.md [TODO]: Systematic feature implementation
- USER_TESTING_PLAYBOOK.md [TODO]: AI-driven testing workflows
- Load when: Starting new tasks, creating commits, fixing code quality issues, planning complex work, refactoring code
```shell
cd docker

# Fast iteration - skip slow parts
WITH_BUILD=0 ./test-cpu-local-minimal.sh

# Only lint and typecheck (no tests or build)
WITH_BUILD=0 WITH_TEST=0 ./test-cpu-local.sh

# Full validation before commit
./test-cpu-local.sh

# GPU functionality
./test-gpu-local.sh

# Specific features
./test-umap-learn-core.sh  # UMAP embeddings
./test-dgl.sh              # Graph neural networks
./test-embed.sh            # Embedding features
```

Docker containers come with pytest, mypy, and ruff preinstalled.
```shell
# Reuse existing graphistry image (no rebuild)
IMAGE="graphistry/graphistry-nvidia:${APP_BUILD_TAG:-latest}-${CUDA_SHORT_VERSION:-12.8}"
docker run --rm --gpus all \
  -v "$(pwd):/workspace:ro" \
  -w /workspace -e PYTHONPATH=/workspace \
  $IMAGE pytest graphistry/tests/test_file.py -v
```

- Fast iteration: Use this during development
- Full rebuild: Use `./docker/test-gpu-local.sh` before merge
| Variable | Default | Purpose |
|---|---|---|
| `WITH_LINT` | `1` | Run ruff linting |
| `WITH_TYPECHECK` | `1` | Run mypy type checking |
| `WITH_BUILD` | `0` | Build documentation |
| `WITH_NEO4J` | `0` | Run Neo4j integration tests |
| `PYTHON_VERSION` | - | Override Python version |
Use Grep/Ripgrep for:
- Simple text/pattern search
- Quick file location
- Initial exploration
```shell
grep -r "dataset_id" graphistry/*.py
rg "_dataset_id" --type py
```

Use AST Scripts for:
- Custom pattern detection (e.g., "methods that modify attribute X")
- Fast iteration during development (< 1 second)
- When pyre is too slow or times out

```shell
python3 plans/task_name/analyze_*.py
```

Use Pyre for:
- Type-aware analysis and refactoring
- Find-all-references (call graph analysis)
- Finding all implementations of an interface
- Complex dependency chain analysis
- See prompts/PYRE_ANALYSIS.md for detailed guide
Recommendation: Start with grep for exploration, use AST scripts for custom analysis, use pyre only when you need type-aware refactoring or call graphs.
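For example, a throwaway AST script for "find methods that assign to attribute X" might look like this (target name and source are hypothetical, stdlib only):

```python
import ast
import textwrap

# Hypothetical source to scan; a real script would read files from disk
source = textwrap.dedent("""
    class Plotter:
        def set_id(self):
            self._dataset_id = "abc"
        def read_id(self):
            return self._dataset_id
""")

target = "_dataset_id"
hits = []
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.FunctionDef):
        for child in ast.walk(node):
            # Store context means the attribute is being assigned, not read
            if (isinstance(child, ast.Attribute)
                    and isinstance(child.ctx, ast.Store)
                    and child.attr == target):
                hits.append(node.name)

print(hits)  # → ['set_id']
```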
```python
# ✅ Good - Functional style
df = df.assign(new_col=values)
df = df[df['col'] > 0]  # Returns new DataFrame

# ❌ Bad - In-place modification
df['new_col'] = values
df.drop('col', inplace=True)
```

```python
# ✅ Good - Engine agnostic
from graphistry.compute.typing import DataFrameLike
engine = resolve_engine(df)
result = engine.df_concat([df1, df2])

# ❌ Bad - Engine specific
if isinstance(df, pd.DataFrame):
    result = pd.concat([df1, df2])
```

```python
# ✅ Good - Clear types with imports
from typing import Optional, Union, TYPE_CHECKING
if TYPE_CHECKING:
    import cudf

def process(df: Union[pd.DataFrame, 'cudf.DataFrame']) -> Optional[pd.DataFrame]:
    return df if len(df) > 0 else None
```

- Read CLAUDE.md for general context
- Run baseline: `./bin/lint.sh && ./bin/mypy.sh`
- Load specific guide if needed
- Follow functional programming patterns
- Add type annotations to new code
- Use appropriate priority (P0-P5) for issues
- Track complex work in plans/
- Run Docker tests: `cd docker && WITH_BUILD=0 ./test-cpu-local.sh`
- Update CHANGELOG.md under `## [Development]` for user-visible changes:
  - Added: New features, predicates, call methods, API additions
  - Fixed: Bug fixes, breaking changes resolved
  - Changed: Behavior changes, deprecations
  - Breaking 🔥: API changes that require user code updates
  - Docs: Documentation improvements, examples, tutorials
  - Infra: CI/CD, testing infrastructure, build system changes
  - Security: Security fixes and improvements
  - Perf: Performance improvements with benchmarks
  - Include PR/issue numbers, examples, and impact descriptions
  - Omit: Internal refactorings, test updates, type-only changes
- Use conventional commits: `fix(scope): description` (see prompts/CONVENTIONAL_COMMITS.md)
- Remove debug code and Claude comments
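A `## [Development]` section following these conventions might look like the following (the entries are entirely illustrative and `#NNNN` is a placeholder, not a real PR number):

```markdown
## [Development]

### Added
* GFQL: hypothetical new datetime predicate, with a short usage example (#NNNN)

### Fixed
* Hypothetical fix: reverse-direction handling in multi-hop chains (#NNNN)
```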
For multi-session or complex work:
```text
plans/task_name/
├── implementation_plan.md  # Phases and approach
├── progress.md             # Current status (update each session)
├── insights.md             # Learnings and recommendations
└── [task-specific files]   # Test results, benchmarks, etc.
```

Example task names: `add_gfql_caching`, `fix_gpu_memory_leak`, `implement_new_layout`, `optimize_umap_performance`