# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

PyGraphistry is a Python library for graph visualization, analytics, and AI with GPU acceleration capabilities. It's designed to work with graph data by:

1. Loading and transforming data from various sources into graph structures
2. Providing visualization tools with GPU acceleration
3. Offering graph analytics and AI capabilities including querying, ML, and clustering

The library follows a client-server model where:

- The Python client prepares data and handles transformations like loading, wrangling, querying, ML, and AI
- Visualization happens through Graphistry servers (cloud or self-hosted)
- Most user interaction follows a functional programming style with immutable state

## Architecture

PyGraphistry has a modular architecture consisting of:

1. Core visualization engine that connects to Graphistry servers
2. GFQL (Graph Frame Query Language) for dataframe-native graph queries
3. Integration with many databases and graph systems (Neo4j, Neptune, TigerGraph, etc.)
4. GPU acceleration through RAPIDS integration
5. AI/ML capabilities including UMAP embeddings and graph neural networks

Most components follow functional-style programming where methods create new copies of objects with updated bindings rather than modifying state.
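
The copy-with-updated-bindings pattern can be sketched with a toy class (illustrative only, using hypothetical names — this is not the real PyGraphistry `Plottable` implementation):

```python
import pandas as pd

class MiniPlottable:
    """Toy sketch of the immutable-binding style (NOT the real PyGraphistry class)."""
    def __init__(self, edges=None, source=None, destination=None):
        self._edges = edges
        self._source = source
        self._destination = destination

    def edges(self, df, source, destination):
        # Return a NEW object with updated bindings; self is left untouched
        return MiniPlottable(df, source, destination)

df = pd.DataFrame({'s': ['a', 'b'], 'd': ['b', 'c']})
g0 = MiniPlottable()
g1 = g0.edges(df, 's', 'd')

assert g0._edges is None   # original object keeps its old (empty) bindings
assert g1._source == 's'   # new object carries the new binding
```

Each chained call returns a fresh object, so earlier pipeline stages stay valid and reusable.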
29 |

## Development Commands

### Containers

PyGraphistry uses Docker for development and testing. The `docker/` directory contains Dockerfiles and scripts for building and running tests in isolated environments. The `bin/*.sh` scripts are unaware of the Docker context, so run from the `docker/` folder, whose wrapper scripts call them appropriately.
35 |

### Environment Setup

```bash
# Install PyGraphistry with development dependencies
pip install -e .[dev]

# For GPU-accelerated features
pip install -e .[rapids]

# For AI capabilities
pip install -e .[ai]

# For full development setup
pip install -e .[dev,test,ai]
```

### Testing Commands

Testing is via containerized pytest, with shell scripts as convenient entry points:

```bash
# Run all tests
./bin/test.sh

# Run many tests in parallel (pytest-xdist)
./bin/test.sh -n auto

# Run minimal tests (no external dependencies)
./bin/test-minimal.sh

# Run a specific test file or test
python -m pytest -vv graphistry/tests/test_file.py::TestClass::test_function

# Run with Neo4j connectivity tests
WITH_NEO4J=1 ./bin/test.sh

# Docker-based testing (recommended for full testing)
cd docker && ./test-cpu-local-minimal.sh
cd docker && ./test-cpu-local.sh
# Faster, targeted tests (WITH_BUILD=0 skips the slow docs build)
WITH_LINT=0 WITH_TYPECHECK=0 WITH_BUILD=0 ./test-cpu-local.sh graphistry/tests/test_file.py::TestClass::test_function
# Ex: GFQL
WITH_BUILD=0 ./test-cpu-local-minimal.sh graphistry/tests/test_compute_chain.py graphistry/tests/compute
```
80 |

### Linting and Type Checking

Run these before testing:

```bash
# Lint the code
./bin/lint.sh

# Type check with mypy
./bin/typecheck.sh
```

### Building Documentation

Docs are Sphinx-based:

```bash
# Build documentation locally
cd docs && ./build.sh
```

### GPU Testing

```bash
# For GPU functionality (if available)
cd docker && ./test-gpu-local.sh
```

## Common Development Workflows

### Adding a New Feature

1. Ensure you understand PyGraphistry's functional programming style
2. Create new features as standalone modules or methods where possible
3. Implement them following the client-server model, respecting immutable state
4. Add appropriate tests in the `graphistry/tests/` directory
5. Run linting and type checking before submitting changes
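
Step 4's tests typically follow a pandas-based pytest style. A minimal sketch, with a hypothetical feature and test names invented for illustration (real tests live in `graphistry/tests/` and exercise the actual graphistry API):

```python
import pandas as pd

# Hypothetical feature under test, invented for this example
def degree_counts(edges: pd.DataFrame, src: str, dst: str) -> pd.DataFrame:
    """Per-node degree, computed vectorized rather than row by row."""
    nodes = pd.concat([edges[src], edges[dst]], ignore_index=True)
    return nodes.value_counts().rename_axis('node').reset_index(name='degree')

class TestDegreeCounts:
    def test_simple_chain(self):
        edges = pd.DataFrame({'s': ['a', 'b'], 'd': ['b', 'c']})
        out = degree_counts(edges, 's', 'd')
        assert set(out['node']) == {'a', 'b', 'c'}
        assert out.set_index('node')['degree']['b'] == 2
```

Class-based grouping keeps related cases together and lets `./bin/test.sh` or the Docker scripts target them via `path::TestClass::test_function`.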
118 |

### Testing Changes

1. Use the appropriate test script for your feature:
   - `test-minimal.sh` for core functionality
   - `test-features.sh` for featurization functionality
   - `test-umap-learn-core.sh` for UMAP functionality
   - `test-dgl.sh` for graph neural network functionality
   - `test-embed.sh` for embedding functionality
   - Additional specialized tests exist for specific components

2. For database connectors, ensure you have the relevant database running:
   - `WITH_NEO4J=1 ./bin/test.sh` for Neo4j tests

### Building and Publishing

1. Update the changelog in CHANGELOG.md
2. Tag with semantic versioning: `git tag X.Y.Z && git push --tags`
3. Confirm GitHub Actions publishes to PyPI

### Dependencies

* Dependencies are managed in `setup.py`
* The `stubs` list in setup.py contains type stubs for development
* Avoid adding unnecessary dependencies
* If you encounter type checking errors related to missing imports:
  - First check if they're already defined in the `stubs` list in setup.py
  - If not, consider adding them to the ignore list in mypy.ini using the format:
    ```
    [mypy-package_name.*]
    ignore_missing_imports = True
    ```
150 |

#### Dependency Structure

```python
# Core dependencies - always installed
core_requires = [
    'numpy', 'pandas', 'pyarrow', 'requests', ...
]

# Type stubs for development
stubs = [
    'pandas-stubs', 'types-requests', 'ipython', 'types-tqdm'
]

# Optional dependencies by category
base_extras_light = {...}  # Light integrations (networkx, igraph, etc.)
base_extras_heavy = {...}  # Heavy integrations (GPU, AI, etc.)
dev_extras = {...}         # Development tools (docs, testing, etc.)
```
169 |

#### Docker Testing Dependencies

* Docker tests install dependencies via `-e .[test,build]` or `-e .[dev]`
* The `PIP_DEPS` environment variable controls which dependencies are installed
* If adding new stubs, add them to the `stubs` list in setup.py

## Project Dependencies

PyGraphistry has different dependency sets depending on functionality:

- Core: numpy, pandas, pyarrow, requests
- Optional integrations: networkx, igraph, neo4j, gremlin, etc.
- GPU acceleration: RAPIDS ecosystem (cudf, cugraph)
- AI extensions: umap-learn, dgl, torch, sentence-transformers

## Coding tips

* We're version controlled: avoid unnecessary rewrites to preserve history
* Occasionally run lint & type checks while editing
* Post-process: remove Claude's explanatory comments

## Performance Guidelines

### Functional & Immutable

* Follow functional programming style - return new objects rather than modifying existing ones
* No explicit `copy()` calls on DataFrames - pandas/cudf operations already return new objects
* Chain operations to minimize intermediate objects

### DataFrame Efficiency

* Never call `str()` repeatedly on the same value - compute once and reuse
* Use `assign()` instead of direct column assignment: `df = df.assign(**{col: val})`, not `df[col] = val`
* Select only needed columns: `df[['col1', 'col2']]`, not `df`, when processing large DataFrames
* Use `concat` and `drop_duplicates` with the `subset` parameter when combining DataFrames
* Process collections at once (vectorized) rather than element by element
* Use `logger.debug('msg %s', var)`, not f-strings, in loggers to skip interpolation costs when the log level is disabled
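
Several of these rules combined in one short sketch (column and frame names are invented for illustration):

```python
import logging
import pandas as pd

logger = logging.getLogger(__name__)

df = pd.DataFrame({'src': ['a', 'b', 'b'], 'dst': ['b', 'c', 'c'], 'w': [1, 2, 2]})

out = (
    df[['src', 'dst']]                       # select only the needed columns
    .drop_duplicates(subset=['src', 'dst'])  # dedupe on an explicit subset
    .assign(kind='edge')                     # new column via assign, no in-place mutation
)

# %-style args are only interpolated if DEBUG logging is actually enabled
logger.debug('built %s edges', len(out))

assert list(out.columns) == ['src', 'dst', 'kind']
assert len(out) == 2       # duplicate (b, c) row dropped
assert 'kind' not in df    # original frame untouched
```

Chaining keeps intermediates short-lived and avoids the chained-assignment pitfalls of in-place mutation.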
205 |

### GFQL & Engine

* Respect engine abstractions - use `df_concat`, `resolve_engine`, etc. to support both pandas and cudf
* Collection-oriented algorithms: process entire node/edge collections at once
* Be mindful of column name conflicts in graph operations
* Reuse computed temporary columns to avoid unnecessary conversions
* Consider memory implications during graph traversals
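
The engine-abstraction idea can be sketched as follows; these `resolve_engine`/`df_concat` bodies are simplified stand-ins for PyGraphistry's actual helpers, shown only to illustrate the dispatch pattern:

```python
import pandas as pd

def resolve_engine(df):
    """Simplified stand-in: pick an engine from the dataframe's module."""
    return 'cudf' if type(df).__module__.startswith('cudf') else 'pandas'

def df_concat(engine):
    """Return the concat function for the resolved engine."""
    if engine == 'cudf':
        import cudf  # GPU path, imported only when actually needed
        return cudf.concat
    return pd.concat

a = pd.DataFrame({'x': [1]})
b = pd.DataFrame({'x': [2]})
engine = resolve_engine(a)
out = df_concat(engine)([a, b], ignore_index=True)

assert engine == 'pandas'
assert out['x'].tolist() == [1, 2]
```

Code written against the resolved helpers runs unchanged on CPU or GPU frames, which is why new GFQL code should call them rather than `pd.concat` directly.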
212 |

## Git tips

* Commits: We use conventional commits for commit messages, where each commit is a semantic change that can be understood in isolation, typically of the form `type(scope): subject`, e.g. `fix(graph): fix a bug in graph loading`. Isolate commits to one change at a time, and use the `--amend` flag to modify the last commit if you need to make changes before pushing. Changes should be atomic and self-contained; don't do too many things in one commit.

* CHANGELOG.md: We use a changelog to track changes in the project. We use semver git tags, so while developing, put changes in the top (reverse-chronological) section of the changelog under `## [Development]`. Organize changes into subsections like `### Feat`, `### Fixed`, `### Breaking 🔥`, etc., reusing section names from the rest of CHANGELOG.md. Be consistent in general.
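
A `## [Development]` section following these conventions might look like this (entry text is invented for illustration):

```markdown
## [Development]

### Feat

* gfql: add hop-level filtering helper

### Fixed

* plotter: avoid redundant column copies during binding
```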