
Commit 7179f7b

Fix/issue 660 logging handler (#661)
* fix(embed_utils): prevent global logging.StreamHandler.terminator modification

  Previously embed_utils.py modified the global logging.StreamHandler.terminator, which affected all StreamHandlers across the entire Python process, breaking logging in other libraries. Now uses a custom _NoTerminatorHandler class that only affects the embed_utils logger, without impacting global logging configuration. Fixes #660.

* fix(feature_utils): handle missing encoder error properly

  Fixed the _transform method, which implicitly returned None when the encoder was not initialized; it now raises ValueError with a clear message. Also fixed the transform method to raise ValueError for an invalid kind parameter instead of just logging and continuing.

* fix(umap_utils): handle None values in transform_umap

  Properly handle cases where _y is None by creating an empty DataFrame. Added assertions to ensure transform always returns non-None values, as expected by the type hints.

* fix(text_utils): add type assertions for transform return values

  Add assertions to ensure transform() returns a tuple when return_graph=False, addressing mypy type-checking issues.

* chore(types): fix circular imports and add TYPE_CHECKING guards

  Add TYPE_CHECKING conditional imports to avoid circular dependencies in cluster.py, conditional.py, networks.py, outliers.py, and graphviz.py. Remove a circular import from ModelDict.py.

* revert(compute/collapse): remove unnecessary import changes

  Revert import changes that were not needed for fixing circular imports.

* docs(changelog): add entry for logging handler fix

  Document the fix for issue #660, where embed_utils.py modified the global logging.StreamHandler.terminator.

* chore(gitignore): add AI_PROGRESS directory

  Add AI_PROGRESS/ to .gitignore for AI assistant working directories.

* docs(claude): add conventional commits note and AI prompt templates

  Add a note about using conventional commits for commit messages. Add comprehensive AI assistant prompt templates for development workflows:
  - Conventional commits template with safer git operations
  - Lint and type checking templates
  - Other development workflow templates

* docs(changelog): update entries with commit hashes

  Add commit hashes to changelog entries for traceability. Correct the description of the logger changes (setup_logger utility, not TYPE_CHECKING).

* chore(gitignore): add PLAN.md

  Add PLAN.md to .gitignore for temporary AI planning files.

* docs(changelog): format with proper GitHub links

  Add proper GitHub issue and commit links. Include PLAN.md in the .gitignore entry.

* docs(claude): simplify to point to ai_code_notes README

  Replace the full guide content with a single line pointing to the actual AI development guide location.

* docs(changelog): add entry for CLAUDE.md simplification

* fix(feature_utils): correct transform return type annotation

  The transform method always returns a tuple of DataFrames when return_graph=False. The second DataFrame may be empty but is never None.

* docs(ai): update AI assistant documentation with Docker-first testing

  Emphasize the containerized testing approach to avoid local environment setup issues. Updates include:
  - Add Docker quick start commands in README.md
  - Include containerized lint/typecheck commands in LINT_TYPES_CHECK.md
  - Clarify when direct script execution requires a local environment
  - Add the WITH_TEST=0 option for faster lint/typecheck-only runs

  This helps AI assistants avoid common environment setup pitfalls and provides faster iteration cycles during development.

* fix(ai_utils): handle empty DataFrames in infer_graph

  Add a check for an empty DataFrame before concatenation to prevent pandas errors when y is an empty DataFrame. The condition now checks both that y is not None and that it is not empty before attempting to concatenate. This prevents runtime errors in graph inference when working with edge cases involving empty target DataFrames.

* fix(feature_utils): add empty DataFrame checks in multiple functions

  Add defensive checks for empty DataFrames to prevent errors during feature processing:
  - features_without_target: early return for empty y DataFrames
  - get_numeric_transformers: check y is not empty before processing
  - process_dirty_dataframes: verify y has data before encoding
  - FeatureMixin._featurize: add an empty check for cudf DataFrames

  These changes prevent AttributeError and concat errors when working with empty target DataFrames in feature engineering pipelines.

* fix(umap_utils): prevent None errors with empty DataFrames

  Add defensive programming to handle empty DataFrames safely:
  - make_safe_umap_gpu_dataframes: check for empty y before the module check
  - _umap_fit_transform: safe dtype logging when y is empty
  - _umap: ensure y_safe is never None when passed to _infer_edges

  These changes prevent AttributeError when accessing properties of potentially empty DataFrames during UMAP embedding operations.

* docs(changelog): update with recent commits

  Add entries for:
  - Empty DataFrame handling fixes across multiple modules
  - AI documentation updates for Docker-first testing

  Group related None/empty value handling fixes for better readability.

* docs(changelg)
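The first fix above is a textbook case of why mutating class-level logging state is dangerous: `logging.StreamHandler.terminator` is shared by every `StreamHandler` in the process. A minimal standalone sketch of the per-instance approach the commit describes (handler and logger names here are illustrative, not PyGraphistry's actual code):

```python
import io
import logging

class NoTerminatorHandler(logging.StreamHandler):
    """StreamHandler that writes records without a trailing newline.

    Setting `terminator` on the instance shadows the class attribute, so
    only this handler is affected; the process-global default stays "\n".
    """
    def __init__(self, stream=None):
        super().__init__(stream)
        self.terminator = ""  # instance-level override, not logging.StreamHandler.terminator

buf = io.StringIO()
logger = logging.getLogger("embed_progress_demo")
logger.addHandler(NoTerminatorHandler(buf))
logger.setLevel(logging.INFO)

logger.info("step 1...")
logger.info("done")

print(repr(buf.getvalue()))                    # 'step 1...done' (no newlines)
print(repr(logging.StreamHandler.terminator))  # '\n' (global default untouched)
```

Because the override lives on the instance, every other StreamHandler in the process keeps its normal newline behavior, which is exactly the property the original code broke.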
1 parent 3da7135 commit 7179f7b
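The chore(types) commit relies on Python's `typing.TYPE_CHECKING` constant, which is `True` only while a static type checker runs. A self-contained sketch of the pattern, with the stdlib `decimal` module standing in for the project modules named in the commit:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers such as mypy. At runtime
    # TYPE_CHECKING is False, so the import never executes and
    # therefore cannot participate in a circular-import cycle.
    from decimal import Decimal  # stand-in for the real project module

def double(x: "Decimal") -> "Decimal":
    # The annotation is a string, so Decimal need not exist at runtime.
    return x + x

print(TYPE_CHECKING, double(21))  # False 42
```

The type checker still sees the full annotation, while the interpreter never pays the import cost or risks the cycle.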

19 files changed (+1,373 −258 lines)

.gitignore

Lines changed: 4 additions & 0 deletions
```diff
@@ -90,3 +90,7 @@ demos/data/BIOGRID-IDENTIFIERS-3.3.123.tab.txt
 # local jupyter dev
 jupyter_dev/
 docs/source/demos
+
+# AI assistant working directories
+AI_PROGRESS/
+PLAN.md
```

CHANGELOG.md

Lines changed: 22 additions & 1 deletion
```diff
@@ -5,7 +5,28 @@ All notable changes to the PyGraphistry are documented in this file. The PyGraph
 The changelog format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and all PyGraphistry-specific breaking changes are explictly noted here.
 
-## [Development]
+## [0.37.0 - 2025-06-05]
+
+### Fixed
+
+* Fix embed_utils.py modifying global logging.StreamHandler.terminator ([#660](https://github.com/graphistry/pygraphistry/issues/660)) ([8480cd06](https://github.com/graphistry/pygraphistry/commit/8480cd06))
+
+### Breaking 🔥
+
+* `FeatureMixin.transform()` now raises `ValueError` for invalid `kind` parameter instead of silently continuing ([25e4bf51](https://github.com/graphistry/pygraphistry/commit/25e4bf51))
+* `FeatureMixin._transform()` now raises `ValueError` when encoder is not initialized instead of returning `None` ([25e4bf51](https://github.com/graphistry/pygraphistry/commit/25e4bf51))
+* `UMAPMixin.transform_umap()` now always returns `pd.DataFrame` (possibly empty) instead of `None` for `y_` in tuple return ([d2941ec4](https://github.com/graphistry/pygraphistry/commit/d2941ec4))
+
+### Chore
+
+* Switch to setup_logger utility in multiple modules ([842fb904](https://github.com/graphistry/pygraphistry/commit/842fb904))
+* Add AI_PROGRESS/ and PLAN.md to .gitignore ([f0c18b3b](https://github.com/graphistry/pygraphistry/commit/f0c18b3b), [ac25a356](https://github.com/graphistry/pygraphistry/commit/ac25a356))
+
+### Docs
+
+* Add AI assistant prompt templates and conventional commits guidance ([a52048a7](https://github.com/graphistry/pygraphistry/commit/a52048a7))
+* Simplify CLAUDE.md to point to ai_code_notes README ([e5393381](https://github.com/graphistry/pygraphistry/commit/e5393381))
+* Update AI assistant documentation with Docker-first testing ([db5496eb](https://github.com/graphistry/pygraphistry/commit/db5496eb))
 
 ## [0.36.2 - 2025-05-16]
```

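Several of the changelog entries above replace `None` returns and unchecked concatenations with empty-DataFrame handling. A hedged sketch of the general pattern (the helper name is hypothetical, not PyGraphistry API):

```python
from typing import Optional

import pandas as pd

def concat_target(X: pd.DataFrame, y: Optional[pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical helper showing the defensive pattern from the commit:
    treat a None target and an empty target the same way, and always
    return a DataFrame rather than None."""
    if y is None or y.empty:
        return X  # nothing to concatenate; avoid pandas errors on empty frames
    return pd.concat([X, y], axis=1)

X = pd.DataFrame({"a": [1, 2]})
print(concat_target(X, None).columns.tolist())                         # ['a']
print(concat_target(X, pd.DataFrame()).columns.tolist())               # ['a']
print(concat_target(X, pd.DataFrame({"b": [3, 4]})).columns.tolist())  # ['a', 'b']
```

Returning a (possibly empty) DataFrame in every branch is also what lets the type hints promise a non-None tuple, as the `transform_umap` breaking change describes.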
CLAUDE.md

Lines changed: 1 addition & 218 deletions
The entire previous guide was removed and replaced with a single line:

> See [ai_code_notes/README.md](ai_code_notes/README.md) for AI assistant development guidance.

Removed content:

````markdown
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

PyGraphistry is a Python library for graph visualization, analytics, and AI with GPU acceleration capabilities. It's designed to work with graph data by:

1. Loading and transforming data from various sources into graph structures
2. Providing visualization tools with GPU acceleration
3. Offering graph analytics and AI capabilities including querying, ML, and clustering

The library follows a client-server model where:

- The Python client prepares data and handles transformations like loading, wrangling, querying, ML, and AI
- Visualization happens through Graphistry servers (cloud or self-hosted)
- Most user interaction follows a functional programming style with immutable state

## Architecture

PyGraphistry has a modular architecture consisting of:

1. Core visualization engine that connects to Graphistry servers
2. GFQL (Graph Frame Query Language) for dataframe-native graph queries
3. Integration with many databases and graph systems (Neo4j, Neptune, TigerGraph, etc.)
4. GPU acceleration through RAPIDS integration
5. AI/ML capabilities including UMAP embeddings and graph neural networks

Most components follow functional-style programming where methods create new copies of objects with updated bindings rather than modifying state.

## Development Commands

### Containers

PyGraphistry uses Docker for development and testing. The `docker` directory contains Dockerfiles and scripts for building and running tests in isolated environments. The bin/*.sh scripts are unaware of the Docker context, so you should run from the docker folder, which calls the appropriate scripts.

### Environment Setup

```bash
# Install PyGraphistry with development dependencies
pip install -e .[dev]

# For GPU-accelerated features
pip install -e .[rapids]

# For AI capabilities
pip install -e .[ai]

# For full development setup
pip install -e .[dev,test,ai]
```

### Testing Commands

Testing is via containerized pytest, with shell scripts for convenient entry points:

```bash
# Run all tests
./bin/test.sh

# Run tests in parallel when many (xdist)
./bin/test.sh -n auto

# Run minimal tests (no external dependencies)
./bin/test-minimal.sh

# Run specific test file or test
python -m pytest -vv graphistry/tests/test_file.py::TestClass::test_function

# Run with Neo4j connectivity tests
WITH_NEO4J=1 ./bin/test.sh

# Docker-based testing (recommended for full testing)
cd docker && ./test-cpu-local-minimal.sh
cd docker && ./test-cpu-local.sh
# For faster, targeted tests (WITH_BUILD=0 skips slow docs build)
WITH_LINT=0 WITH_TYPECHECK=0 WITH_BUILD=0 ./test-cpu-local.sh graphistry/tests/test_file.py::TestClass::test_function
# Ex: GFQL
WITH_BUILD=0 ./test-cpu-local-minimal.sh graphistry/tests/test_compute_chain.py graphistry/tests/compute
```

### Linting and Type Checking

Run before testing:

```bash
# Lint the code
./bin/lint.sh

# Type check with mypy
./bin/typecheck.sh
```

### Building Documentation

Sphinx-based:

```bash
# Build documentation locally
cd docs && ./build.sh
```

### GPU Testing

```bash
# For GPU functionality (if available)
cd docker && ./test-gpu-local.sh
```

## Common Development Workflows

### Adding a New Feature

1. Ensure you understand the functional programming style of PyGraphistry
2. Create new features as standalone modules or methods where possible
3. Implement it following the client-server model, respecting immutable state
4. Add appropriate tests in the `graphistry/tests/` directory
5. Run linting and type checking before submitting changes

### Testing Changes

1. Use the appropriate test script for your feature:
   - `test-minimal.sh` for core functionality
   - `test-features.sh` for features functionality
   - `test-umap-learn-core.sh` for UMAP functionality
   - `test-dgl.sh` for graph neural network functionality
   - `test-embed.sh` for embedding functionality
   - Additional specialized tests exist for specific components

2. For database connectors, ensure you have the relevant database running:
   - `WITH_NEO4J=1 ./bin/test.sh` for Neo4j tests

### Building and Publishing

1. Update the changelog in CHANGELOG.md
2. Tag with semantic versioning: `git tag X.Y.Z && git push --tags`
3. Confirm GitHub Actions publishes to PyPI

### Dependencies

* Dependencies are managed in `setup.py`
* The `stubs` list in setup.py contains type stubs for development
* Avoid adding unnecessary dependencies
* If you encounter type checking errors related to missing imports:
  - First check if they're already defined in the `stubs` list in setup.py
  - If not, consider adding them to the ignore list in mypy.ini using the format:

    ```
    [mypy-package_name.*]
    ignore_missing_imports = True
    ```

#### Dependency Structure

```python
# Core dependencies - always installed
core_requires = [
    'numpy', 'pandas', 'pyarrow', 'requests', ...
]

# Type stubs for development
stubs = [
    'pandas-stubs', 'types-requests', 'ipython', 'types-tqdm'
]

# Optional dependencies by category
base_extras_light = {...}  # Light integrations (networkx, igraph, etc)
base_extras_heavy = {...}  # Heavy integrations (GPU, AI, etc)
dev_extras = {...}         # Development tools (docs, testing, etc)
```

#### Docker Testing Dependencies

* Docker tests install dependencies via `-e .[test,build]` or `-e .[dev]`
* The PIP_DEPS environment variable controls which dependencies are installed
* If adding new stubs, add them to the `stubs` list in setup.py

## Project Dependencies

PyGraphistry has different dependency sets depending on functionality:

- Core: numpy, pandas, pyarrow, requests
- Optional integrations: networkx, igraph, neo4j, gremlin, etc.
- GPU acceleration: RAPIDS ecosystem (cudf, cugraph)
- AI extensions: umap-learn, dgl, torch, sentence-transformers

## Coding tips

* We're version controlled: avoid unnecessary rewrites to preserve history
* Occasionally run lint & type checks when editing
* Post-process: remove Claude's explanatory comments

## Performance Guidelines

### Functional & Immutable

* Follow functional programming style - return new objects rather than modifying existing ones
* No explicit `copy()` calls on DataFrames - pandas/cudf operations already return new objects
* Chain operations to minimize intermediate objects

### DataFrame Efficiency

* Never call `str()` repeatedly on the same value - compute once and reuse
* Use `assign()` instead of direct column assignment: `df = df.assign(**{col: val})` not `df[col] = val`
* Select only needed columns: `df[['col1', 'col2']]` not `df` when processing large DataFrames
* Use `concat` and `drop_duplicates` with the `subset` parameter when combining DataFrames
* Process collections at once (vectorized) rather than element by element
* Use `logger.debug('msg %s', var)` not f-strings in loggers, to skip interpolation costs when the log level is disabled

### GFQL & Engine

* Respect engine abstractions - use `df_concat`, `resolve_engine` etc. to support both pandas/cudf
* Collection-oriented algorithms: process entire node/edge collections at once
* Be mindful of column name conflicts in graph operations
* Reuse computed temporary columns to avoid unnecessary conversions
* Consider memory implications during graph traversals

## Git tips

* Commits: We use conventional commits for commit messages, where each commit is a semantic change that can be understood in isolation, typically in the form `type(scope): subject`. For example, `fix(graph): fix a bug in graph loading`. Try to isolate commits to one change at a time, and use the `--amend` flag to modify the last commit if you need to make changes before pushing. Changes should be atomic and self-contained; don't do too many things in one commit.

* CHANGELOG.md: We use a changelog to track changes in the project. We use semvers as git tags, so while developing, put changes in the top (reverse-chronological) section of the changelog, `## [Development]`. Organize changes into subsections like `### Feat`, `### Fixed`, `### Breaking 🔥`, etc.; reuse section names from the rest of the CHANGELOG.md. Be consistent in general.
````
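Among the removed performance tips, the logging one is worth a concrete demonstration: with `%`-style arguments, string conversion happens only if the record is actually emitted, while an f-string pays the formatting cost up front. A small self-contained sketch:

```python
import logging

logger = logging.getLogger("lazy_logging_demo")
logger.setLevel(logging.WARNING)  # DEBUG records will be dropped

class Expensive:
    """Object whose string conversion is costly (here: just counted)."""
    def __init__(self):
        self.str_calls = 0
    def __str__(self):
        self.str_calls += 1
        return "big payload"

eager, lazy = Expensive(), Expensive()

# f-string: __str__ runs immediately, even though the record is discarded
logger.debug(f"value={eager}")

# %-style: formatting is deferred to emit time, which never happens here,
# so __str__ is never called
logger.debug("value=%s", lazy)

print(eager.str_calls, lazy.str_calls)  # 1 0
```

The counter makes the cost visible: the f-string call formats the object despite the disabled level, while the deferred form skips it entirely.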
