Skip to content

Latest commit

 

History

History
1961 lines (1645 loc) · 130 KB

File metadata and controls

1961 lines (1645 loc) · 130 KB

[0.6.7-dev] - 2026-05-06

🚀 Features

  • DeltaGraph: Databricks SQL Warehouse support (PRs #316–#338): A second SQL dialect alongside ClickHouse, plumbed through the existing renderer and executor stack so Cypher can target Databricks/Spark without forking the engine. Phased rollout:

    • Phase 1.0–1.5 (#316–#326) — SqlDialect enum, FunctionMapper trait, dialect-aware function registry. ClickHouse spellings (groupArray, toInt64, toFloat64, toString, toUInt16, Array(Int64) casts) routed through the mapper so Spark equivalents (collect_list, bigint, double, string, int, ARRAY<BIGINT>) drop in without touching call sites.
    • Phase 1.6–1.7 (#327, #328) — VLP / BFS shortestPath dialect routing; cast_uint16 widened to int to be safe for the unbounded CLICKGRAPH_VLP_MAX_HOPS ceiling.
    • Phase 2.1–2.2 (#329–#331) — DatabricksSqlExecutor over reqwest using the Statement Execution API (submit / poll / INLINE JSON_ARRAY). PAT bearer auth with Debug redaction ("********"). Database::new_databricks(schema, DatabricksConfig) + set_current_dialect plumbing across query_to_sql, query_with_executor_async, query_graph_async. 3 wiremock-based e2e tests in clickgraph-embedded.
    • Phase 4.2 (#332) — cg --dialect [clickhouse|databricks] global flag (+ CG_DIALECT env, dialect = "…" config.toml key). Database::sql_only_with_dialect constructor narrowed to validated dialects (rejects unimplemented PostgreSQL/DuckDB/MySQL/SQLite with EmbeddedError::Validation). 7 integration tests.
    • Phase 4.4 (#333) — clickgraph-ffi databricks cargo feature exposes DatabricksConfig + Database::open_databricks to Go / Python (and any future UniFFI consumer). Default builds unchanged; distributions targeting Databricks build the cdylib with --features databricks before regenerating language bindings.
    • Phase 4.2 follow-up (#335) — cg query --dialect databricks actually executes (the original Phase 4.2 only emitted SQL). Reads DATABRICKS_HOST / _WAREHOUSE_ID / _TOKEN (and CG_DATABRICKS_* overrides + [databricks] config.toml section); PAT is env-only, never a CLI flag (would leak via ps / shell history). 3 wiremock-backed integration tests covering happy path, missing-credentials error, and dialect routing.
    • Phase 4.1 (#336) — Dedicated deltagraph server binary (HTTP + Bolt, defaults to Databricks dialect, Neo4j compat on by default with --disable-neo4j-compat opt-out). Ships under cargo build --features databricks --bin deltagraph. DATABRICKS_BASE_URL accepts https://* unconditionally and http:// only for loopback (localhost / 127.0.0.1) with a log::warn! on any override — keeps the prod path TLS-only while still allowing wiremock-backed e2e tests. 4 url-validation unit tests + 2 assert_cmd smoke tests.
    • Phase 4.3 (#337) — Bolt e2e boot test that spawns the real deltagraph binary against wiremock, allocates a free port dynamically, completes the Bolt v5 handshake (offering 5.8 / 5.7 / 5.6 / 4.4), and reaps the child via a Drop guard that spawns the wait task on the runtime. Proves the server boots end-to-end without needing a live Databricks. Plus a docs/deltagraph/QUICKSTART.md walkthrough (force-added past the docs/ gitignore).
    • Phase 3 (#338) — cg schema discover --dialect databricks introspects a catalog/schema via SHOW TABLES IN catalog.schema + DESCRIBE TABLE EXTENDED + a per-table SELECT capped at 32 columns. DatabricksProbe::introspect returns the same IntrospectResponse shape the ClickHouse path produces, so the LLM-prompt and /schemas/draft consumers work unchanged. New --catalog flag (also DATABRICKS_CATALOG / CG_DATABRICKS_CATALOG); identifiers are restricted to ASCII alphanumeric/underscore so the backticked SQL is injection-safe. 3 probe tests (2 unit + 1 wiremock) + 1 cg integration test.
    • Phase 3.2 — Optional top-level catalog: field on GraphSchemaConfig (YAML). When set, it supplies the Unity Catalog as a default for Database::new_databricks and cg schema discover --dialect databricks. Existing env/CLI sources still win, so per-environment overrides keep working — full precedence: --catalog flag > DATABRICKS_CATALOG / CG_DATABRICKS_CATALOG env > config.toml > YAML catalog:. Ignored under --dialect clickhouse. 1 roundtrip unit test + 2 embedded wiremock tests (YAML wins / caller wins) + 1 cg integration test pinning the catalog name into the SHOW TABLES IN SQL crossing the wire.

    Still pending before a Databricks GA: MERGE, full LDBC validation against a live warehouse, OAuth M2M auth, external-link result chunks. Plan: docs/design/DELTAGRAPH_PLAN.md.

  • Cypher writes in embedded mode (PRs #275–#286): CREATE, SET, DELETE / DETACH DELETE, and REMOVE against ClickGraph-managed nodes, translated to ClickHouse lightweight INSERT / UPDATE / DELETE. Server / remote / sql_only modes reject writes upstream via the new write_guard admission check; source:-backed nodes/edges remain read-only. Phased rollout:

    • Phase 0 (#275, #276) — design lock; chose lightweight UPDATE over rewrite for SET.
    • Phase 1 (#277) — Create / Update / Delete LogicalPlan variants + builder + write_guard.
    • Phase 2 (#278) — WriteRenderPlan + write_to_sql + id_gen (per-node id_generation schema attribute: uuid default / provided / snowflake).
    • Phase 3 (#279) — executor wiring; writable tables get enable_block_number_column = 1, enable_block_offset_column = 1 in DDL automatically.
    • Phase 4 (#280) — TCK write feature files imported (Create*, Set*, Delete*, Remove* — 21 files / 205 scenarios) and write-clause docs.
    • Phase 5a (#281) — TCK side-effect step + @unsupported-label-mutation skip tag.
    • Phase 5b (#282) — anonymous-node CREATE (CREATE (), CREATE (n {…})) routes to __Unlabeled table catalogued by schema_gen.rs; lifts Create1.feature file-level @wip.
    • Phase 5c (#283) — lifts Delete1.feature file-level @wip; ungates scenario [3]; per-scenario triage notes.
    • Phase 5d (#284) — write+RETURN side-channel via QueryResult::get_write_counters(); accurate Neo4j-compatible counters (nodes_created, properties_set, nodes_deleted, relationships_deleted).
    • Phase 5e (#285) — untyped-MATCH+write fan-out: TypeInference lifts a write root, runs label inference on the inner read pipeline so MATCH (n) DELETE n expands to Delete { input: Union[GN(n, A), GN(n, B), …] }; the write-plan builder fans out one DELETE / UPDATE per resolved label.
    • FFI exposure (#286): write_counters side-channel reachable from FFI + Python bindings.

    MERGE, relationship CREATE, CREATE … RETURN (MATCH-bound), edge-alias DELETE r, SET a += {…} map-merge, and REMOVE a:Label are not implemented yet — gated @wip in the TCK with per-scenario reasons. See clickgraph-tck/README.md for the live status.

  • TCK 100% (PR #273): All 402 read scenarios passing — fixed list indexing (negative indices + out-of-range → null), type validation, step-regex parsing, and DELETE detection regressions.

  • Default HTTP port 7475 (#310): Changed from 8080 to 7475 to align with Neo4j Browser conventions (Neo4j HTTP is 7474, ClickGraph sits one above) and avoid conflict with the common dev port. CLI / env override unchanged.

🧹 Infrastructure

  • CI zero-warning enforcement (#307): cargo clippy --all-targets -- -D warnings and cargo fmt --all -- --check now run as required CI checks, locking in the zero-warning state achieved by the cleanup arc (#287–#306).

  • Per-crate clippy cleanup (#305, #306): Cleared the remaining warnings in clickgraph-embedded (9 sites) and clickgraph-tool so every workspace crate now lints clean.

  • Workspace MSRV alignment (#308): Bumped clickgraph-client's MSRV from 1.70 → 1.85 to match the workspace; removes the only outlier and lets rust-toolchain be a single value.

  • Test-tree cleanup (#309): Deleted 5 orphan test directories and 5 unused deprecated items flagged by clippy/dead-code analysis.

  • Clippy zero-warnings (PRs #287–#302, 16 PRs): The cargo clippy --all-targets count went from 68 to 0. Highlights:

    • Dropped the pattern_resolver module (#287): superseded by TypeInference since the unified type-inference pass landed; ~1,400 lines of dead code removed.
    • large_enum_variant (#298): boxed CypherStatement::Query.query so the enum drops from 728 bytes → ~80 bytes (every value of the enum was previously paying the largest variant's cost). Auto-deref coercion absorbed most call sites — final diff was 6 files / 28 lines.
    • module_inception (#295): unwrapped 4 redundant inner #[cfg(test)] mod foo { … } blocks where the file was already declared as the same-named module in its parent.
    • type_complexity (#297): factored 5 wide tuple types into named aliases (ArgTransform, ResolvedTriple/PatternCombination, InferredPatternTypes, PathAliasesWithIds, FlattenedMapLiteralResult) — placed adjacent to first use and named for domain meaning.
    • only_used_in_recursion (#296): documented 14 sites where a parameter (plan_ctx, captured_cte_refs, etc.) is forwarded through recursion to maintain analyzer/optimizer Pass-trait signature symmetry.
    • too_many_arguments (#299, #300, #301, #302): triaged 23 sites — deleted 6 dead helpers (relationship_with_input, new_denormalized, Cte::new_vlp_with_columns, expand_fixed_length_joins, from_graph_rel_dyn, generate_and_store_pattern_combinations); documented 13 legit-but-wide functions with rationale; allowed 4 test-only fixtures.
    • Dead-code triage (#288, #293): set_role, encode_json_value, extract_target_id_negation, filter_node_schemas retained with #[allow(dead_code)] and rationale; non-reachable enum variant + 1 unused field deleted.
    • Mechanical idiomatic fixes across (#289–#294): unnecessary_unwrapif let, contains_key + insertentry().or_insert_with, from_str inherent → impl FromStr, &mut borrow correctness, etc.

🐛 Bug Fixes

  • Neo4j Browser 5.x compatibility (#312): Browser 5.x's connect flow now lands cleanly. (a) CALL dbms.components() is intercepted in the Bolt handler and answered with the canonical (name, versions, edition) shape — without it Browser shows "Failed to check Neo4j version. Invalid version: ". (b) Browser's bundled count(n) UNION ALL count(r) and db.labels() / db.relationshipTypes() / db.propertyKeys() queries are short-circuited (the SQL generator can't UNION-ALL disjoint count projections). (c) ~12 read-only SHOW commands Browser issues to populate sidebars (SHOW INDEXES, SHOW CONSTRAINTS, SHOW PROCEDURES, SHOW FUNCTIONS, …) are stubbed with the canonical Neo4j 5.x field schema and zero rows. (d) Iterative click-to-expand is stable across canvas growth: relationship element_id generation is now unified across all code paths (canonical Type:from->to- form), so the same logical edge dedupes correctly across expansions from either endpoint. (e) Same-label directed relationships (e.g., FOLLOWS:User→User) carry schema-natural FK direction via new r_from_id/r_to_id projections from the multi-type VLP CTE, so expanding from either end produces an identical element_id. (f) Browser-compat trailing - sentinel on element_ids switches Browser into elementId-mode, preventing the legacy parseInt → NaN → 0 collapse where every node ended up with id 0.

  • shortestPath / allShortestPaths + COUNT regression (#311): Several path-aggregation queries that regressed during the write-path landing are restored; test-stack glue cleanup.

  • Browser expand regression (#268): three root-cause fixes for Neo4j Browser node-expand path.

  • FFI memory cap (#271): max_memory_usage_bytes exposed in FFI SystemConfig.

  • TCK list & equality semantics (#273): list indexing (l[i] / l[-i] / out-of-range), type validation, regex-based step parsing, DELETE detection.

🔒 Security

  • rustls-webpki 0.103.10 → 0.103.13 (#274) — RUSTSEC-2026-0098 / 0099 / 0104.
  • Audit dispositioningcargo audit is clean (0 vulnerabilities). Three transitive-only unsound/unmaintained warnings are explicitly dispositioned in deny.toml:
    • RUSTSEC-2025-0134 (rustls-pemfile unmaintained): pulled by chdb-rust → reqwest 0.11; functioning code path, upstream upgrade pending.
    • RUSTSEC-2026-0097 (rand 0.8.5 / 0.9.2 unsound): only triggers when a custom global logger calls rand::rng() during init. ClickGraph does not install such a logger, so the unsound pattern cannot be reached. Transitive via tungstenite 0.21 and quinn-proto.
  • cargo deny and cargo audit continue to gate every PR via CI (workflow added in #178).

📚 Documentation

  • Embedded-writes design plan (#275, #276): docs/design/embedded-writes.md with phase decomposition, ID-generation strategy, and counter design.
  • motivation.md (#272): project motivation document.
  • rustdoc broken-link sweep (#304): fixed all 113 --no-deps rustdoc warnings; cargo doc is now warning-clean.

[0.6.6-dev] - 2026-04-03

🚀 Features

  • cg CLI tool (clickgraph-tool crate): Agent/script-oriented CLI for Cypher translation and execution without a running server. Commands: cg sql (Cypher→SQL), cg validate (parse + plan check), cg query (execute via remote ClickHouse), cg nl (NL→Cypher via LLM), cg schema show/validate/discover/diff. Config via ~/.config/cg/config.toml. Supports Anthropic (default) and any OpenAI-compatible API.

  • embedded feature now opt-in in clickgraph-embedded: chdb is no longer compiled by default. New Database::new_remote(schema, RemoteConfig) constructor executes Cypher against external ClickHouse with no chdb dependency — the backend used by cg query. Database::sql_only(schema) and Connection::query_to_sql() are always available for translation-only use.

  • Agent skills (skills/): Three publishable agent skills for Claude Code, LangChain, AutoGen, CrewAI, and OpenAI function calling — /cypher (NL→Cypher→SQL→execute), /graph-schema (show + validate schema), /schema-discover (generate schema YAML from ClickHouse via LLM). See skills/README.md for installation across frameworks.

  • openCypher TCK runner (clickgraph-tck/): Cucumber-based compatibility test suite running 402 openCypher TCK scenarios in embedded (chdb) mode. Results: 383/402 passed (95.3%), 0 failures, 19 skipped. The 19 skipped scenarios cover Cypher write clauses (CREATE, SET, DELETE, MERGE) — not yet supported as Cypher syntax; programmatic write API (create_node(), create_edge(), upsert_node()) is already available in embedded mode. Enabled with CLICKGRAPH_CHDB_TESTS=1 cargo test -p clickgraph-tck --test tck.

🐛 Bug Fixes

  • Debug println removed: Eliminated leftover println!("DEBUG TryFrom RenderExpr: ...") in render_plan/render_expr.rs that was polluting stdout during query translation.

[0.6.5-dev] - 2026-03-29

🚀 Features

  • Hybrid remote query + local storage (PR #240): Execute Cypher queries against a remote ClickHouse cluster from embedded mode, then store results locally in chdb as a subgraph for fast re-querying. New RemoteConfig for SystemConfig, plus Connection methods: query_remote(), query_remote_graph(), query_graph(), store_subgraph(). New GraphResult structured output and StoreStats return type. Available in Rust, Python (UniFFI), and Go (UniFFI) bindings.

  • Embedded write API (PR #236): create_node(), create_edge(), upsert_node(), upsert_edge() with batch variants (create_nodes(), create_edges()). delete_nodes(), delete_edges() for cleanup. import_json() and import_json_file() for bulk JSON import. Schema entries without source: get auto-created as ReplacingMergeTree tables. property_types field for type-aware DDL (PR #238).

  • Multi-format file import (PR #243): import_csv_file(), import_parquet_file(), import_file() (auto-detect from extension). Supports CSV, Parquet, TSV, JSON/NDJSON/JSONL formats.

  • Richer Value types (PR #244): Value::Date("YYYY-MM-DD"), Value::Timestamp("YYYY-MM-DD HH:MM:SS"), Value::UUID("8-4-4-4-12") auto-detected from ClickHouse JSON output. to_sql_literal() generates toDate()/toDateTime()/toUUID() wrappers. Value::string() constructor bypasses detection.

  • Kuzu API parity (PR #242): Value::as_bool(), query timing (get_compiling_time()/get_execution_time()), Database::in_memory(), Connection::set_query_timeout(), QueryResult::get_column_data_types().

  • DataFrame output (PR #245): Python QueryResult.get_as_df() (Pandas), get_as_arrow() (PyArrow), get_as_pl() (Polars) with lazy imports.

  • Python wrapper improvements (PR #246): result.compiling_time/execution_time/column_data_types properties. conn.create_node()/create_edge()/create_nodes()/import_file()/execute_sql() accept plain Python dicts with auto-conversion to FFI Value types.

🐛 Bug Fixes (from TCK work)

  • Cypher three-valued equality: Added cypher_literal_eq() in SQL generator implementing Cypher's null-propagating equality — null = anything → null, cross-type comparisons → false, list element-wise null propagation. Fixes 8 comparison test failures. (to_sql_query.rs)

  • VLP chained-pattern start labels: Multi-hop patterns like MATCH (n)-->(a)-->(b) RETURN b now correctly derive start labels for the second hop by recursing into the chained inner GraphRel. Supplements __Unlabeled start labels with schema from_node types for chained patterns. Fixes empty results on 2-hop traversals with labeled data. (cte_extraction.rs)

  • List-of-lists comparison: Extended is_literal_like() to recognise pure-literal nested lists, enabling native ClickHouse Array(Array(T)) comparison (element-by-element, matching Cypher's [2,1] > [2] semantics). Removed unnecessary has_type_mismatch helpers; all-literal arrays now render as-is. (render_expr.rs)

  • Type inference performance regression: Reverted max_combos from MAX_RAW_COMBINATIONS (200,000) to get_max_combinations() (500) — the raw-cap constant was accidentally used where the post-filter limit should be, causing 400× overhead in pattern combination generation. (type_inference.rs)

📚 Documentation

  • Tutorials and examples (PR #246): 5 runnable Python scripts (examples/embedded/) covering quick start, DataFrames, write API, GraphRAG hybrid workflow, and export formats. Wiki tutorial page (docs/wiki/Embedded-Tutorials.md) with Python + Rust code, architecture diagrams, and API quick reference.

🐛 Other Bug Fixes

  • Edge extraction fallback (PR #241): extract_edge_from_row falls back to from_id/to_id aliases when schema FK column names don't match SQL-generated column names.
  • Security dep updates: lz4_flex 0.11.5→0.11.6 (RUSTSEC-2026-0041), rustls-webpki 0.103.8→0.103.10 (RUSTSEC-2026-0049).

🧹 Infrastructure

  • CI: cargo audit ignores unmaintained rustls-pemfile warning (transitive dep via chdb-rust).

[0.6.4-dev] - 2026-03-14

🚀 Features

  • Denormalized & coupled schema support: Full query support for schemas where node properties are embedded in edge tables via from_node_properties/to_node_properties. Includes property mapping, ORDER BY resolution, UNION aggregate column rewriting, and id() on virtual nodes (PRs #224-#228).

  • OPTIONAL MATCH on denormalized schemas: New CTE + LEFT JOIN architecture for correct LEFT JOIN semantics when MATCH produces a UNION standalone node scan. Includes UnionDistribution skip for optional patterns, column reference rewriting, and join preservation through the optimizer (PRs #229-#230).

  • VLP on denormalized/polymorphic schemas: Fixed exact-length VLP cycle prevention for virtual nodes (no separate table), enabling *2, *3 patterns. Range VLP (*1..3), path variables, and shortestPath all work on denormalized schemas (PR #231).

  • Cross-schema pattern matrix tests: Comprehensive test suite covering 15 query patterns across 5 schema types (standard, FK-edge, denormalized, polymorphic, coupled). 151 tests passing, 0 xfails (PRs #226-#232).

🐛 Bug Fixes

  • Denormalized property mapping: get_properties_with_table_alias() resolves node properties through edge table's from_node_properties/to_node_properties with direction awareness (PR #225).
  • id(node) on denormalized nodes: SelectBuilder Case 5 now resolves through edge alias and mapped column instead of using the virtual node alias directly (PR #227).
  • UNION branch Column qualification: Bare Column("OriginCityName") expressions from denormalized ViewScans converted to PropertyAccessExp with correct alias in GraphNode handler (PR #228).
  • VLP cycle prevention: Moved extract_table_name calls inside non-denormalized branch — denormalized patterns use from_id/to_id directly (PR #231).
  • UnionDistribution: Skip distributing optional GraphRel over denormalized Union to preserve LEFT JOIN semantics (PR #229).
  • is_node_denormalized: Now handles Union of denormalized GraphNodes (PR #229).

🧹 Infrastructure

  • jemalloc memory allocator: Reduces memory fragmentation for long-running server workloads (PR #213).
  • Plan explosion guard: Prevents combinatorial blowup in multi-type VLP expansion (PR #212).
  • Test cleanup: ~103 stale xfail markers removed, 25 invalid test queries converted to skips (PRs #211, #218-#223, #227, #232).

[0.6.3-dev] - 2026-03-05

🚀 Features

  • APOC Export Procedures: Neo4j-compatible CALL apoc.export.{csv|json|parquet}.query(cypher, destination, config) for exporting query results. Supports local files, S3, GCS, Azure, and HTTP destinations. Works in HTTP server, Bolt protocol, and embedded mode.

    • Destination resolver: Maps URI schemes to ClickHouse INSERT INTO FUNCTION table functions (file(), s3(), url(), azureBlobStorage())
    • Parser fix: Standalone CALL with positional args now correctly parsed even when inner Cypher contains RETURN/UNION keywords
    • Config: Parquet compression codecs (snappy, gzip, lz4, zstd, brotli)
  • Embedded mode (PR #179): Run Cypher graph queries entirely in-process via chdb — no external ClickHouse server required. Supports Parquet, CSV, Iceberg, Delta Lake, and S3-compatible storage.

    • QueryExecutor trait: Abstracts SQL execution; RemoteClickHouseExecutor (existing) and ChdbExecutor (new) are the two backends. Default behaviour is unchanged.
    • clickgraph-embedded crate: Kuzu-compatible Rust library API — Database::new(schema, config), Connection::new(&db), conn.query(cypher), result.next()Row.
    • source: schema field: Optional per-node/relationship URI pointing to the data file. At startup, ClickGraph creates chdb VIEWs named after the schema table: field so existing SQL generation requires no changes.
    • URI schemes: file://, s3://, gs://, iceberg+s3://, iceberg+local://, delta+s3://, table_function:<raw>.
    • StorageCredentials: S3/GCS/Azure credentials applied as chdb SET commands at session init; falls back to environment variables and instance-profile credentials automatically.
    • Server embedded flag: --embedded CLI flag / CLICKGRAPH_EMBEDDED=true env var; HTTP and Bolt endpoints work as normal.
    • Tests: 9 source_resolver tests, 8 credential tests, 17 embedded unit tests, 10 e2e integration tests.
    • Docs: Embedded Mode wiki page

🚀 Features

  • LDBC SNB benchmark: 14/37 → 36/37 (97%) — 22 queries promoted from adapted to official Cypher. The only remaining gap is bi-16 (CALL subquery, a known language feature gap).

    • Official queries promoted: complex-3, complex-5, complex-7, complex-10, complex-12, complex-13, bi-3, bi-8, bi-14, and others
    • Adapted queries remaining: bi-17 (multi-VLP), complex-14 (weighted shortest path via cost(path))
  • GraphRAG structured output (format: "Graph") (PR #165): Query results returned as graph-structured JSON with nodes, edges, and properties — enables direct consumption by graph visualization and RAG pipelines.

  • ClickHouse cluster load balancing (CLICKHOUSE_CLUSTER env var) (PR #164): Distributes queries across ClickHouse cluster nodes for horizontal read scaling.

  • apoc.meta.schema() for MCP server compatibility (PR #163): Implements the Neo4j APOC procedure that MCP servers and graph tools use for schema introspection.

  • LLM-powered schema discovery (:discover command) (PR #146): Server formats a discovery prompt (POST /schemas/discover-prompt), client calls LLM (Anthropic or OpenAI-compatible) to generate YAML schema from ClickHouse table metadata. Replaced the GLiNER/gline-rs approach.

  • Weighted shortest path (cost(path) function) (PR #160): Supports Dijkstra-style weighted VLP traversal for queries like complex-14. WeightCteConfig carries weight info through the VLP pipeline; auto-creates bidirectional weight CTEs for undirected traversal.

  • List comprehension → arrayCount() optimization (PR #153): Parses [x IN list WHERE cond | expr] syntax, maps size(ListComprehension) to ClickHouse arrayCount() — avoids correlated subqueries that fail with UNION ALL ("Cannot clone Union plan step").

  • Pattern comprehension → pre-aggregated CTE approach (PR #159): Replaces correlated subqueries from size(PatternComprehension) with pre-aggregated CTEs + LEFT JOINs. Includes arrayConcat() for list concatenation (list1 + list2).

  • Official complex-7 — chained map access + NOT EXISTS (PR #152): Greedy chained property parsing (a.b.c), map literal node flattening (head(collect({key: node}))), split NOT EXISTS for undirected edges.

  • Official complex-3 — supertype inference + IN→OR expansion (PR #151): Supertype collapse (Post+Comment → Message), IN [col1, col2]OR expansion for ClickHouse compatibility, 5-WITH chain support.

  • Map property access (collect({score: x})[0].score → ClickHouse map subscript) (PR #147): Tracks map_keys through CTE pipeline, generates ArraySubscript for map property access with 0-based → 1-based index conversion.

  • UNWIND support (ARRAY JOIN) (PR #133): Translates Cypher UNWIND to ClickHouse ARRAY JOIN.

  • --log-level CLI flag for runtime log level configuration.

🐛 Bug Fixes

  • Undirected edge fixes: Removed has_nested_undirected_edge guard that prevented UNION split for mid-chain undirected edges (PR #147). Fixed BidirectionalUnion for multi-pattern MATCH with bound endpoints — collapses redundant Union to single Outgoing branch (PR #148).

  • VLP (variable-length path) fixes: Fixed path rewriting for reverse UNION branches (PR #135), composite ID support (PR #134, #136), *N..N exact-hop guard (PR #137), duplicate WITH RECURSIVE removal (PR #131), multi-VLP query support (PR #132), DISTINCT deduplication (PR #130), zero-lower-bound *0.. for single-type and multi-type VLPs (PR #142), CROSS JOIN removal for VLP CTEs in downstream queries (PR #145).

  • OPTIONAL MATCH fixes: INNER→LEFT JOIN conversion for CTE-backed JOINs in OPTIONAL MATCH context, spurious duplicate JOIN removal, orphan JOIN removal guards, collect(node) expansion to ID-only for has() compatibility (PR #143).

  • CTE/scope fixes: Bare variable resolution after WITH barrier (PR #120, #121), cte_references preservation in UNION branches (PR #122), composite alias augmentation (PR #128), buried WithClause preservation in DuplicateScansRemoving (PR #138).

  • shortestPath fixes: CASE path IS NULLifNull(minOrNull(hop_count), -1) rewriting, spurious non-VLP JOIN cleanup, endpoint inline filter preservation (PR #157).

  • Parser whitespace fix: MATCH/OPTIONAL MATCH now handle leading whitespace after $param syntax (PR #145).

  • Browser click-to-expand regressions: Fixed 5 bugs from scope resolution redesign — filter_tagging crash, VLP multi-type inference, type mismatch, polymorphic label extraction, pruned MATCH detection (PR #156).

  • Determinism fixes: HashSet→BTreeSet in anchor node selection, HashMap→BTreeMap in GraphSchema, sorted conversions in CTE extraction (PR #137, #139).

⚙️ Infrastructure

  • Integration test cleanup: 3,068 tests passing, 57 stale xfails removed (PR #169).
  • Scoping-only WITH collapse + benchmark infrastructure (PR #168): Optimizes scoping-only WITH clauses that don't need CTE materialization.
  • Schema-parameterized SQL generation tests: 76 tests across 6 schema variants (PR #162).
  • Browser interaction tests with full schema variant coverage (PR #161).
  • Version bump to v0.6.3-dev with README cleanup (PR #167).
  • Roadmap and guide updates (PR #166).

[0.6.2-dev] - 2026-02-20

⚙️ Architecture

  • Scope-aware variable resolution for CTE/UNION rendering (Feb 20, 2026, PR #120): Infrastructure for correct variable resolution across WITH barriers during SQL rendering.

    • Extended VariableSource::Cte with property_mapping (Cypher property → CTE column name) for runtime column resolution
    • Added resolve() to VariableRegistry for property lookup during SQL generation
    • Populated property mappings in build_chained_with_match_cte_plan loop from scope CTE variables
    • Wired VariableRegistry into SQL rendering via task-local QueryContext
    • Scope fixes: UNION branch recursion in rewrite_render_plan_with_scope; WITH barrier scope clearing between WITH clauses; per-CTE registry save/restore in Cte::to_sql()
    • Evidence: 2-WITH chain with bidirectional KNOWS now generates correct CTE alias references (a_b.p1_b_id instead of b.p1_b_id)
    • Files: 10 files, +486/-28 lines
    • Tests: 1,111 unit tests passing, LDBC 13/37 (35%) — no regression
  • Clean join generation architecture with anchor-aware algorithm (Feb 19, 2026, PR #117): Major refactoring of JOIN generation and ordering.

    • Core insight: Traditional node-edge-node is the base case (2 JOINs); all other JoinStrategy variants are optimizations that skip some JOINs
    • New generic algorithm: per-pattern loop → generate_pattern_joins() → VLP rewrites → optional marking → dedup → anchor selection → topological sort
    • Anchor-aware generation: Handles 4 cases (neither/left/right/both available) — critical for OPTIONAL MATCH shared-node patterns
    • Replaced ~1200 lines of per-strategy handler code with 64-line generic loop + clean 810-line module
    • Files: 5 files, +1002/-1296 lines (net -374 lines)
    • Tests: 1,040 unit tests passing, LDBC 13/37 (35%) — no regression

🐛 Bug Fixes

  • Neo4j Browser click-to-expand regression fixes (Feb 19, 2026, PR #116): Fixed 5 bugs introduced by the scope resolution redesign (PR #115) that completely broke click-to-expand in Neo4j Browser.
    • Bug 1 — filter_tagging crash: When TypeInference prunes all relationship types, filter_tagging crashed with no table context. Fixed by propagating Empty plan on error.
    • Bug 2a — VLP multi-type inference: Phase 1 computed the right GraphNode before plan_ctx was updated with inferred labels, causing Phase 2 to generate empty WHERE 0=1 UNION branches. Fixed by re-running infer_labels_recursive on the right node after multi-type detection.
    • Bug 2b — VLP+WITH type mismatch: JOIN between WITH CTEs and VLP CTEs failed (UInt64 vs String). Fixed by wrapping node id columns in toString().
    • Bug 2c — extract_node_labels not polymorphic: Returned only primary label when multiple node types were present. Fixed to return all types.
    • Bug 3 — empty SQL for pruned MATCH: is_return_only_query() misidentified pruned MATCH as pure RETURN. Fixed by checking Projection items for TableAlias (MATCH) vs Literal (RETURN).
    • Noise fix: HTTP OPTIONS/GET probes from Neo4j Browser on the Bolt port logged as ERROR. Downgraded to DEBUG.
    • Verification: User node expansion returns exactly 11 rows (3 FOLLOWS-out, 3 FOLLOWS-in, 2 AUTHORED, 3 LIKED) matching raw ClickHouse counts.

⚙️ Infrastructure

  • Neo4j Browser demo improvements (Feb 19, 2026, PR #116):
    • All 5 ClickHouse tables migrated from Memory to MergeTree ENGINE — data now persists across container restarts.
    • Removed duplicate data loading from setup.sh; init-db.sql is the single data entrypoint.
    • clickgraph service updated to official image genezhang/clickgraph:v0.6.2-dev.

🚀 Features

  • Foundational Variable Scope Resolution Redesign (Feb 2026): 🎉 MAJOR ARCHITECTURE FIX
    • Problem: The rendering pipeline resolved variables without scope context. Cypher's WITH creates scope barriers — only exported variables survive — but the SQL generator was unaware of this, causing leaked JOINs, wrong column references, and broken ORDER BY/GROUP BY/HAVING for post-WITH variables.
    • Root Cause: 13 separate resolution paths scattered across the codebase, a reverse_mapping hack (~88 usages) patching wrong results post-hoc.
    • Solution: VariableScope struct as a single, forward-only resolution source, built iteratively with each WITH iteration and threaded into every resolution site.
    • Architecture:
      VariableScope (new):
      ├─ Resolve alias.property → CteColumn | DbColumn | Unresolved
      ├─ Built per WITH iteration: scope.advance_with(alias, cte_name, mapping, labels)
      ├─ Covers: SELECT, WHERE, ORDER BY, GROUP BY, HAVING, JOIN conditions
      └─ Eliminates need for post-render reverse_mapping rewrites
      
    • Key Changes (22 commits):
      • src/render_plan/variable_scope.rs: New VariableScope, CteVariableInfo, rewrite_render_plan_with_scope() — expands bare CTE node vars into individual columns
      • src/render_plan/plan_builder_utils.rs: Scope built in build_chained_with_match_cte_plan() loop; alias rename mapping (WITH u AS person → maps person→u for property lookup)
      • src/render_plan/plan_builder.rs: Scope threaded into rendering pipeline
      • Removed ~1,362 net lines: intermediate_reverse_mapping, final reverse_mapping block, 6 helper functions for reverse-mapping rewrites
      • Fixed UNION CTE SELECT * → project needed columns per branch
      • Fixed aggregate UNION rendering (inner branches project raw columns, outer aggregates)
      • Fixed deterministic join ordering (HashMap+Vec preserves insertion order)
      • Fixed VLP+WITH JOIN type mismatch (toString() wrapping on UInt64 removed)
      • Fixed CTE node variable expansion in SELECT (bare a after WITH → individual columns)
      • Fixed alias renaming through WITH (WITH u AS person → resolves person.name)
    • Results:
      • ✅ 1,032/1,032 unit tests passing
      • ✅ Integration tests at parity with main branch (13/13 same pre-existing failures)
      • ✅ LDBC mini benchmark: 14/37 (38%), up from 10/37 (27%) baseline (+4 queries)
      • ✅ Zero new regressions
      • 🎯 Net: -1,362 lines (architecture cleaned, reverse_mapping eliminated)

🐛 Bug Fixes

  • ORDER BY, HAVING, LIMIT, SKIP clause extraction (Feb 17, 2026): Fixed critical bug where clauses were omitted in multiple code paths
    • Problem: Four code paths calling trait methods instead of utility functions → clauses dropped
    • Root Cause: self.extract_order_by() returns empty (trait default), should use plan_builder_utils::extract_order_by(self) (handles wrapper nodes)
    • Impact: ~50 ORDER BY integration tests failing, queries returning wrong order
    • Fixed Paths:
      1. GraphJoins path (commit 4a9ff13) - lines 2929-2938
      2. ViewScan path (commit 0acfd74) - lines 837, 845-847
      3. Union branch path (commit 0acfd74) - lines 1059, 1061, 1063-1065
      4. Pattern comprehension path (commit 0acfd74) - lines 1148, 1154, 1160-1161
    • Key Discovery: Cypher HAVING uses WITH...WHERE syntax (not direct HAVING keyword), already working correctly
    • Files Modified:
      • src/render_plan/plan_builder.rs: 4 code paths fixed to use utility functions
      • src/query_planner/analyzer/type_inference.rs: Fixed clippy warning
    • Testing: All 1,022 unit tests passing, ORDER BY verified in all query patterns
    • Expected Impact: ~50 failing integration tests → passing (585/960 → ~635/960, 61% → 66%)

🚀 Features

  • Schema/Type Inference Consolidation (Feb 16, 2026): 🎉 ARCHITECTURE CLEANUP - 668 LINES REMOVED

    • Mission: Merge overlapping SchemaInference + TypeInference into single unified pass
    • Problem: Two passes with duplicate logic (label inference, ViewScan resolution) + planning phase creating UNIONs without type knowledge → architectural debt
    • Solution: 6-phase incremental consolidation (Phases 0-E) with comprehensive testing
    • Implementation:
      • Phase 0: Added 79 gap coverage tests (multi-table, FK-edge, label inference, denormalized)
      • Phase A: Created function mapping document (8 cases analyzed)
      • Phase B: Extended TypeInference with Phase 0 (relationship inference) + Phase 3 placeholder
      • Phase C: Modified planning to return Empty for unlabeled nodes (removed 125 lines of premature UNION creation)
      • Phase D: Fixed SchemaInference to read labels from GraphNode.label (set by TypeInference Phase 2)
      • Phase E: Implemented full Phase 3 ViewScan resolution, removed SchemaInference completely
    • Architecture After:
      UnifiedTypeInference (4 phases):
      ├─ Phase 0: Relationship-based label inference (from SchemaInference)
      ├─ Phase 1: Filter→GraphRel UNION (existing, working)
      ├─ Phase 2: Untyped node UNION with direction validation (browser bug fix)
      └─ Phase 3: ViewScan resolution (from SchemaInference)
      
    • Key Changes:
      • src/query_planner/analyzer/type_inference.rs: +755 lines (Phase 0 + Phase 3 implementation)
      • src/query_planner/logical_plan/match_clause/helpers.rs: -125 lines (UNION creation removed)
      • src/query_planner/analyzer/schema_inference.rs: DELETED (-1308 lines)
      • src/query_planner/analyzer/mod.rs: Removed SchemaInference pass
    • Results:
      • ✅ Single source of truth for type resolution
      • ✅ Cleaner architecture (one pass instead of two overlapping passes)
      • ✅ Direction validation works everywhere (Phase C fix)
      • ✅ Better performance (one less analyzer pass)
      • ✅ All 1022 unit + 36 integration tests passing
      • 🎯 Net: -668 lines (removed 1445, added 777)
    • Testing: Comprehensive gap coverage tests, baseline capture with rollback tags, incremental validation at each phase
    • Documentation: Updated STATUS.md, type-inference architecture notes
    • Impact: 🎉 Major architectural improvement with zero behavior changes
  • Unified Type Inference with Direction Validation (Feb 16, 2026): 🎯 NEO4J BROWSER FIX

    • Problem: Neo4j Browser expand feature showed relationships in wrong direction (Post→User instead of schema-defined User→Post)
    • Root Cause: Browser queries like MATCH (a)--(b) WHERE id(a) IN [Post.1] had labels extracted from WHERE constraints, but no pass validated direction against schema. Invalid branches like (Post)-[AUTHORED]->(User) passed through despite schema defining User→Post.
    • Solution: Extended TypeInference to merge PatternResolver functionality, extract WHERE constraints, validate direction, and optimize undirected patterns
    • Key Improvements:
      • WHERE constraint extraction: extract_labels_from_where() decodes id() IN [...] patterns from LogicalExpr
      • Direction validation: check_relationship_exists_with_direction() enforces schema direction constraints
      • Undirected optimization: optimize_undirected_pattern() converts Direction::Either to unidirectional when all valid combinations go same direction
      • UNION generation: try_generate_union_with_constraints() creates Union with only schema-valid branches
    • Architecture:
      Filter(WHERE id(a) IN [...])
        └─ GraphRel(a, r, b, direction=Either)
      
      ↓ UnifiedTypeInference
      
      1. Extract labels from WHERE: a ∈ {Post}, b ∈ {User}
      2. Check schema: User→Post (AUTHORED, LIKED), User→User (FOLLOWS)
      3. Optimize: All Post combinations go backward → Convert Either to Incoming
      4. Generate Union with valid branches only
      
    • Algorithm (src/query_planner/analyzer/type_inference.rs):
      1. Intercepts Filter→GraphRel patterns
      2. Extracts WHERE constraints (labels from id() calls)
      3. Computes possible types (explicit labels + WHERE + schema)
      4. Optimizes undirected patterns (Either→Outgoing/Incoming when unidirectional)
      5. Validates each (left, rel, right) combination with direction check
      6. Generates Union if multiple branches, single branch if one, skips if zero
    • Results:
      • ✅ UNION generation: 3 branches for valid User→{User,Post} patterns
      • ✅ Direction filtering: MATCH (p:Post)--(u:User) correctly uses schema direction (User→Post)
      • ✅ Invalid branches excluded: MATCH (p:Post)-[r]->(u:User) returns 0 (correct!)
      • ✅ Undirected optimization: (Post)--(User) with Direction::Either converts to Incoming
    • PatternResolver Deprecated: Functionality merged into TypeInference
    • Testing: Manual verification with Neo4j Browser patterns, direction validation tests
    • Impact: 🎉 Neo4j Browser expand feature now shows correct relationship directions

🐛 Bug Fixes

  • OPTIONAL MATCH Schema Lookup Fix (Feb 3, 2026): ✅ ALL SMOKE TESTS PASSING
    • Problem: OPTIONAL MATCH queries failed with "Relationship with type FOLLOWS not found" due to incomplete node label inference
    • Root Cause: Relationship schemas stored only with composite keys (TYPE::FROM::TO), but OPTIONAL MATCH used simple keys (TYPE)
    • Solution: Enhanced schema storage and lookup to support both composite and simple key access patterns
    • Changes:
      • src/graph_catalog/config.rs: Store relationships with both composite and simple keys for backward compatibility
      • src/graph_catalog/graph_schema.rs: Added fallback logic in get_rel_schema_with_nodes() to try composite keys when simple key lookup fails
    • Result: All 10 smoke tests now passing (previously 7/10), including OPTIONAL MATCH with aggregation
    • Impact: Robust relationship resolution for all query types (regular MATCH, OPTIONAL MATCH, multi-type patterns)

�🚀 Features

  • PatternResolver - Automatic Type Enumeration (Feb 8, 2026): 🧠 SCHEMA INTELLIGENCE

    • Problem: Untyped graph patterns (MATCH (n)) fail or behave unpredictably without explicit type labels
    • Solution: Systematic type resolution that automatically enumerates all valid type combinations from schema
    • What Works:
      • Automatic discovery: Recursively finds all untyped variables in logical plan
      • Schema querying: Collects all valid node types for each untyped variable
      • Combination generation: Creates cartesian product of type assignments (limited to 38 by default)
      • Relationship validation: Filters combinations based on schema relationship constraints
      • Query cloning: Creates separate typed query for each valid combination
      • UNION ALL: Combines all typed queries into single result
      • Graceful fallback: Continues with original plan if errors occur
    • Example:
      -- Input: Exploratory query without type labels
      MATCH (o) RETURN o.name LIMIT 10
      
      -- PatternResolver transforms to:
      MATCH (o:User) RETURN o.name LIMIT 10
      UNION ALL
      MATCH (o:Post) RETURN o.name LIMIT 10
    • Architecture (7 phases, ~1100 lines):
      • Phase 0: Infrastructure (status message system, configuration)
      • Phase 1: Discovery (recursive traversal to find untyped GraphNode variables)
      • Phase 2: Schema Query (collect type candidates for each variable)
      • Phase 3: Combination Generation (iterative cartesian product with early termination)
      • Phase 4: Validation (extract relationships, filter invalid combinations)
      • Phase 5: Query Cloning (recursive cloning with label insertion)
      • Phase 6: UNION ALL (combine typed queries into Union plan)
      • Phase 7: Integration (Step 2.1 in analyzer pipeline, after TypeInference)
    • Configuration:
      • CLICKGRAPH_MAX_TYPE_COMBINATIONS=38 (default, max 1000)
      • Prevents combination explosion in large schemas
    • Performance: <10ms overhead for typical queries (1-2 untyped variables)
    • Integration Strategy:
      • TypeInference (Step 2): Handles deterministic type inference (e.g., from relationship type)
      • PatternResolver (Step 2.1): Handles non-deterministic cases (creates UNION ALL)
      • Complementary, not redundant - PatternResolver only activates on remaining untyped nodes
    • Use Cases:
      • Exploratory analysis: MATCH (n) RETURN count(n) - count all nodes across types
      • Multi-type patterns: MATCH (a)-[r]->(b) RETURN * - all relationships
      • Schema discovery: MATCH (n) RETURN distinct labels(n) - find node types
    • Impact: ✨ Enables true exploratory graph queries without manual type annotations
    • Testing:
      • 16 dedicated unit tests (100% passing)
      • 995/995 total tests passing (zero regressions)
      • Covers all phases: discovery, combinations, validation, cloning
    • Files:
      • New: src/query_planner/analyzer/pattern_resolver.rs (1033 lines)
      • New: src/query_planner/analyzer/pattern_resolver_config.rs (58 lines)
      • Modified: src/query_planner/analyzer/mod.rs (pipeline integration)
      • Modified: src/query_planner/plan_ctx/mod.rs (status message system)
    • Branch: feature/pattern-resolver (10 commits, +1202/-24 lines)
    • Documentation: See notes/pattern-resolver.md for implementation details
  • Property-Based UNION Pruning (Track C) (Feb 3, 2026): ⚡ PERFORMANCE OPTIMIZATION

    • Problem: Untyped graph patterns (MATCH (n) WHERE n.property...) generated UNION across ALL types, wasting resources
    • Solution: Automatic schema-based filtering - only query types that have the required properties
    • Performance: 10x-50x faster for queries on schemas with many node/relationship types
    • What Works:
      • Node patterns: MATCH (n) WHERE n.user_id = 1 → Only queries User type (not all 10+ types)
      • Relationship patterns: MATCH ()-[r]->() WHERE r.follow_date... → Only queries FOLLOWS type
      • UNION ALL queries: Each branch filters independently (automatic)
      • Single-branch optimization: Skips UNION wrapper when only 1 type matches
      • Empty result optimization: Returns 0 rows immediately when no types match
    • Property Extraction: ANY property reference implies property must exist
      • n.property > value → requires property
      • n.x = 1 AND n.y = 2 → requires both x and y
      • Works in functions: length(n.name) → requires name
    • Architecture (5 phases, ~800 lines):
      • Phase 1: WherePropertyExtractor - Recursively extracts ALL property references from WHERE clauses
      • Phase 2: SchemaPropertyFilter - Filters node/relationship schemas using HashSet::is_subset()
      • Phase 3: Single-branch optimization in generate_scan() (0 types → Empty, 1 type → ViewScan, N types → filtered UNION)
      • Phase 4: Relationship filtering in traversal.rs (stores filtered types in GraphRel.labels)
      • Phase 5: UNION ALL auto-supported (each branch gets independent PlanCtx)
    • Example:
      -- Before: UNION across ALL node types
      MATCH (n) WHERE n.user_id = 1 RETURN n
      -- Generated SQL scanned: users, posts, connections, orders, etc. (10+ tables)
      
      -- After: Only User type
      -- Generated SQL scanned: users (1 table)
      -- Result: 10x-50x faster
    • Impact: ✨ Neo4j Browser exploration queries now performant on large schemas
    • Testing:
      • 949/949 unit tests passing (100%, zero regressions)
      • 2/3 integration tests passing (schema loading setup pending)
    • Files:
      • New: src/query_planner/analyzer/where_property_extractor.rs (339 lines)
      • New: src/query_planner/logical_plan/match_clause/schema_filter.rs (130 lines)
      • New: tests/integration/test_track_c_property_filtering.py (155 lines)
      • Modified: helpers.rs, traversal.rs, view_scan.rs, filter_tagging.rs, schema_inference.rs, plan_ctx/mod.rs
    • Branch: feature/track-c-property-optimization (8 commits)
  • Top-Level UNION ALL Support (Feb 2, 2026): Combine multiple independent queries with UNION/UNION ALL

    • Syntax: query1 UNION ALL query2 for combining results from different queries
    • Features:
      • Per-branch clauses: DISTINCT, LIMIT, WHERE, ORDER BY supported in each branch
      • Mixed entity types: Nodes and relationships can be combined in same result set
      • Both UNION (removes duplicates) and UNION ALL (keeps duplicates) supported
    • Requirements:
      • Column count and names must match across branches
      • Types should be compatible (ClickHouse requirement)
    • Known Limitations:
      • Requires explicit labels (:User, :Post); untyped patterns (MATCH (n)) require Track C
      • Type casting may be needed for incompatible types across branches
    • Testing: 3 integration tests covering simple unions, DISTINCT/LIMIT, and mixed node/relationship queries
    • Examples:
      -- Multi-type aggregation
      MATCH (u:User) RETURN "users" AS type, count(*) AS count
      UNION ALL
      MATCH ()-[r:FOLLOWS]->() RETURN "follows" AS type, count(*) AS count
      
      -- Schema merging
      MATCH (u:User) RETURN u.name, u.email, "user" AS source
      UNION ALL
      MATCH (a:Admin) RETURN a.name, a.email, "admin" AS source
    • Files: server/handlers.rs, server/sql_generation_handler.rs, tests/integration/test_union_all.py
    • Branch: feature/top-level-union-all
    • Documentation: Added comprehensive section in Cypher Language Reference
  • Path UNION Queries for Neo4j Browser "Dot" Feature (Feb 2, 2026): ⭐ NEO4J COMPATIBILITY

    • Problem: Neo4j Browser's dot query explorer sends MATCH p=()-->() RETURN p but ClickGraph couldn't handle untyped paths with properties
    • Solution: Reused Union infrastructure to generate UNION ALL across all relationship types with JSON property format
    • How It Works:
      • plan_builder.rs detects path UNION patterns (GraphJoins with path tuples)
      • convert_path_branches_to_json() transforms each branch to consistent 4-column JSON schema
      • build_format_row_json() uses prefixed aliases (_s_city, _e_city, _r_follow_date) to avoid ClickHouse alias collision
      • select_builder.rs expands denormalized relationship properties via schema lookup
      • Bolt transformer strips prefixes for clean Neo4j Browser display
    • Generated SQL Pattern:
      SELECT tuple('fixed_path', 't1_0', 't2_0', 't3') as p,
             formatRowNoNewline('JSONEachRow', t1_0.user_id AS _s_user_id, ...) as _start_properties,
             formatRowNoNewline('JSONEachRow', t2_0.post_id AS _e_post_id, ...) as _end_properties,
             formatRowNoNewline('JSONEachRow', t3.post_date AS _r_post_date) as _rel_properties
      FROM users_bench t1_0 JOIN posts_bench t2_0 ... JOIN posts_bench t3
      UNION ALL ...
    • Impact: ✨ Neo4j Browser dot query now shows all connected edges with properties!
    • Key Features:
      • All relationship types included (denormalized + explicit edge tables)
      • Type preservation: numbers stay numbers, dates stay dates
      • Automatic property expansion for denormalized relationships (e.g., AUTHORED)
      • Clean property names in browser (prefixes internal only)
    • Files: src/render_plan/plan_builder.rs, src/render_plan/plan_builder_helpers.rs, src/render_plan/select_builder.rs, src/server/bolt_protocol/result_transformer.rs
  • Label-less Node Queries for Neo4j Browser "Dot" Feature (Feb 1, 2026): ⭐ NEO4J COMPATIBILITY

    • Problem: Neo4j Browser's exploration feature sends MATCH (n) RETURN n LIMIT 25 but ClickGraph required explicit labels
    • Solution: Reused existing Union infrastructure to generate UNION ALL across all node types when no label specified
    • How It Works:
      • generate_scan() detects label-less patterns and creates Union of ViewScans for all node types in schema
      • Multi-label scan detection recursively unwraps GraphJoins→Projection→GraphNode→ViewScan layers
      • json_builder::generate_multi_type_union_sql() generates uniform columns: _label, _id, _properties
      • is_multi_label_scan flag preserves special columns through Projection pass
    • Generated SQL Pattern:
      WITH __multi_label_union AS (
        SELECT 'User' as _label, toString(user_id) as _id, formatRowNoNewline('JSONEachRow', ...) as _properties FROM users
        UNION ALL
        SELECT 'Post' as _label, toString(post_id) as _id, formatRowNoNewline('JSONEachRow', ...) as _properties FROM posts
      )
      SELECT n._label, n._id, n._properties FROM __multi_label_union AS n LIMIT 25
    • Impact: ✨ Neo4j Browser "dot" exploration now works - click any node to see all connected nodes!
    • Files: src/query_planner/logical_plan/match_clause/helpers.rs, src/render_plan/plan_builder.rs, src/render_plan/mod.rs
  • RETURN Clause Evaluation for Procedures (Feb 1, 2026): ⭐ CRITICAL FEATURE - Full RETURN clause support for procedure-only queries

    • Problem: Neo4j Browser schema sidebar was empty because Browser sends complex UNION queries with RETURN clauses that aggregate procedure results
    • Solution: Implemented complete RETURN clause evaluator in src/procedures/return_evaluator.rs with:
      • Expression evaluation: variables, literals, map literals, list construction, property access
      • Aggregation functions: COLLECT (array aggregation), COUNT (with distinct support)
      • Array slicing: [..1000], [5..], [2..10] operations
      • Proper aggregation semantics: processes all records to produce single aggregated result
    • Architecture: Async-safe execution flow with ExecutionPlan enum to cross async boundaries
    • Example Query: CALL db.labels() YIELD label RETURN {name:'labels', data:COLLECT(label)[..1000]} AS result
    • Result Format: Returns aggregated structure Browser expects: {result: {name: 'labels', data: [...]}}
    • Impact: ✨ Neo4j Browser schema sidebar now auto-populates with labels, relationships, and properties!
    • Testing: 3/3 unit tests + E2E validation with Python neo4j-driver (3-branch UNION query works perfectly)
    • Files: New: src/procedures/return_evaluator.rs; Modified: src/server/bolt_protocol/handler.rs, src/procedures/executor.rs
  • Neo4j Schema Metadata Procedures (Feb 2026): Implemented 4 essential procedures for Neo4j tool compatibility

    • New Procedures:
      • CALL db.labels() - Returns all node labels in current schema
      • CALL db.relationshipTypes() - Returns all relationship types
      • CALL db.propertyKeys() - Returns all unique property keys from nodes and relationships
      • CALL dbms.components() - Returns ClickGraph version, name, and edition
    • Architecture: New top-level src/procedures/ module for future extensibility; CypherStatement changed from struct to enum (Query | ProcedureCall)
    • Execution Flow: Procedures bypass query planner and execute directly against GLOBAL_SCHEMAS for fast response (<5ms)
    • Multi-Schema Support: Works with schema_name request parameter to query different schemas
    • Response Format: Neo4j-compatible JSON with count and records fields
    • Impact: Enables Neo4j Browser and Neodash visualization tools to introspect ClickGraph schemas and show autocomplete
    • Testing: 922 unit tests passing + E2E validation with scripts/test/test_procedures.sh
    • Files:
      • New: src/procedures/*.rs (mod, executor, db_labels, db_relationship_types, dbms_components, db_property_keys)
      • New: src/open_cypher_parser/standalone_procedure_call.rs (parser for CALL statements)
      • Modified: src/server/handlers.rs (procedure detection and execution), src/open_cypher_parser/ast.rs (CypherStatement enum)
      • Test: scripts/test/test_procedures.sh
    • Branch: feature/neo4j-schema-procedures

🔒 Security

  • Parser Recursion Depth Limits (Jan 26, 2026): Added MAX_RELATIONSHIP_CHAIN_DEPTH = 1000 to prevent DoS attacks
    • Problem: Unbounded recursion in parse_consecutive_relationships() vulnerable to stack overflow on malicious inputs like ()-[]->()-[]->... (1000+ hops)
    • Solution: Created depth-tracking wrapper parse_consecutive_relationships_with_depth(input, depth) that returns ErrorKind::TooLarge when depth > 1000
    • Test Coverage: 4 comprehensive tests for reasonable depth (100), max depth (1000), exceeds limit (1001), error clarity (1050)
    • Impact: Parser now protected against DoS via deep recursion; all 184 parser tests passing
    • Files: src/open_cypher_parser/path_pattern.rs

🐛 Bug Fixes

  • Denormalized Single-Hop Property Access (Jan 30, 2026): ⭐ CRITICAL BUG FIX - Fixed denormalized schemas generating SQL with wrong table alias

    • Problem: Single-hop queries like MATCH (a:User)-[r:FOLLOWS]->(b:User) RETURN a.name, b.city on denormalized schemas generated SELECT t.name, t.city FROM user_follows AS r with wrong alias 't' instead of 'r', causing "Unknown expression identifier" errors
    • Root Cause: PlanCtx stored denormalized node→edge mappings during query planning, but rendering phase used task-local storage - the transfer between these phases was missing!
    • Solution: Added transfer loop in to_render_plan_with_ctx() to copy denormalized aliases from PlanCtx to task-local storage before rendering
    • Architecture: Three-phase lifecycle documented in docs/architecture/denormalized-alias-lifecycle.md (Planning → Transfer → Rendering)
    • Test Coverage: Added 19 comprehensive tests for single-hop property selection patterns across all schema types
    • Impact: All denormalized single-hop queries now work correctly; bug blocked alpha release
    • Files: src/render_plan/plan_builder.rs, src/query_planner/plan_ctx/mod.rs
    • Tests: tests/integration/matrix/test_single_hop_properties.py (19 passing tests)
  • Nested WITH Filtered Exports (Jan 26, 2026): Fixed infinite iteration loop in nested WITH clauses with filtered exports

    • Problem: Queries like MATCH (u:User) WITH u AS person WITH person.name AS name RETURN name hit 10-iteration safety limit and failed
    • Root Cause: collapse_passthrough_with() required both key and CTE name match (key == target_alias && this_cte_name == target_cte_name) instead of just key match
    • Solution: Changed condition to key == target_alias to allow passthrough WITH collapse when key matches target alias
    • Impact: Nested WITH with filtered exports now work correctly (3/4 test scenarios passing, aggregation remains separate issue)
    • Files: src/render_plan/plan_builder_utils.rs
  • EXISTS Subquery Schema Context (Jan 25, 2026): Fixed EXISTS subqueries using wrong schema/table

    • Problem: EXISTS subqueries like WHERE EXISTS { MATCH (a)-[:FOLLOWS]->(b) } were generating SQL with wrong tables
    • Root Cause: tokio::task_local! for query schema context requires .scope() wrapper; without it, try_with() returns None and fallback schema search picks wrong schema when multiple schemas have same relationship type
    • Solution: Changed from tokio::task_local! to thread_local! which is accessible without scope wrapping
    • Impact: All EXISTS subquery tests now passing (3/3)
    • Files: src/render_plan/render_expr.rs
  • WITH+Aggregation Scalar Export (Jan 25, 2026): Fixed WITH clauses with aggregations not generating CTE references

    • Problem: Queries like MATCH (a)-[r]->(b) WITH count(r) AS total RETURN total failed with "CTE not found" errors
    • Root Cause: export_single_with_item_to_cte() didn't handle TableAlias and PropertyAccessExp expression types for scalar exports
    • Solution: Added explicit handling for TableAlias (direct alias reference) and PropertyAccessExp (property.name pattern) in WITH item export logic
    • Impact: WITH clauses with aggregated scalars now work correctly
    • Files: src/render_plan/plan_builder_utils.rs
  • Denormalized VLP Property Access: Fixed incorrect table alias usage in VLP queries with denormalized relationships

    • Problem: Queries like MATCH path = (origin:Airport)-[f:FLIGHT*1..2]->(dest:Airport) RETURN origin.city generated SELECT f.OriginCityName instead of t.OriginCityName
    • Root Cause: SelectBuilder was using relationship table alias instead of CTE table alias for denormalized node properties in VLP contexts
    • Solution: Added hack in SelectBuilder to detect denormalized VLP property access (column names containing "Origin" or "Dest") and use CTE table alias "t"
    • Impact: All denormalized edge tests now passing (16/18, 2 expected failures), VLP property access working correctly
    • Files: src/render_plan/select_builder.rs
    • Tests: All denormalized edge integration tests passing
  • OPTIONAL MATCH + Inline Property Filters: Fixed invalid SQL generation when inline properties appear on nodes in OPTIONAL MATCH clauses

    • Problem: Inline property filters like (b:TestUser {name: 'Bob'}) in OPTIONAL MATCH were incorrectly injected as WHERE conditions instead of LEFT JOIN conditions
    • Root Cause: FilterIntoGraphRel optimizer was injecting filters into ViewScan.view_filter for all GraphNode patterns, including optional ones
    • Solution: Modified FilterIntoGraphRel to skip filter injection for optional aliases (identified via plan_ctx.get_optional_aliases())
    • Impact: LDBC IS-7 query and similar patterns with inline properties in OPTIONAL MATCH now generate correct LEFT JOIN SQL
    • Files: src/query_planner/optimizer/filter_into_graph_rel.rs
    • Tests: Added test_optional_match_inline_properties test case, all OPTIONAL MATCH tests now 26/27 passing (96%)

�🚀 Features

  • Multi-Table Label Union (MULTI_TABLE_LABEL): Complete support for aggregation queries on nodes that appear in multiple tables
    • Feature: Nodes with the same label appearing in multiple contexts (e.g., IP appearing in dns_log FROM, dns_log TO, and conn_log) now generate proper UNION queries with aggregation
    • Example: MATCH (n:IP) RETURN count(DISTINCT n.ip) now correctly generates UNION across all IP tables with aggregation wrapping
    • Implementation:
      1. get_all_node_schemas_for_label() method in src/graph_catalog/graph_schema.rs finds all tables with same label
      2. Logical plan generates UNION with branches for each context
      3. SQL generation wraps UNION in subquery and applies aggregation on top
    • Impact: Denormalized graph schemas with multi-context node labels now fully supported for analytical queries
    • Files: src/graph_catalog/graph_schema.rs, src/query_planner/logical_plan/match_clause.rs, src/render_plan/plan_builder.rs, src/clickhouse_query_generator/to_sql_query.rs
    • Tests: All 784 unit tests passing, no regressions

🧪 Testing

  • Comprehensive Integration Testing Validation: Successfully ran full 3489-test integration suite after critical bug fixes
    • Setup: Loaded test_integration database tables (fs_objects, groups, memberships, etc.) using scripts/test/load_test_integration_data.sh
    • Results: 128 passed, 3 failed, 17 skipped, 5 xfailed, 3 xpassed (97% success rate on executed tests)
    • Critical Validations:
      • ✅ Variable-length paths (VLP) all working (28/28 tests passing)
      • ✅ OPTIONAL MATCH functionality validated (3/3 tests passing)
      • ✅ WITH clause chaining working (6/6 tests passing)
      • ✅ All core query patterns functional
    • Remaining Issues: 3 undirected relationship test failures (non-critical, SQL generation scoping issues)
    • Impact: Confirms codebase stability after major refactoring, validates all critical bug fixes are working in production scenarios

🐛 Bug Fixes

  • Denormalized Node UNION Duplication: Fixed duplicate UNION branches and incorrect property mappings in denormalized graph queries

    • Issue: Denormalized queries generating 4 UNION branches instead of 2, with some branches using wrong property column names (Origin vs Destination)
    • Root Cause: Composite keys (e.g., "dns_log::TO::IP") were creating duplicate metadata entries, and aggregation SQL was using plan.select instead of branch-specific select items
    • Fix 1: Filter out composite keys in build_denormalized_metadata() to eliminate duplicate entries
    • Fix 2: Use union_branch.select.to_sql() instead of plan.select.to_sql() in aggregation rendering to respect branch-specific property mappings
    • Impact: Denormalized queries now generate correct UNION with proper column mappings
    • Files: src/graph_catalog/graph_schema.rs, src/clickhouse_query_generator/to_sql_query.rs
    • Tests: Denormalized aggregation tests now pass, 784/784 unit tests passing
  • GraphJoins UNION Extraction for Nested Unions: Fixed missing FROM clause in aggregation queries on UNION results

    • Issue: Queries like MATCH (n:IP) RETURN count(DISTINCT n.ip) generating SELECT without FROM clause, causing "Unknown identifier" errors
    • Root Cause: Union nested inside GraphNode → Projection → GroupBy → GraphJoins was never extracted because extract_union() only checked immediate input, not recursively through wrapper nodes
    • Fix: Implemented recursive unwrapping in extract_union() to detect Union at any depth (GraphNode, Projection, GroupBy), then properly convert to RenderPlan with union branches set
    • Impact: Multi-table aggregations and MULTI_TABLE_LABEL queries now work end-to-end with proper SQL generation
    • Files: src/render_plan/plan_builder.rs (lines 706-729, extract_union method)
    • Tests: All 784 unit tests passing, no regressions, aggregation queries now generate valid SQL
  • OPTIONAL MATCH with variable-length paths (VLP): Fixed SQL generation for OPTIONAL MATCH containing variable-length path patterns

    • Issue: Queries like MATCH (a:User) WHERE a.name = 'Eve' OPTIONAL MATCH (a)-[:FOLLOWS*1..3]->(b:User) RETURN a.name, COUNT(b) returned 0 rows instead of 1 row with count=0 when no paths exist
    • Root Cause: VLP CTE was incorrectly used as FROM clause instead of being LEFT JOINed to the anchor node from required MATCH, causing rows with no paths to be filtered out
    • Fix: Added graph_rel field to Join struct to track graph relationship information needed for proper LEFT JOIN generation in VLP cases. Updated all Join struct initializers across codebase to include graph_rel: None for non-VLP joins and graph_rel: Some(Arc::new(graph_rel)) for VLP-specific joins
    • Impact: OPTIONAL MATCH tests improved from 24/27 to 25/27 passing (93%). Users with no outgoing paths now correctly appear in results with count=0
    • Files:
      • src/logical_plan/mod.rs (Join struct definition with new graph_rel field)
      • src/render_plan/mod.rs (Join struct definition with new graph_rel field)
      • 40+ Join initializers updated across src/render_plan/ and src/query_planner/analyzer/ modules
    • Tests: test_optional_variable_length_no_path, test_optional_unbounded_path now passing
    • Generated SQL: Now correctly generates FROM users AS a LEFT JOIN vlp_a_b AS t ON t.start_id = a.user_id instead of FROM vlp_a_b AS t
  • OPTIONAL MATCH first pattern with disconnected patterns: Fixed SQL generation for queries where OPTIONAL MATCH comes before required MATCH with no shared nodes

    • Issue: Queries like OPTIONAL MATCH (a)-[:FOLLOWS]->(b) WHERE a.name='Eve' MATCH (x) WHERE x.name='Alice' generated SQL with undefined aliases or incorrect FROM clause selection
    • Root Cause: Three-layer problem:
      1. GraphJoinInference: connect_left_first logic excluded optional patterns from LEFT-first connection
      2. GraphJoinInference: FROM marker selection preferred first marker (optional) instead of required patterns
      3. Join rendering: Joins with empty joining_on were skipped entirely, missing required CROSS JOINs
    • Fix:
      1. Changed connect_left_first to always return true for is_first_relationship (regardless of optionality)
      2. Modified FROM marker creation to include all is_first_relationship patterns with appropriate join_type
      3. Added FROM marker selection logic preferring Inner (required) over Left (optional) joins
      4. Implemented CROSS JOIN rendering (ON 1=1) for joins with empty joining_on, distinguishing Left vs Inner
    • Impact: OPTIONAL MATCH tests improved from 17/27 to 24/27 passing (89%)
    • Files:
      • src/query_planner/analyzer/graph_join_inference.rs (59 lines: connect_left_first, FROM marker logic)
      • src/render_plan/plan_builder.rs (110 lines: CartesianProduct swap logic)
      • src/render_plan/join_builder.rs (53 lines: CROSS JOIN rendering)
    • Tests: test_optional_then_required, test_interleaved_required_optional now passing
    • Generated SQL: FROM x LEFT JOIN a ON 1=1 LEFT JOIN t1 ON t1.follower_id=a.user_id LEFT JOIN b ON b.user_id=t1.followed_id
  • VLP + WITH aggregation GROUP BY alias fix: Fixed incorrect GROUP BY alias in variable-length path queries with aggregation

    • Issue: Queries like MATCH (a)-[*1..2]->(b) WITH b, COUNT(*) AS cnt RETURN ... generated GROUP BY b.end_id which fails because b doesn't exist as a SQL table alias (the FROM clause uses vlp_a_b AS t)
    • Root Cause: expand_table_alias_to_group_by_id_only() in plan_builder_utils.rs wasn't detecting VLP endpoint aliases and was returning the Cypher alias instead of the VLP CTE alias
    • Fix: Added VLP endpoint detection at the start of the function using get_graph_rel_from_plan(). When alias matches VLP left/right connection, returns t.start_id or t.end_id using the VLP_CTE_DEFAULT_ALIAS constant
    • Impact: VLP + WITH aggregation queries now execute successfully with correct GROUP BY t.end_id
    • Files: src/render_plan/plan_builder_utils.rs (lines 4476-4530, expand_table_alias_to_group_by_id_only function)
    • Tests: All 784 unit tests passing, verified with social_benchmark schema
  • ArraySlicing property mapping fix: Property mappings now correctly applied inside ArraySlicing expressions like collect(n.name)[0..10]

    • Issue: ArraySlicing handler in apply_property_mapping wasn't recursively mapping the inner array expression
    • Fix: Added recursive property mapping for array, from, and to components of ArraySlicing expressions
    • Impact: All 10 test_collect tests now pass, expressions like collect(u.name)[0..2] correctly generate full_name in SQL
    • Files: src/query_planner/analyzer/filter_tagging.rs (lines 1057-1088)
  • CTE column aliasing underscore convention fix: WITH clauses now correctly use underscore aliases (a_name) in CTE columns instead of dot notation (a.name)

    • Issue: TableAlias expansion in WITH clauses was using dot notation for column aliases, causing inconsistent naming between CTE and final SELECT
    • Fix: Modified CTE extraction to expand TableAlias to individual PropertyAccessExp with underscore aliases using get_properties_with_table_alias()
    • Impact: CTE columns now use underscore convention (a_name, a_user_id) while final SELECT uses AS for dot notation (a_name AS "a.name")
    • Files: src/render_plan/cte_extraction.rs (TableAlias expansion logic, lines 2881-2896; LogicalColumnAlias import and usage)
    • Tests: cte_column_aliasing_underscore_convention test now passes, all integration tests passing (17/17)
  • Shortest path FROM clause fix (single-type VLP): Single-type variable-length paths now correctly use CTE in FROM clause instead of start node table

    • Issue: GraphJoins.extract_from() for empty joins checked variable-length paths AFTER denormalized/polymorphic checks
    • Fix: Moved single-type variable-length check to top priority (A.1) before other pattern checks
    • Impact: All 5 shortest path filter tests for single-type variable-length paths now pass with correct SQL: FROM vlp_a_b AS p instead of FROM test_db.users AS a
    • Limitation: Multi-type variable-length paths (e.g., [:TYPE1|TYPE2*1..3]) use CTE names like vlp_multi_type_a_b and are handled separately in plan_builder_utils.rs
    • Files: src/render_plan/plan_builder.rs (extract_from method, lines 1283-1299; single-type VLP handling)

⚙️ Refactoring

  • plan_builder.rs Phase 2 COMPLETE: All 4 domain builders extracted, performance validated, modular architecture achieved

    • Complete module extraction: 4 specialized builders extracted (join_builder.rs: 1,790 lines, select_builder.rs: 130 lines, from_builder.rs: 849 lines, group_by_builder.rs: 364 lines)
    • plan_builder.rs reduced: From 9,504 to 1,516 lines (84% reduction in main file, 3,133 lines extracted)
    • Trait-based delegation: Clean RenderPlanBuilder trait with delegation to all 4 builder modules
    • Performance validated: Cypher-to-SQL translation <14ms for all benchmark queries, <5% regression requirement met
    • Architecture complete: Modular design with excellent performance and maintainability
    • Compilation successful: All ambiguities resolved with explicit <LogicalPlan as GroupByBuilder> syntax
    • All tests passing: 770/770 unit tests (100%), 12/17 integration tests (71%, same as before)
    • Code quality maintained: Comprehensive documentation, helper functions for node property resolution
    • plan_builder.rs reduced: From 1,749 to 1,526 lines (223 lines extracted, 13% reduction this week, 39% total)
    • Ready for Week 7: Safe to proceed with order_by_builder.rs extraction
  • plan_builder.rs Phase 2 Week 5 Complete: from_builder.rs extraction finished, modular architecture expanded further

    • from_builder.rs fully implemented: Complete extraction of extract_from() function with all FROM resolution logic (864 lines)
    • Trait-based delegation: FromBuilder trait with extract_from() method for clean separation
    • Complex FROM logic extracted: Handles ViewScan, GraphNode, GraphRel (denormalized/VLP/optional/anonymous edges), GraphJoins (FROM markers/anchor resolution/CTEs), CartesianProduct (WITH...MATCH patterns)
    • Helper function integration: Imports from plan_builder_helpers for extract_table_name, is_node_denormalized, find_anchor_node, extract_rel_and_node_tables, find_table_name_for_alias, get_all_relationship_connections
    • Modular architecture expanded: Clean separation between plan_builder.rs and from_builder.rs with proper trait imports
    • Compilation successful: All imports resolved, no compilation errors, functionality preserved through trait delegation
    • All tests passing: 770/770 unit tests (100%), 12/17 integration tests (71%, same as before)
    • Code quality maintained: Comprehensive documentation, error handling, and performance characteristics
    • plan_builder.rs reduced: From 2,490 to 1,749 lines (741 lines extracted, 30% reduction)
    • Ready for Week 6: Safe to proceed with group_by_builder.rs extraction
  • plan_builder.rs Phase 2 Week 4 Complete: select_builder.rs extraction finished, modular architecture expanded

    • select_builder.rs fully implemented: Complete extraction of extract_select_items() function and all helper functions (950 lines)
    • Trait-based delegation: SelectBuilder trait with extract_select_items method for clean separation
    • Modular architecture expanded: Clean separation between plan_builder.rs and select_builder.rs with proper imports
    • Compilation successful: All imports resolved, no compilation errors, functionality preserved through trait delegation
    • Code quality maintained: Comprehensive documentation, error handling, and performance characteristics
    • plan_builder.rs reduced: From ~8,300 to ~7,350 lines (950 lines extracted)
    • Ready for Week 5: Safe to proceed with from_builder.rs extraction
  • plan_builder.rs Phase 2 Week 3 Complete: join_builder.rs extraction finished, modular architecture achieved

    • join_builder.rs fully implemented: Complete extraction of extract_joins() function and all helper functions (1,200 lines)
    • Trait-based delegation: JoinBuilder trait with extract_joins and extract_array_join methods for clean separation
    • Modular architecture achieved: Clean separation between plan_builder.rs and join_builder.rs with proper imports
    • Compilation successful: All imports resolved, no compilation errors, functionality preserved through trait delegation
    • Code quality maintained: Comprehensive documentation, error handling, and performance characteristics
    • plan_builder.rs reduced: From 9,504 to ~8,300 lines (1,200 lines extracted)
    • Ready for Week 4: Safe to proceed with select_builder.rs extraction
  • plan_builder.rs Phase 2 Week 2.5 Setup Complete: Infrastructure ready for 7-week module extraction process

    • Performance baselines established: 5 query types benchmarked with results saved to benchmarks/plan_builder_baseline.json
    • Feature flags integrated: PlanBuilderFeatureFlags struct with 8 flags for controlling extraction phases
    • Test matrix documented: Comprehensive validation criteria in docs/development/phase2-test-matrix.md
    • Schema loading verified: Test environment working with corrected test_integration.yaml (fixed id_column vs node_id issue)
    • Rollback procedures validated: Feature flags allow graceful fallback when extraction phases are disabled
    • Ready for Week 3: Safe to proceed with join_builder.rs extraction (1,200 lines planned)
  • plan_builder_utils.rs Consolidation Complete: Eliminated duplicate alias utility functions across codebase

    • 8 duplicate functions removed from plan_builder_utils.rs (202 lines saved)
    • Single source of truth established in utils/alias_utils.rs
    • Functions consolidated: collect_aliases_from_plan, collect_inner_scope_aliases, cond_references_alias, find_cte_reference_alias, find_label_for_alias, get_anchor_alias_from_plan, operator_references_alias, strip_database_prefix
    • Critical bug fix: Resolved stack overflow in complex WITH+aggregation queries by fixing has_with_clause_in_graph_rel to handle unknown plan types (Discriminant(7))
    • Codebase impact: Reduced from 18,121 to 17,919 lines (-202 lines, -1.1%)
    • Testing verified: 770/780 Rust unit tests pass (98.7%), integration tests pass for core functionality
    • No functional regressions: WITH clause processing, aggregations, basic queries, and OPTIONAL MATCH all working correctly
  • Expression Utilities Consolidation Complete: Eliminated duplicate string processing functions across render_plan modules

    • New shared module created: src/render_plan/expression_utils.rs with common string literal and operand processing utilities
    • 3 duplicate functions removed from plan_builder_utils.rs, cte_generation.rs, and cte_extraction.rs (eliminated ~60 lines of duplication)
    • Functions consolidated: contains_string_literal, has_string_operand, flatten_addition_operands now in shared location
    • Public API established: Made extract_node_label_from_viewscan public in cte_extraction.rs for shared use by cte_generation.rs
    • Code quality improved: Single source of truth for expression processing utilities, reduced maintenance burden
    • Testing verified: All 770/770 unit tests passing (100%), no functional regressions
    • Architecture maintained: Clean separation of concerns while eliminating duplication

🚀 Features

  • CTE Unification Phase 3 Complete: Unified recursive CTE generation across all schema patterns with comprehensive test coverage
    • TraditionalCteStrategy: Standard node/edge table patterns
    • DenormalizedCteStrategy: Single-table denormalized schemas
    • FkEdgeCteStrategy: Hierarchical FK relationships
    • MixedAccessCteStrategy: Hybrid embedded/JOIN access patterns
    • EdgeToEdgeCteStrategy: Multi-hop denormalized edge-to-edge patterns
    • CoupledCteStrategy: Coupled edges in same physical row
  • Parameter Extraction Complete: All CTE strategies now properly extract parameters from WHERE clause filters for SQL parameterization

[0.6.1] - 2026-01-13

🚀 Features

  • Neo4j-compatible field aliases: RETURN clause now preserves exact expression text as field names when AS alias not specified (matches Neo4j behavior)

  • Integrate data_security schema, remove benchmark schemas from unified tests

  • Auto-load all test schemas at session start

  • Add PatternGraphMetadata POC for cleaner join inference evolution

  • Phase 1 - Use cached node references from PatternGraphMetadata

  • (graph_join_inference) Phase 2 - Simplified cross-branch detection using metadata

  • (graph_join_inference) Phase 4 - Add relationship uniqueness constraints

  • Complete fixed-length path inline JOIN optimization

  • Property pruning optimization with unified test infrastructure

  • Edge constraints for cross-node validation (8/8 tests passing)

  • Pattern Comprehensions and Multiple UNWIND support

  • Add multi-schema YAML support for loading multiple graph schemas

  • Add multi-schema database setup and test scripts

  • Add array subscript syntax support and complete multi-type VLP path functions

  • Make MAX_INFERRED_TYPES configurable via query parameter

🐛 Bug Fixes

  • Support anonymous nodes in graph patterns
  • Use node ID columns for VLP CTE generation
  • Optimize JOIN generation based on property usage, not node naming
  • Optimize JOIN generation based on property usage, not node naming
  • Permanently fix test infrastructure issues
  • Add filesystem and group membership test data to setup script
  • Add small-scale benchmark test data and cleanup obsolete scripts
  • Migrate from schema_name='default' to USE clause convention
  • Add missing matrix test schemas and USE clause support
  • Add USE clause to multi-hop pattern tests
  • Update social_polymorphic schema to use actual table names
  • Resolve ontime schema name conflict, add benchmark schemas back for matrix tests
  • Add flights to default db for ontime_benchmark - Copy flights to default database - Comprehensive matrix: +256 tests - Overall: +186 tests to 2947 - Session total: +1047 tests (+55 percent)
  • Restore ontime_flights schema name for pattern matrix tests - Revert ontime_denormalized back to ontime_flights - Remove ontime_benchmark from unified test loading - Update matrix conftest to use ontime_flights - Pattern schema matrix: 0/51 to 9/51 recovery - Overall: 2758 to 2958 (+200 tests) - Session: 1900 to 2958 (+1058 tests, +55.7 percent, 85.2 percent pass rate)
  • Add property_expressions schema to test loading - Fix database to default where tables actually exist - Replace CASE WHEN with if() for parsing compatibility - Add to load_test_schemas.py - Property expressions tests: 0/28 to 13/28 recovery - Overall: 2958 to 2976 (+18 tests) - Session: 1900 to 2976 (+1076 tests, +56.6 percent, 85.7 percent pass rate)
  • Add schema_name to role-based query tests - Role tests now use unified_test_schema - All 5 role-based tests now pass
  • Add missing property aliases to property_expressions schema
  • VLP cross-branch JOIN uses node alias instead of relationship alias
  • VLP transitivity check handles polymorphic relationships
  • All integration tests now passing or properly marked xfail
  • Add relationship labels to edge list test GraphRel structures
  • Update edge list test assertions for SingleTableScan optimization
  • Add proper GraphSchema to failing tests
  • Thread schema through single-hop query pipeline for edge constraints
  • (vlp) Fix denormalized VLP node ID selection (Dec 22 regression)
  • (vlp) Complete denormalized VLP with comprehensive fixes
  • VLP path functions in WITH clauses + CTE body rewriting
  • Remove escaped quotes and multi_schema loader entry from conftest
  • Load denormalized_flights_test schema with proper data
  • VLP WHERE clause alias resolution for denormalized schemas
  • Correct AUTHORED relationship schema in unified_test_multi_schema.yaml
  • Multi-type VLP architectural fix - FROM alias solves all mapping issues
  • Multi-type VLP JSON extraction - skip alias mapping for multi-type CTEs
  • FK-edge zero-length VLP edge tuple generation
  • Unify MAX_INFERRED_TYPES default to 5 for consistency
  • Parameterized views apply to both node and edge tables in VLP queries
  • Add anyLast() wrapping for CTE references in GROUP BY aggregations
  • Rewrite CTE column references in JOINs
  • VLP+WITH+MATCH pattern (ic9) - delegate to input.extract_joins() for CTE references
  • Add VLP endpoint detection in find_id_column_for_alias
  • Correct ontime_denormalized schema to use default database
  • Skip JOINs for fully denormalized VLP patterns
  • Map denormalized VLP endpoint aliases to CTE alias for rewriting
  • Consecutive MATCH with per-MATCH WHERE, comment support, scalar aggregate investigation
  • WITH expression scope - rewrite CASE expressions to use CTE columns

💼 Other

  • Comprehensive test failure categorization (507 failures)
  • V0.6.1 - WITH clause fixes, GraphRAG enhancements, LDBC progress
  • Update Cargo.lock for v0.6.1 release

🚜 Refactor

  • (graph_join_inference) Phase 3 - Break up infer_graph_join() god method
  • [breaking] Migrate all integration tests to multi-schema format
  • [breaking] Remove obsolete unified_test_schema and cleanup
  • Consolidate denormalized_flights schema references

📚 Documentation

  • Update README.md with v0.6.0 and accumulated features
  • Update KNOWN_ISSUES.md with v0.6.0 fixes
  • Archive wiki for v0.6.0 release
  • Add release notes for v0.6.0
  • Fix ClickHouse function prefix (ch./chagg. not clickhouse.)
  • Fix composite node ID example (use nodes not edges)
  • Update STATUS and investigation plan with anonymous node fix
  • Update STATUS with property usage optimization and current test status
  • Complete test infrastructure documentation
  • Update STATUS with schema loading fix
  • Update STATUS - ALL INTEGRATION TESTS PASSING! 🎉
  • Add comprehensive architecture analysis for Scan/ViewScan/GraphNode relationships
  • Update gap analysis - Gap #2 already implemented
  • Add schema testing requirements (VLP multi-schema mandate)
  • Add VLP denormalized property handling TODO
  • Add session findings and feature analysis
  • Clean up KNOWN_ISSUES.md and add path function limitation
  • Update CHANGELOG and test infrastructure for VLP fixes
  • Add multi-schema configuration documentation
  • Add multi-schema setup guide
  • Update TESTING.md for multi-schema architecture
  • Update STATUS.md - remove load_test_schemas.py reference
  • Add VS Code terminal freeze prevention to TESTING.md
  • Document VLP WHERE clause bug discovery
  • Update Cypher-Subgraph-Extraction.md with verified pattern support matrix
  • Document max_inferred_types feature and update default to 5
  • Update STATUS with LDBC progress and IC-9 CTE naming issue
  • Systematic documentation cleanup and reorganization
  • Streamline STATUS.md to focus on current state (2822 → 322 lines)
  • LDBC benchmark baseline testing and analysis
  • Update README test coverage to 3000+ tests and reorganize features
  • Archive wiki documentation for v0.6.1 release

🧪 Testing

  • Update test expectations for known limitations
  • Add error message verification for known limitations
  • (graph_join_inference) Add comprehensive unit tests for Phase 4 uniqueness constraints
  • Add comprehensive VLP cross-functional testing
  • Add comprehensive GraphRAG schema variation tests
  • Add zero-length VLP tests for [*0..] and [*0..N] patterns

⚙️ Miscellaneous Tasks

  • Update CHANGELOG.md [skip ci]
  • Add lineage test schema and cleanup temporary files
  • Move SCHEMA_THREADING_ARCHITECTURE.md to docs/development/
  • Ignore docs1 directory in gitignore
  • Clean up docs
  • More doc cleanup
  • More docs clean up, README
  • Remove unused Flight node from unified_test_schema.yaml
  • Update CHANGELOG.md [skip ci]

[0.6.0] - 2025-12-22

🚀 Features

  • (functions) Add 18 new Neo4j function mappings for v0.5.5
  • (functions) Add 30 more Neo4j function mappings for v0.5.5
  • (functions) Add ClickHouse function pass-through via ch:: prefix
  • (functions) Add ClickHouse aggregate function pass-through via ch. prefix
  • (functions) Add chagg. prefix for explicit aggregates, expand aggregate registry to ~150 functions
  • (benchmark) Add LDBC SNB Interactive v1 benchmark
  • (benchmark) Add ClickGraph schema matching datagen format
  • (benchmark) Add LDBC query test script
  • (ldbc) Achieve 100% LDBC BI benchmark (26/26 queries)
  • Implement chained WITH clause support with CTE generation
  • Support ORDER BY, SKIP, LIMIT after WITH clause
  • Implement size() on patterns with schema-aware ID lookup
  • Add composite node ID infrastructure for multi-column primary keys
  • Add CTE reference validation
  • CTE-aware variable resolution for WITH clauses
  • Fix CTE column filtering and JOIN condition rewriting for WITH clauses
  • CTE-aware variable resolution + WITH validation + documentation improvements
  • Add lambda expression support for ClickHouse passthrough functions
  • Add comprehensive LDBC benchmark suite with loading, query, and concurrency tests
  • Implement scope-based variable resolution in analyzer (Phase 1)
  • Remove dead CTE validation functions
  • Implement CTE column resolution across all join strategies
  • Remove obsolete JOIN rewriting code from renderer (Phase 3D-A)
  • Move CTE column resolution to analyzer (Phase 3D-B)
  • Pre-compute projected columns in analyzer (Phase 3E)
  • Add CTE schema registry for analyzer (Phase 3F)
  • Use pre-computed projected_columns in renderer (Phase 3E-B)
  • Implement cross-branch shared node JOIN detection
  • Allow disconnected comma patterns with WHERE clause predicates
  • Support multiple sequential MATCH clauses
  • Implement generic CTE JOIN generation using correlation predicates
  • Complete LDBC SNB schema and data loading infrastructure
  • Improve relationship validation error messages
  • Clarify node_id semantics as property names with auto-identity mappings
  • Complete composite node_id support (Phase 2)
  • Add polymorphic relationship resolution architecture
  • Complete polymorphic relationship resolution data flow
  • Fix polymorphic relationship resolution in CTE generation
  • Add Comment REPLY_OF Message schema definition
  • Add schema entity collection in VariableResolver for Projection scope
  • Add dedicated LabelInference analyzer pass
  • Enhance TypeInference to infer both node labels and edge types
  • Reduce MAX_INFERRED_TYPES from 20 to 5
  • (parser) Add clear error messages for unsupported pattern comprehensions
  • (parser) Add clear error messages for bidirectional relationship patterns
  • (parser) Convert temporal property accessors to function calls
  • (analyzer) Add UNWIND variable scope handling to variable_resolver
  • (analyzer) Add type inference for UNWIND elements from collect() expressions
  • Support path variables in comma-separated MATCH patterns
  • Add polymorphic relationship resolution with node types
  • Complete collect(node) + UNWIND tuple mapping & metadata preservation architecture
  • Make CLICKHOUSE_DATABASE optional with 'default' fallback
  • Add parser support for != (NotEqual) operator
  • Add unified test schema for streamlined testing
  • Add unified test data setup and fix matrix test schema issues
  • Complete multi-tenant parameterized view support
  • Add denormalized flights schema to unified test schema
  • Add VLP transitivity check to prevent invalid recursive patterns

🐛 Bug Fixes

  • (benchmark) Use Docker-based LDBC data generation
  • (benchmark) Align DDL with actual datagen output format
  • (benchmark) Add ClickHouse credentials support
  • (benchmark) Align DDL and schema with actual datagen output
  • (ldbc) Fix CTE pattern for WITH + table alias pass-through
  • (ldbc) Fix ic3 relationship name POST_IS_LOCATED_IN -> POST_LOCATED_IN
  • WITH+MATCH CTE generation for correct SQL context
  • Replace all silent defaults with explicit errors in render_expr.rs
  • Eliminate ViewScan silent defaults - require explicit relationship columns
  • Expand WITH TableAlias to all columns for aggregation queries
  • Track CTE schemas to build proper property_mapping for references
  • Remove CTE validation to enable nested WITH clauses
  • Prevent duplicate CTE generation in multi-level WITH queries
  • Three-level WITH nesting with correct CTE scope resolution
  • Add proper schemas to WITH/HAVING tests
  • Correct CTE naming convention to use all exported aliases
  • Coupled edge alias resolution for multiple edges in same table
  • Rewrite expressions in intermediate CTEs to fix 4-level WITH queries
  • Add GROUP BY and ORDER BY expression rewriting for final queries
  • Issue #6 - Fix Comma Pattern and NOT operator bugs
  • Resolve 3 critical LDBC query blocking issues
  • (ldbc) Inline property matching & semantic relationship expansion
  • (ldbc) Handle IS NULL checks on relationship wildcards (IS7)
  • (ldbc) Fix size() pattern comprehensions - handle internal variables correctly (BI8)
  • (ldbc) Rewrite path functions in WITH clause (IC1)
  • Strip database prefixes from CTE names for ClickHouse compatibility
  • Cartesian Product WITH clause missing JOIN ON
  • Operator precedence in expression parser
  • VLP endpoint JOINs with alias rewriting for chained patterns
  • Correct NOT operator precedence and remove hardcoded table fallbacks
  • Three critical shortestPath and query execution bugs
  • Extend VLP alias rewriting to WHERE clauses for IC1 support
  • Use correct CTE names for multi-variant relationship JOINs
  • Remove database prefix from CTE table names in cross-branch JOINs
  • Hoist trailing non-recursive CTEs to prevent nesting scope issues
  • VLP + WITH label corruption bug - use node labels in RelationshipSchema
  • Resolve compilation errors from AST and GraphRel changes
  • Add fallback to lookup table names from relationship schema
  • Complete RelationshipSchema refactoring - all 646 tests passing
  • Add database prefixes to base table JOINs
  • Use underscore convention for CTE column aliases
  • Thread node labels through relationship lookup pipeline for polymorphic relationships
  • Support filtered node views in relationship validation
  • Add JOIN dependency sorting to CTE generation path
  • Use existing TableCtx labels in multi-pattern MATCH label inference
  • TypeInference creates ViewScan for inferred node labels
  • QueryValidation respects parser normalization
  • Populate from_id/to_id columns during JOIN creation for correct NULL checks
  • (ldbc) Align BI queries with LDBC schema definitions
  • Prevent RefCell panic in populate_relationship_columns_from_plan
  • UNWIND after WITH now uses CTE as FROM table instead of system.one
  • Replace all panic!() with log::error!() - PREVENT SERVER CRASHES
  • Clean up unit tests - fix 21 compilation errors
  • Complete unit test cleanup - fix assertions and mark unimplemented features
  • Replace non-standard LIKE syntax with proper OpenCypher string predicates
  • Add != operator support to comparison expression parser
  • Preserve database prefix in ViewTableRef SQL generation
  • Relationship variable expansion + consolidate property helpers
  • Use relationship alias for denormalized edge FROM clause
  • Re-enable selective cross-branch JOIN for comma-separated patterns
  • Rel_type_index to prefer composite keys over simple keys
  • WITH...MATCH pattern using wrong table for FROM clause
  • Update test labels to match unified_test_schema
  • Test_multi_database.py - use schema_name instead of database for USE clause
  • Unify aggregation logic and fix multi-schema support
  • Multi-table label bug fixes and error handling improvements

💼 Other

  • Fix dependency vulnerabilities for v0.5.5
  • Partial fix for nested WITH clauses - add recursive handling
  • Multi-variant CTE column name resolution in JOIN conditions
  • SchemaInference using table names instead of node labels

🚜 Refactor

  • Fix compiler warnings and clean up unused variables
  • (functions) Change ch:: to ch. prefix for Neo4j ecosystem compatibility
  • Extract TableAlias expansion into helper functions
  • Replace wildcard expansion in build_with_aggregation_match_cte_plan with helper
  • Remove deprecated v1 graph pattern handler (1,568 lines)
  • Extract CTE hoisting helper function
  • Remove unused ProjectionKind::With enum variant
  • Remove 676 lines of dead WITH clause handling code
  • Remove 47 lines of dead GraphNode branch with empty property_mapping
  • Remove redundant variable resolution from renderer (Phase 3A)
  • Remove unused bidirectional and FK-edge functions
  • Remove dead code function find_cte_in_plan
  • Consolidate duplicate property extraction code (-23 lines)
  • Remove dead extract_ctes() function (-301 lines)
  • Separate graph labels from table names in RelationshipSchema
  • Remove redundant WithScopeSplitter analyzer pass
  • Remove old parsing-time label inference
  • Consolidate inference logic into TypeInference with polymorphic support
  • Replace hardcoded fallbacks with descriptive errors
  • Add strict validation for system.one usage in UNWIND
  • ELIMINATE ALL HARDCODED FALLBACKS - fail fast instead
  • Consolidate test data setup - use MergeTree, remove duplicates

📚 Documentation

  • Update wiki documentation for v0.5.4 release
  • Archive wiki for v0.5.4 release
  • Add UNWIND clause documentation to wiki
  • Update v0.5.4 wiki snapshot with UNWIND documentation
  • Update Known-Limitations with recently implemented features
  • Update v0.5.4 wiki snapshot with corrected feature status
  • Add 30 new functions to Cypher-Functions.md reference
  • Expand vector similarity section with RAG usage
  • Clarify scalar vs aggregate function categories in ch.* docs
  • Add lambda expression limitation to ch.* pass-through documentation
  • Split ClickHouse pass-through into dedicated doc for better discoverability
  • Add comparison with PuppyGraph, TigerGraph, NebulaGraph
  • Fix PuppyGraph architecture description
  • Fix license - Apache 2.0, not MIT
  • (benchmark) Update README with correct workflow and files
  • Update KNOWN_ISSUES with accurate LDBC benchmark status
  • Update STATUS.md and KNOWN_ISSUES.md for WITH clause improvements
  • Add size() documentation and replace silent defaults with errors
  • Document composite node ID feature
  • Update STATUS.md with IC-1 fix and 100% LDBC benchmark
  • Document WITH handler refactoring (120 lines eliminated)
  • Identify remaining code quality hotspots after WITH refactoring
  • Update STATUS and code quality analysis with v1 removal
  • Add quality improvement plan and clarify parameter limitation
  • Add comprehensive lambda expression documentation to Cypher Language Reference
  • Reorganize lambda expressions as subsection of ClickHouse Function Passthrough
  • Move lambda expressions details to ClickHouse-Functions.md
  • Update LDBC benchmark analysis with accurate coverage (94% actionable)
  • Add comprehensive LDBC data loading and persistence guide
  • Add benchmark infrastructure completion summary
  • Add benchmark quick reference card
  • Update STATUS and CHANGELOG with predicate correlation
  • Update STATUS and CHANGELOG for sequential MATCH support
  • Update CHANGELOG and KNOWN_ISSUES for Issue #2 fix
  • Update KNOWN_ISSUES - mark Issues #1, #3, #4 as FIXED
  • Verify and update KNOWN_ISSUES - mark #5, #7 FIXED, detail #6 bugs
  • Update KNOWN_ISSUES.md - Mark Issue #6 as FIXED
  • Add LDBC benchmark audit tools and issue tracking
  • Update STATUS.md with WHERE clause rewriting completion
  • Document CTE database prefix fix in STATUS.md
  • Add AI Assistant Integration via MCP Protocol
  • Update STATUS.md with RelationshipSchema refactoring progress
  • Update STATUS.md - RelationshipSchema refactoring complete (646/646 tests)
  • Update STATUS and planning docs for node_id semantic clarification
  • Update STATUS.md and KNOWN_ISSUES.md for database prefix fix
  • Add database prefix fix to CHANGELOG.md
  • Update QUERY_FIX_TRACKER with Dec 19 fixes
  • Update STATUS, CHANGELOG, KNOWN_ISSUES for polymorphic relationship fix
  • Update STATUS with polymorphic resolution progress
  • Update STATUS.md with session summary
  • Update STATUS with TypeInference ViewScan fix
  • Update STATUS with QueryValidation fix - 70% LDBC passing
  • Update CHANGELOG with Dec 19 achievements and cleanup root directory
  • Analyze LDBC failures - 70% pass rate, identify 3 root causes
  • Add LDBC benchmark configuration guide
  • Correct bi-8/bi-14 root cause - pattern comprehensions not implemented
  • Update KNOWN_ISSUES with parser improvements for pattern comprehensions
  • Clarify CASE expression status - fully implemented
  • Update all documentation with correct schema paths
  • Add systematic test failure investigation plan
  • Update STATUS and CHANGELOG with test infrastructure progress
  • Mark relationship variable return bug as fixed
  • Update STATUS and CHANGELOG for 24/24 zeek tests
  • Update STATUS and CHANGELOG with test label fixes
  • Document path function VLP alias bug in KNOWN_ISSUES

⚡ Performance

  • Replace UUID-based CTE names with sequential counters

🎨 Styling

  • Apply rustfmt formatting to entire codebase

🧪 Testing

  • Update standalone relationship test for v2 behavior
  • Add comprehensive WITH + advanced features test suite
  • Add parameter tests for WITH clause combinations
  • Add LDBC benchmark test scripts
  • Add missing LDBC query parameters to audit script

⚙️ Miscellaneous Tasks

  • Update CHANGELOG.md [skip ci]
  • Remove dead code and fix all compiler warnings
  • Hide internal documentation from public repo
  • Keep wiki, images, and features subdirs external
  • Remove internal documentation from repo
  • Remove copilot instructions from public repo
  • Remove debug output after nested CTE fix
  • Add *.log to gitignore to prevent log file commits
  • Comprehensive cleanup - standardize schemas and reorganize tests
  • Remove duplicate setup_all_test_data.sh in scripts/setup/
  • Release v0.6.0 - VLP transitivity check and bug fixes

[0.5.4] - 2025-12-08

🚀 Features

  • Add native support for self-referencing FK pattern
  • Add relationship uniqueness enforcement for undirected patterns
  • (schema) Add fixed-endpoint polymorphic edge support
  • (union) Add UNION and UNION ALL query support
  • Multi-table label support and denormalized schema improvements
  • (pattern_schema) Add unified PatternSchemaContext abstraction - Phase 1
  • (graph_join_inference) Integrate PatternSchemaContext - Phase 2
  • (graph_join_inference) Add handle_graph_pattern_v2 - Phase 3
  • (pattern_schema) Add FkEdgeJoin strategy for FK-edge patterns
  • (graph_join) Wire up handle_graph_pattern_v2 with USE_PATTERN_SCHEMA_V2 env toggle

🐛 Bug Fixes

  • GROUP BY expansion and count(DISTINCT r) for denormalized schemas
  • Undirected multi-hop patterns generate correct SQL
  • Support fixed-endpoint polymorphic edges without type_column
  • Correct polymorphic filter condition in graph_join_inference
  • Normalize GraphRel left/right semantics for consistent JOIN generation
  • Recurse into nested GraphRels for VLP detection
  • (render_plan) Add WHERE filters for VLP chained pattern endpoints (Issue #5)
  • (parser) Reject binary operators (AND/OR/XOR) as variable names
  • Multi-hop anonymous patterns, OPTIONAL MATCH polymorphic, string operators
  • Aggregation and UNWIND bugs
  • Denormalized schema query pattern fixes (TODO-1, TODO-2, TODO-4)
  • Cross-table WITH correlation now generates proper JOINs (TODO-3)
  • WITH clause alias propagation through GraphJoins wrapper (TODO-8)
  • Multi-hop denormalized edge JOIN generation
  • Update schema files to match test data columns
  • (pattern_schema) Pass prev_edge_info for multi-hop detection in v2 path
  • (filter_tagging) Correct owning edge detection for multi-hop intermediate nodes
  • FK-edge JOIN direction bug - use join_side instead of fk_on_right
  • Add polymorphic label filter generation for edges

🚜 Refactor

  • Unify FK-edge pattern for self-ref and non-self-ref cases
  • Minor code cleanup in bidirectional_union and plan_builder_helpers
  • Make PatternSchemaContext (v2) the default join inference path
  • Reorganize benchmarks into individual directories
  • Replace NodeIdSchema.column with Identifier-based id field
  • Change YAML field id_column to node_id for consistency
  • Extract predicate analysis helpers to plan_builder_helpers.rs
  • Extract JOIN and filter helpers to plan_builder_helpers.rs

📚 Documentation

  • Update README for v0.5.3 release
  • Add fixed-endpoint polymorphic edge documentation
  • Add VLP+chained patterns docs and private security tests
  • Document Issue #5 (WHERE filter on VLP chained endpoints)
  • (readme) Minor wording improvements
  • Update PLANNING_v0.5.3 and CHANGELOG with bug fix status
  • Add unified schema abstraction proposal and test scripts
  • Add unified schema abstraction Phase 4 completion to STATUS
  • Update unified schema abstraction progress - Phase 4 fully complete
  • (benchmarks) Add ClickHouse env vars and fix paths in README
  • (benchmarks) Streamline README to be a concise index
  • Archive PLANNING_v0.5.3.md - all bugs resolved

🧪 Testing

  • Add multi-hop pattern integration tests
  • Fix Zeek integration tests - response format and skip cross-table tests
  • Add v1 vs v2 comparison test script
  • Add unit tests for predicate analysis helpers

⚙️ Miscellaneous Tasks

  • Update CHANGELOG.md [skip ci]
  • Make test files use CLICKGRAPH_URL env var for port flexibility
  • (benchmarks) Move social_network-specific files to subdirectory

[0.5.3] - 2025-12-02

🚀 Features

  • Add regex match (=~) operator and fix collect() function
  • Add EXISTS subquery and WITH+MATCH chaining support
  • Add label() function for scalar label return

🐛 Bug Fixes

  • Remove unused schemas volume from docker-compose
  • Parser now rejects invalid syntax with unparsed input
  • Column alias for type(), id(), labels() graph introspection functions
  • Update release workflow to use clickgraph binary name
  • Update release workflow to use clickgraph-client binary name
  • Build entire workspace in release workflow

📚 Documentation

  • Archive wiki for v0.5.2 release
  • Fix schema documentation and shorten README
  • Fix Quick Start to include required GRAPH_CONFIG_PATH
  • Add 3 new known issues from ontime schema testing
  • Update KNOWN_ISSUES.md - WHERE AND now caught
  • Clean up KNOWN_ISSUES.md - remove resolved issues
  • Remove false known limitations - all verified working

⚙️ Miscellaneous Tasks

  • Update CHANGELOG.md [skip ci]
  • Release v0.5.3
  • Update CHANGELOG.md [skip ci]
  • Update Cargo.lock for v0.5.3
  • Update CHANGELOG.md [skip ci]
  • Update CHANGELOG.md [skip ci]
  • Update CHANGELOG.md [skip ci]

[0.5.2] - 2025-11-30

🚀 Features

  • Add docker-compose.dev.yaml for development
  • [breaking] Phase 1 - Fixed-length paths use inline JOINs instead of CTEs
  • Add cycle prevention for fixed-length paths
  • Restore PropertyValue and denormalized support from stash, integrate with anchor_table
  • Complete denormalized query support with alias remapping and WHERE clause filtering
  • Implement denormalized node-only queries with UNION ALL
  • Support RETURN DISTINCT for denormalized node-only queries
  • Support ORDER BY for denormalized UNION queries
  • Fix UNION ALL aggregation semantics for denormalized node queries
  • Variable-length paths for denormalized edge tables
  • Add schema-level filter field with SQL predicate parsing
  • Schema-level filters and OPTIONAL MATCH LEFT JOIN fix
  • Add VLP + UNWIND support with ARRAY JOIN generation
  • Implement coupled edge alias unification for denormalized patterns
  • Implement polymorphic edge query support
  • (polymorphic) Add VLP polymorphic edge filter support
  • (polymorphic) Add IN clause support for multiple relationship types in single-hop
  • Complete polymorphic edge support for wildcard relationship patterns
  • Add edge inline property filter tests and update documentation
  • Implement bidirectional pattern UNION ALL transformation

🐛 Bug Fixes

  • ORDER BY rewrite bug for chained JOIN CTEs
  • Zero-hop variable-length path support
  • Remove ChainedJoinGenerator CTE for fixed-length paths
  • Complete PropertyValue type conversions in plan_builder.rs
  • Revert table alias remapping in filter_tagging to preserve filter context
  • Eliminate duplicate WHERE filters by optimizing FilterIntoGraphRel
  • Correct JOIN order and FROM table selection for mixed property expressions
  • Ensure variable-length and shortest path queries use CTE path
  • Destination node properties now map to correct columns in denormalized edge tables
  • Multi-hop denormalized edge patterns and duplicate WHERE filters
  • Variable-length path schema resolution for denormalized edges
  • Add edge_id support to RelationshipDefinition for cycle prevention
  • Fixed-length VLP (*1, *2, *3) now generates inline JOINs
  • Fixed-length VLP (*2, *3) now works correctly
  • Denormalized schema VLP property alias resolution
  • VLP recursive CTE min_hops filtering and aggregation handling
  • OPTIONAL MATCH + VLP returns anchor when no path exists
  • RETURN r and graph functions (type, id, labels)
  • Support inline property filters with numeric literals
  • Push projections into Union branches for bidirectional patterns
  • Polymorphic multi-type JOIN filter now uses IN clause

💼 Other

  • Manual addition of denormalized fields (incomplete)

🚜 Refactor

  • Simplify ORDER BY logic for inline JOINs
  • Simplify GraphJoins FROM clause logic - use relationship table when no joins exist
  • Store anchor table in GraphJoins, eliminate redundant find_anchor_node() calls
  • Set is_denormalized flag directly in analyzer, remove redundant optimizer pass
  • Move helper functions from plan_builder.rs to plan_builder_helpers.rs
  • Rename co-located → coupled edges terminology
  • Consolidate schema loading with shared helpers
  • Consolidated VLP handling with VlpSchemaType

📚 Documentation

  • Prioritize Docker Hub image in getting-started guide
  • Update README with v0.5.1 Docker Hub release
  • Add v0.5.2 planning document
  • Update wiki Quick Start to use Docker Hub image with credentials
  • Add Zeek network log examples and denormalized edge table guide
  • Update STATUS.md with denormalized single-hop fix
  • Update denormalized blocker notes with current status
  • Update denormalized edge status to COMPLETE
  • Add graph algorithm support to denormalized edge docs
  • Add 0-hop pattern support to denormalized edge docs
  • (wiki) Update denormalized properties with all supported patterns
  • Add coupled edges documentation
  • (wiki) Add Coupled Edges section to denormalized properties
  • Add v0.5.2 TODO list for polymorphic edges and code consolidation
  • Mark schema loading consolidation complete in TODO
  • Update STATUS.md with polymorphic edge filter completion
  • Add Schema-Basics.md and wiki versioning workflow
  • Update documentation for v0.5.2 schema variations
  • Update KNOWN_ISSUES.md with v0.5.2 status
  • Update KNOWN_ISSUES.md with fixed-length VLP resolution
  • Update KNOWN_ISSUES with VLP fixes and *0 pattern limitation
  • Add Cypher Subgraph Extraction wiki with Nebula GET SUBGRAPH comparison
  • Update README with v0.5.2 features

🎨 Styling

  • Use UNION instead of UNION DISTINCT

🧪 Testing

  • Add comprehensive Docker image validation suite
  • Add comprehensive schema variation test suite (73 tests)

⚙️ Miscellaneous Tasks

  • Update CHANGELOG.md [skip ci]
  • Update CHANGELOG.md [skip ci]
  • Clean up root directory - remove temp files and organize Python tests
  • Release v0.5.2
  • Update CHANGELOG.md [skip ci]
  • Update Cargo.lock for v0.5.2

[0.5.1] - 2025-11-21

🚀 Features

  • Add SQL Generation API (v0.5.1)
  • Implement RETURN DISTINCT for de-duplication
  • Add role-based connection pool for ClickHouse RBAC

🐛 Bug Fixes

  • Eliminate flaky cache LRU eviction test with millisecond timestamps
  • Replace docker_publish.yaml with docker-publish.yml
  • Add missing distinct field to all Projection initializations

📚 Documentation

  • Fix getting-started guide issues
  • Update STATUS.md with fixed flaky test achievement (423/423 passing)
  • Add /query/sql endpoint and RETURN DISTINCT documentation
  • Add /query/sql endpoint and RETURN DISTINCT to wiki

🧪 Testing

  • Add role-based connection pool integration tests

⚙️ Miscellaneous Tasks

  • Update CHANGELOG.md [skip ci]
  • Release v0.5.1
  • Update CHANGELOG.md [skip ci]

[0.5.0] - 2025-11-19

🚀 Features

  • (phase2) Add tenant_id and view_parameters to request context
  • (phase2) Thread tenant_id through HTTP/Bolt to query planner
  • Implement SET ROLE RBAC support for single-tenant deployments
  • (multi-tenancy) Add view_parameters field to schema config
  • (multi-tenancy) Implement parameterized view SQL generation
  • (multi-tenancy) Add Bolt protocol view_parameters extraction
  • (phase2) Add engine detection for FINAL keyword support
  • (phase2) Add use_final field to schema configuration
  • (phase2) Add FINAL keyword support to SQL generation
  • (phase2) Auto-schema discovery with column auto-detection
  • (auto-discovery) Add camelCase naming convention support
  • Add PowerShell scripts for wiki validation workflow
  • Add Helm chart for Kubernetes deployment

🐛 Bug Fixes

  • (phase2) Correct FINAL keyword placement - after alias
  • (tests) Add missing engine and use_final fields to test schemas
  • Implement property expansion for RETURN whole node queries
  • Update clickgraph-client and add documentation

🚜 Refactor

  • Minor code improvements in parser and planner

📚 Documentation

  • Phase 2 minimal RBAC - parameterized views with multi-parameter support
  • Fix Pattern 2 RBAC examples to use SET ROLE approach
  • Add Phase 2 progress to STATUS.md
  • Add comprehensive Phase 2 multi-tenancy status report
  • (multi-tenancy) Complete parameterized views documentation + cleanup
  • Update parameterized views note with cache optimization details
  • (phase2) Complete Phase 2 multi-tenancy documentation and tests
  • Correct Phase 2 status - 2/5 complete, not fully done
  • Update ROADMAP.md Phase 2 progress - 2/5 complete
  • (phase2) Update STATUS and CHANGELOG for FINAL syntax fix
  • (phase2) Update STATUS and CHANGELOG for auto-schema discovery
  • Align wiki examples with benchmark schema and add validation
  • Add session documentation and planning notes
  • Update STATUS, CHANGELOG, and KNOWN_ISSUES
  • Update ROADMAP with wiki documentation and bug fix progress
  • Mark Phase 2 complete - v0.5.0 release ready!

⚡ Performance

  • (cache) Optimize multi-tenant caching with SQL placeholders

🧪 Testing

  • Add comprehensive SET ROLE RBAC test suite
  • (multi-tenancy) Add parameterized views test infrastructure
  • (multi-tenancy) Add unit tests for view_parameters
  • Add integration test utilities and schema

⚙️ Miscellaneous Tasks

  • Update CHANGELOG.md [skip ci]
  • Clean up temporary test output and debug files

[0.4.0] - 2025-11-15

🚀 Features

  • Add parameter support via HTTP API + identity fallback for properties
  • Add production-ready query cache with LRU eviction
  • Complete Bolt 5.8 protocol implementation with E2E tests passing
  • Add Neo4j function support with 25+ function mappings
  • Complete E2E testing infrastructure + critical bug fixes
  • Unified benchmark architecture with scale factor parameter
  • Adjust post ratio to 20 and add 2 post-related benchmark queries
  • Add MergeTree engine support for large-scale benchmarks
  • (benchmark) Complete MergeTree benchmark infrastructure, discover multi-hop query bug
  • Add comprehensive regression test suite (799 tests)
  • Add pre-flight checks to test runner
  • Pre-load test_integration schema at server startup
  • Implement undirected relationship support (Direction::Either)

🐛 Bug Fixes

  • Multi-hop JOINs, SELECT aliases, SQL quoting + improve benchmark display
  • Use correct schema and database for integration tests
  • Start server without pre-loaded schema for integration tests
  • IS NULL operator in CASE expressions (22/25 tests passing)
  • Resolve compilation errors from API changes and incomplete cleanup
  • Additional GraphSchema::build() signature fixes in test files
  • Remove unused variable in view_resolver_tests.rs
  • Update error handling tests to match actual ClickGraph behavior

🚜 Refactor

  • Archive NEXT_STEPS.md in favor of ROADMAP.md
  • Remove inherited DDL generation code (~1250 LOC)
  • Remove bitmap index infrastructure (~200 LOC)
  • Remove use_edge_list flag (~50 LOC)
  • Flatten directory structure - remove brahmand/ wrapper
  • Remove expression_utils dead code - visitor pattern + utility functions
  • Convert CteGenerationContext to immutable builder pattern
  • Create plan_builder_helpers module (preparatory step)
  • Integrate plan_builder_helpers module
  • Add deprecation markers to duplicate helper functions
  • Complete deprecation markers for all helper functions (20/20)
  • Remove all deprecated helper functions (~736 LOC, 22% reduction)
  • Replace file-based debug logging with standard log::debug! macro

📚 Documentation

  • Update KNOWN_ISSUES and copilot-instructions - all major issues resolved
  • Add comprehensive ROADMAP with real-world features and prioritization
  • Architecture decision - Use string substitution for parameters (not ClickHouse .bind())
  • Update NEXT_STEPS.md roadmap with query cache completion
  • Update README and ROADMAP with query cache completion
  • Highlight parameter support in README and add usage restrictions
  • Update ROADMAP.md with Bolt 5.8 completion
  • Clarify anonymous node/edge pattern as TODO feature
  • Document flaky cache LRU eviction test
  • Document anonymous node SQL generation bug
  • Change 'production-ready' to 'development-ready' for v0.4.0

🧪 Testing

  • (benchmark) Add regression test script for CI/CD

⚙️ Miscellaneous Tasks

  • Update CHANGELOG.md [skip ci]
  • Complete v0.4.0 release preparation - Phase 1 complete

[0.3.0] - 2025-11-10

🚀 Features

  • Complete WITH clause with GROUP BY, HAVING, and CTE support
  • Enable per-request schema support for thread-safe multi-tenant architecture
  • Add schema-aware helper functions in render layer

🐛 Bug Fixes

  • Multi-hop graph query planning and join generation
  • Update path variable tests to match tuple() implementation
  • Improve anchor node selection to prefer LEFT nodes first
  • Prevent double schema prefix in CTE table names
  • Use correct node alias for FROM clause in GraphRel fallback
  • Prevent both LEFT and RIGHT nodes from being marked as anchor
  • Remove duplicate JOINs for path variable queries
  • Detect multiple relationship types in GraphJoins tree
  • Update JOINs to use UNION CTE for multiple relationship types
  • Correct release date in README (November 9, not 23)

💼 Other

  • Add schema to PlanCtx (Phases 1-3 complete)

🚜 Refactor

  • Remove BITMAP traversal code and fix relationship direction handling
  • Rename handle_edge_list_traversal to handle_graph_pattern
  • Remove redundant GLOBAL_GRAPH_SCHEMA

📚 Documentation

  • Prepare for next session and organize repository
  • Python integration test status report (36.4% passing)
  • Update STATUS and KNOWN_ISSUES for GLOBAL_GRAPH_SCHEMA removal
  • Clean up outdated KNOWN_ISSUES and update README

🧪 Testing

  • Add debugging utilities for anchor node and JOIN issues

⚙️ Miscellaneous Tasks

  • Update CHANGELOG.md [skip ci]
  • Disable automatic docker publish
  • Clean up test debris and remove deleted optimizer
  • Replace emoji characters with text equivalents in test files
  • Organize root directory for public repo
  • Bump version to 0.2.0
  • Bump version to 0.3.0

[0.2.0] - 2025-11-06

🚀 Features

  • Implement dual-key schema registration for startup-loaded schemas
  • Add COUNT(DISTINCT node) support and fix integration test infrastructure
  • Support edge-driven queries with anonymous node patterns

🐛 Bug Fixes

  • Simplify schema strategy - use only server's default schema
  • Remove ALL hardcoded property mappings - CRITICAL BUG FIX
  • Enhance column name helpers to support both prefixed and unprefixed names
  • Remove is_simple_relationship logic that skipped node joins
  • Configure Docker to use integration test schema
  • Only create node JOINs when nodes are referenced in query
  • Preserve table aliases in WHERE clause filters
  • Extract where_predicate from GraphRel during filter extraction
  • Remove direction-based logic from JOIN inference - both directions now work
  • GraphNode uses its own alias for PropertyAccessExp, not hardcoded 'u'
  • Complete OPTIONAL MATCH with clean SQL generation
  • Add user_id and product_id to schema property_mappings
  • Add schema prefix to JOIN tables in cte_extraction.rs
  • Handle fully qualified table names in table_to_id_column
  • Variable-length paths now generate recursive CTEs
  • Multiple relationship types now generate UNION CTEs
  • Correct edge list test assertions for direction semantics

💼 Other

  • Document property mapping bug investigation

🚜 Refactor

  • Remove /api/ prefix from routes for simplicity

📚 Documentation

  • Final Phase 1 summary with all 12 test suites
  • Add schema loading architecture documentation and API test
  • Update STATUS with integration test results
  • Create action plan for property mapping bug fix
  • Update STATUS and CHANGELOG with critical bug fix resolution
  • Document WHERE clause gap for simple MATCH queries
  • Add schema management endpoints and update API references
  • Update STATUS.md with WHERE clause alias fix
  • Update STATUS with WHERE predicate extraction fix
  • Update STATUS and CHANGELOG with schema fix
  • Update STATUS with complete session summary

🧪 Testing

  • Add comprehensive integration test framework
  • Add comprehensive relationship traversal tests
  • Add variable-length path and shortest path integration tests
  • Add OPTIONAL MATCH and aggregation integration tests
  • Complete Phase 1 integration test suite with CASE, paths, and multi-database
  • Add comprehensive error handling integration tests
  • Add basic performance regression tests
  • Initial integration test suite run - 272 tests collected
  • Fix schema/database naming separation in integration tests

⚙️ Miscellaneous Tasks

  • Update CHANGELOG.md [skip ci]

[0.1.0] - 2025-11-02

🚀 Features

  • (parser) Add shortest path function parsing
  • (planner) Add ShortestPathMode tracking to GraphRel
  • (planner) Detect and propagate shortest path mode
  • (sql) Implement shortest path SQL generation with depth filtering
  • Add WHERE clause filtering support for shortest path queries
  • Add path variable support to parser (Phase 2.1-2.2)
  • Track path variables in logical plan (Phase 2.3)
  • Pass path variable to SQL generator (Phase 2.4)
  • Phase 2.5 - Generate path object SQL for path variables
  • Phase 2.6 - Implement path functions (length, nodes, relationships)
  • WHERE clause filters for variable-length paths and shortestPath
  • Complete allShortestPaths implementation with WHERE filters
  • Implement alternate relationship types [:TYPE1|TYPE2] support
  • Implement multiple relationship types with UNION logic
  • Support multiple relationship types with labels vector
  • Complete Path Variables & Functions implementation
  • Complete Path Variables implementation with documentation
  • Add PageRank algorithm support with CALL statement
  • Complete Query Performance Metrics implementation
  • Complete CASE expressions implementation with full context support
  • Complete WHERE clause filtering pipeline for variable-length paths
  • Implement type-safe configuration management
  • Systematic error handling improvements - replace panic-prone unwrap() calls
  • Complete codebase health restructuring - eliminate runtime panics
  • Rebrand from Brahmand to ClickGraph
  • Update benchmark suite for ClickGraph rebrand and improved performance testing
  • Complete multiple relationship types feature with schema resolution
  • Complete WHERE clause filters with schema-driven resolution
  • Add per-table database support in multi-schema architecture
  • Complete schema-only architecture migration
  • Add medium benchmark (10K users, 50K follows) with performance metrics
  • Add large benchmark (5M users, 50M follows) - 90% success at massive scale!
  • Add Bolt protocol multi-database support
  • Add test convenience wrapper and update TESTING_GUIDE
  • Implement USE clause for multi-database selection in Cypher queries

🐛 Bug Fixes

  • (tests) Add exhaustive pattern matching for ShortestPath variants
  • (parser) Improve shortest path function parsing with case-insensitive matching
  • (parser) Consume leading whitespace in shortest path functions
  • (sql) Correct nested CTE structure for shortest path queries
  • (phase2) Phase 2.7 integration test fixes - path variables working end-to-end
  • WHERE clause handling for variable-length path queries
  • Enable stable background schema monitoring
  • Resolve critical TODO/FIXME items causing runtime panics
  • Root cause fix for duplicate JOIN generation in relationship queries
  • Three critical bug fixes for graph query execution
  • Consolidate benchmark results and add SUT information
  • Resolve path variable regressions after schema-only migration
  • Use last part of CTE name instead of second part

💼 Other

  • Prepare v0.1.0 release

🚜 Refactor

  • (sql) Wire shortest_path_mode through CTE generator
  • Extract CTE generation logic into dedicated module
  • Complete codebase health improvements - modular architecture
  • Standardize test organization with unit/integration/e2e structure
  • Extract common expression processing utilities
  • Organize benchmark suite into dedicated directory
  • Clean up and improve CTE handling for JOIN optimization
  • Remove GraphViewConfig and rename global variables
  • Complete migration from view-based to schema-only configuration
  • Organize project root directory structure

📚 Documentation

  • Add session recap and lessons learned
  • Add shortest path implementation session progress
  • Comprehensive shortest path implementation documentation
  • Add session completion summary
  • Update STATUS.md with Phase 2.7 completion - path variables fully working
  • Update STATUS.md to reflect current state of multiple relationship types
  • Add project documentation and cleanup summaries
  • Complete schema validation enhancement documentation
  • Update STATUS.md and CHANGELOG.md with completed features
  • Update NEXT_STEPS.md with recent completions and current priorities
  • Correct ViewScan relationship support - relationships DO use YAML schemas
  • Correct ViewScan relationship limitation in STATUS.md
  • Remove incorrect OPTIONAL MATCH limitation from STATUS.md and NEXT_STEPS.md
  • Document property mapping debug findings and render plan fixes
  • Update CHANGELOG with property mapping debug session
  • Update CHANGELOG with CASE expressions feature
  • Fix numbering inconsistencies and update WHERE clause filtering status
  • Update STATUS with type-safe configuration completion
  • Update STATUS.md with TODO/FIXME resolution completion
  • Clarify DDL parser TODOs are out-of-scope for read-only engine
  • Sync documentation with current project status
  • Update documentation with bug fixes and benchmark results
  • Update README with 100% benchmark success and recent bug fixes
  • Update STATUS.md with 100% benchmark success
  • Update STATUS and CHANGELOG with enterprise-scale validation
  • Add What's New section to README highlighting enterprise-scale validation
  • Complete benchmark documentation with all three scales
  • Add clear navigation to benchmark results
  • Tone down production-ready claims to development build
  • Add from_node/to_node fields to all relationship schema examples
  • Clarify node label terminology in comments and examples
  • Update STATUS.md with November 2nd achievements
  • Add multi-database support to README and API docs
  • Add PROJECT_STRUCTURE.md guide
  • Add comprehensive USE clause documentation

🧪 Testing

  • (parser) Add comprehensive shortest path parser tests
  • Add shortest path SQL generation test script
  • Add shortest path integration test files
  • Improve test infrastructure and schema configuration
  • Add end-to-end tests for USE clause functionality

⚙️ Miscellaneous Tasks

  • Update .gitignore to exclude temporary files
  • Disable CI on push to main (requires ClickHouse infrastructure)

[iewscan-complete] - 2025-10-19

🚀 Features

  • ✨ Added basic schema inferenc
  • ✨ support for multi node conditions
  • Support for multi node conditions
  • Query planner rewrite (#11)
  • Complete view-based graph infrastructure implementation
  • Comprehensive view optimization infrastructure
  • Complete ClickGraph production-ready implementation
  • Implement relationship traversal support with YAML view integration
  • Implement variable-length path traversal for Cypher queries
  • Complete end-to-end variable-length path execution
  • Add chained JOIN optimization for exact hop count queries
  • Add parser-level validation for variable-length paths
  • Make max_recursive_cte_evaluation_depth configurable with default of 100
  • Add OPTIONAL MATCH AST structures
  • Implement OPTIONAL MATCH parser
  • Implement OPTIONAL MATCH logical plan integration
  • Implement OPTIONAL MATCH with LEFT JOIN semantics
  • Implement view-based SQL translation with ViewScan for node queries
  • Add debug logging for full SQL queries
  • Add schema lookup for relationship types

🐛 Bug Fixes

  • 🐛 relation direction when same node types
  • 🐛 Property tagging to node name
  • 🐛 node name in return clause related issues
  • Count start issue (#6)
  • Schema integration bug - separate column names from node types
  • Rewrite GROUP BY and ORDER BY expressions for variable-length CTEs
  • Preserve Cypher variable aliases in plan sanitization
  • Qualify columns in IN subqueries and use schema columns
  • Prevent CTE nesting and add SELECT * default
  • Pass labels to generate_scan for ViewScan resolution

💼 Other

  • Node name in return clause related issues
  • Add RECURSIVE keyword to variable_length_demo.ipynb SQL descriptions

📚 Documentation

  • Add comprehensive changelog for October 15, 2025 session
  • Update README to use more appropriate terminology
  • Add comprehensive test coverage summary for variable-length paths
  • Simplify documentation structure for better maintainability
  • Add documentation standards to copilot-instructions.md
  • Add ViewScan completion documentation
  • Add git workflow guide and update .gitignore

🧪 Testing

  • Add comprehensive test suite for variable-length paths (30 tests)
  • Add comprehensive testing infrastructure

⚙️ Miscellaneous Tasks

  • Fixed docker pipeline mac issue
  • Fixed docker mac issue
  • Fixed docker image mac issue
  • Update CHANGELOG.md [skip ci]
  • Update CHANGELOG.md [skip ci]
  • Update CHANGELOG.md [skip ci]
  • Update CHANGELOG.md [skip ci]
  • Update CHANGELOG.md [skip ci]
  • Update Cargo.lock after axum 0.8.6 upgrade
  • Clean up debug logging and add NEXT_STEPS documentation