perf(lineage): viewport virtualization + perf benchmark infra by acrylJonny · Pull Request #17600 · datahub-project/datahub

acrylJonny · 2026-05-27T16:48:05Z

Summary

Adds viewport-based DOM virtualization for the V2 lineage graph plus an opt-in Playwright benchmark suite for measuring and regressing lineage rendering performance at scale.

Why

The V2 lineage graph renders every node and edge regardless of viewport, which makes large graphs (500+ nodes) very expensive to expand, pan, and click. Synthetic measurements on this branch show single-action stalls of 2.4–4.4 s at 500 nodes (expand-fanout, pan-horizontal, click-root, hover-column-root).

What

Backend feature flags (metadata-service/configuration, datahub-graphql-core)

lineageGraphPerfVirtEnabled (default true) — server default for auto-applying ReactFlow's onlyRenderVisibleElements above ~50 rendered nodes.
lineageGraphPerfOverscanEnabled (default false) — server default for the wider overscan-buffered virt path above ~200 nodes; trades a small pan-cost regression for fewer pop-in artefacts.
Both surface through appConfig.featureFlags and AppConfigResolver, and are classified as non-sensitive in PropertiesCollectorConfigurationTest. URL (?lineagePerf=) and localStorage (datahub.lineagePerfFlags) still override per-session for diagnostics.
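
The layering of server defaults, localStorage, and URL overrides described above can be sketched as follows. This is a minimal illustration, not the code in perfFlags.ts: the override token syntax (`virt`, `-virt`), the precedence order (URL over localStorage over server), the rule that overscan only applies when virt is active, and all function names are assumptions. The thresholds are the approximate values quoted in the description.

```typescript
// Hypothetical shapes and token format; the real perfFlags.ts may differ.
interface PerfFlags {
  virt: boolean;
  overscan: boolean;
}

// Assumed override syntax: "virt,overscan" enables, "-virt" disables.
function parseOverride(raw: string | null): Partial<PerfFlags> {
  const out: Partial<PerfFlags> = {};
  if (!raw) return out;
  for (const part of raw.split(",")) {
    const token = part.trim();
    const enabled = token.charAt(0) !== "-";
    const name = enabled ? token : token.slice(1);
    if (name === "virt" || name === "overscan") out[name] = enabled;
  }
  return out;
}

// Assumed precedence, lowest to highest: server default, localStorage, URL.
function resolvePerfFlags(
  server: PerfFlags,
  stored: string | null,
  url: string | null,
): PerfFlags {
  return { ...server, ...parseOverride(stored), ...parseOverride(url) };
}

// Approximate thresholds quoted in the PR description.
const VIRT_NODE_THRESHOLD = 50;
const OVERSCAN_NODE_THRESHOLD = 200;

// Small graphs stay on the unvirtualized path even when the flags are on.
function effectiveModes(flags: PerfFlags, renderedNodes: number): PerfFlags {
  const virt = flags.virt && renderedNodes > VIRT_NODE_THRESHOLD;
  const overscan =
    virt && flags.overscan && renderedNodes > OVERSCAN_NODE_THRESHOLD;
  return { virt, overscan };
}
```

In this sketch a caller would pass the raw `?lineagePerf=` value and the raw `datahub.lineagePerfFlags` entry as strings, which keeps the resolver free of browser globals and easy to unit-test.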

Frontend — lineageV2 (datahub-web-react/src/app/lineageV2)

perfFlags.ts resolves modes (virt, overscan) from server defaults, URL, and localStorage with documented precedence; unit-tested in __tests__/perfFlags.test.ts.
useOverscanVirt.ts inflates ReactFlow's nodeExtent/viewport bounds by DEFAULT_OVERSCAN_FACTOR so neighbours mount before they scroll into view.
LineageVisualization reads server flags via useAppConfig() and threads the resolved modes into ReactFlow.
LineageVisualizationContext adds forceMountAll so the screenshot export can temporarily disable virtualization for capture.
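
The overscan idea (mount neighbours slightly before they scroll into view) can be illustrated with plain geometry. The sketch below is not the useOverscanVirt.ts implementation: the value of `DEFAULT_OVERSCAN_FACTOR`, the meaning of the factor (scaling the visible rectangle about its centre), and the helper names are all assumptions. The only ReactFlow fact relied on is that a viewport is an `{ x, y, zoom }` transform.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Same shape as ReactFlow's viewport: a pixel translation plus a zoom level.
interface Viewport {
  x: number;
  y: number;
  zoom: number;
}

// Hypothetical value; the PR does not state the real constant.
const DEFAULT_OVERSCAN_FACTOR = 1.5;

// Convert the on-screen container into flow (graph) coordinates.
function visibleRect(vp: Viewport, containerWidth: number, containerHeight: number): Rect {
  return {
    x: -vp.x / vp.zoom,
    y: -vp.y / vp.zoom,
    width: containerWidth / vp.zoom,
    height: containerHeight / vp.zoom,
  };
}

// Grow the rectangle about its centre so each side gains an equal margin.
function inflate(rect: Rect, factor: number): Rect {
  const width = rect.width * factor;
  const height = rect.height * factor;
  return {
    x: rect.x - (width - rect.width) / 2,
    y: rect.y - (height - rect.height) / 2,
    width,
    height,
  };
}

function intersects(a: Rect, b: Rect): boolean {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}

// Nodes whose bounding boxes touch the inflated area are the ones to mount.
function nodesToMount<T extends Rect>(
  nodes: T[],
  vp: Viewport,
  containerWidth: number,
  containerHeight: number,
  factor: number = DEFAULT_OVERSCAN_FACTOR,
): T[] {
  const area = inflate(visibleRect(vp, containerWidth, containerHeight), factor);
  return nodes.filter((n) => intersects(n, area));
}
```

With a factor of 1 this reduces to plain viewport culling; larger factors trade extra mounted nodes for less pop-in, which matches the pan-cost trade-off noted for the overscan flag.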

Screenshot export (V2 + V3)

application.conf CSP: add data: to img-src. html-to-image inlines its serialised SVG as a data: URI before rasterising, so the screenshot button was silently broken under the previous policy. Surfaced by the new screenshot-stress spec.
DownloadLineageScreenshotButton (V2 + V3): replace console.error with antd.message.error for user-visible feedback; new focused test for the failure path.
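
To make the CSP point concrete, here is a small illustrative check for whether a policy string permits `data:` image sources. It follows the standard fallback from `img-src` to `default-src`; the function name is hypothetical and the policy strings in the usage note are made-up examples, not the contents of application.conf.

```typescript
// Returns true when images loaded from data: URIs would be allowed.
function imgSrcAllowsDataUris(csp: string): boolean {
  // Split "name value value; name value" into [name, ...values] token lists.
  const directives = csp
    .split(";")
    .map((d) => d.trim().toLowerCase().split(/\s+/));
  // When a directive is repeated, only its first occurrence is honoured.
  const lookup = (name: string): string[] | undefined =>
    directives.filter((d) => d[0] === name)[0];
  // img-src falls back to default-src when it is absent.
  const effective = lookup("img-src") ?? lookup("default-src");
  if (!effective) return true; // neither directive present: no restriction
  return effective.slice(1).indexOf("data:") !== -1;
}
```

For example, `imgSrcAllowsDataUris("default-src 'self'; img-src 'self' https:")` is false, while adding `data:` to the `img-src` list makes it true.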

E2E perf benchmark suite (e2e-test/ui/playwright)

New tests/lineage-perf/ directory, opt-in via LINEAGE_PERF=1:
- Journey benchmark across small / chain+columns / filter-hub graphs.
- Synthetic scaling matrix (100/500/1000 nodes) × {baseline, virt, virt+overscan}.
- Opt-in screenshot stress (LINEAGE_PERF_SCREENSHOT=1).
- Opt-in axe-core accessibility audit (LINEAGE_A11Y=1, LINEAGE_A11Y_LARGE=1).
LineagePerfRecorder (utils/lineage-perf-collector.ts) records wall time, long tasks, FPS, network request count + bytes per action.
lineage-perf-seeder.ts programmatically seeds Dataset, column lineage, DataJob/DataFlow, Chart, and Dashboard graphs at arbitrary scale via ingestProposal.
LineagePerfPage (pages/lineage-perf.page.ts) — virt-aware expandFanoutFully helper plus standard navigation steps.
LINEAGE_PERF_REPEAT=N runs each scenario N times for variance.
scripts/lineage-perf-aggregate.mjs computes p50 / p95 / max bands from the JSON output; wired up as yarn perf:aggregate and yarn perf:aggregate:json. Documented in e2e-test/ui/playwright/README.md.
v2-lineage-virt.spec.ts regression test: virt path produces the expected DOM node count for forced-on and forced-off variants.
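
The p50 / p95 / max bands mentioned above can be computed from repeated runs roughly as follows. This is a sketch only: the `PerfSample` shape, the function names, and the nearest-rank percentile method are assumptions, and the real `lineage-perf-aggregate.mjs` may use a different input schema or interpolation.

```typescript
// Hypothetical record shape; the real JSON output may carry more fields.
interface PerfSample {
  action: string;
  wallMs: number;
}

interface Band {
  n: number;
  p50: number;
  p95: number;
  max: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p * sortedAsc.length) / 100);
  const index = Math.min(Math.max(rank, 1), sortedAsc.length) - 1;
  return sortedAsc[index] as number;
}

// Group samples by action name, then summarise each group.
function aggregate(samples: PerfSample[]): Record<string, Band> {
  const byAction: Record<string, number[]> = {};
  for (const s of samples) {
    (byAction[s.action] = byAction[s.action] || []).push(s.wallMs);
  }
  const bands: Record<string, Band> = {};
  for (const action of Object.keys(byAction)) {
    const sorted = (byAction[action] as number[]).slice().sort((a, b) => a - b);
    bands[action] = {
      n: sorted.length,
      p50: percentile(sorted, 50),
      p95: percentile(sorted, 95),
      max: sorted[sorted.length - 1] as number,
    };
  }
  return bands;
}
```

With only a handful of repeats (for instance `LINEAGE_PERF_REPEAT=5`), nearest-rank p95 collapses to the maximum, so the p50 band is the more informative number at small N.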

Risk and rollout

Default behaviour for new deployments: virt on (auto-threshold), overscan off. The auto-threshold means small graphs render identically to today.
Both flags can be overridden via env var (LINEAGE_GRAPH_PERF_VIRT_ENABLED, LINEAGE_GRAPH_PERF_OVERSCAN_ENABLED) or via URL / localStorage per session.
Screenshot CSP fix is independently safe: only widens img-src to also allow data: URIs, which is needed by html-to-image.

Tests

New Java unit-test coverage: AppConfigResolverTest, PropertiesCollectorConfigurationTest updated.
New frontend unit tests: perfFlags.test.ts, virtualization.sanity.test.tsx, DownloadLineageScreenshotButton.test.tsx.
New Playwright suites under tests/lineage-perf/ and tests/lineage-v2/v2-lineage-virt.spec.ts.
Verified locally with LINEAGE_PERF=1 yarn perf against a running stack — 12 / 12 active tests pass; a11y + screenshot-stress suites are opt-in.

Checklist

PR conforms to the Contributing Guideline (PR title format)
Tests added/updated
Docs added/updated (e2e-test/ui/playwright/README.md — performance benchmark instructions)
No breaking changes (feature flags default to current behaviour at small scale)

Made with Cursor

Adds viewport-based DOM virtualization for the V2 lineage graph plus an opt-in Playwright benchmark suite for measuring and regressing lineage rendering performance at scale.

Backend feature flags
- `lineageGraphPerfVirtEnabled` (default `true`) - server default for auto-applying `onlyRenderVisibleElements` above ~50 rendered nodes.
- `lineageGraphPerfOverscanEnabled` (default `false`) - server default for the wider overscan-buffered virt path above ~200 nodes; trades a small pan-cost regression for fewer pop-in artefacts.
- Both are exposed via `appConfig.featureFlags` (`app.graphql`) and wired through `AppConfigResolver`. URL (`?lineagePerf=`) and localStorage (`datahub.lineagePerfFlags`) still override per-session for diagnostics; server values are the baseline.

Frontend (lineageV2)
- `perfFlags.ts` resolves modes (`virt`, `overscan`) from server defaults, URL, and localStorage with documented precedence.
- `useOverscanVirt.ts` inflates ReactFlow's `nodeExtent`/`viewport` bounds by `DEFAULT_OVERSCAN_FACTOR` so neighbours mount before they scroll into view.
- `LineageVisualization` reads server flags via `useAppConfig()` and threads the resolved modes into `ReactFlow`.
- `LineageVisualizationContext` adds `forceMountAll` so the screenshot export can temporarily disable virtualization for capture.

Screenshot export
- `application.conf` CSP: add `data:` to `img-src`. `html-to-image` inlines its serialised SVG as a `data:` URI before rasterising, so the screenshot button was silently broken under the previous policy. Surfaced by the new screenshot-stress spec.
- `DownloadLineageScreenshotButton` (V2 + V3): replace `console.error` with `antd.message.error` for user-visible feedback and add a focused test for the failure path.

E2E perf benchmark suite
- New `tests/lineage-perf/` directory (opt-in via `LINEAGE_PERF=1`): journey benchmark across small / chain+columns / filter-hub graphs, synthetic scaling matrix (100/500/1000 nodes), opt-in screenshot stress (`LINEAGE_PERF_SCREENSHOT=1`) and axe-core accessibility audit (`LINEAGE_A11Y=1`).
- `LineagePerfRecorder` (`utils/lineage-perf-collector.ts`) records wall time, long tasks, FPS, network requests + bytes per action.
- `lineage-perf-seeder.ts` programmatically seeds dataset, column lineage, DataJob/DataFlow, Chart, and Dashboard graphs at arbitrary scale via `ingestProposal`.
- `LineagePerfPage` (`pages/lineage-perf.page.ts`) - virt-aware `expandFanoutFully` helper plus standard navigation steps.
- `LINEAGE_PERF_REPEAT=N` runs each scenario N times for variance.
- `scripts/lineage-perf-aggregate.mjs` computes p50 / p95 / max bands from the JSON output; wired up as `yarn perf:aggregate` and `yarn perf:aggregate:json`. Documented in playwright README.
- `v2-lineage-virt.spec.ts` regression test: virt path produces the expected DOM node count for forced-on and forced-off variants.

Headline numbers (synthetic 500 nodes, forced virt vs baseline):
- expand-fanout 2446 ms -> 65 ms (-97%)
- pan-horizontal 4367 ms -> 384 ms (-91%)
- click-root 2496 ms -> 39 ms
- hover-column-root 2381 ms -> 328 ms
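
The commit message gives percentage reductions for only two of the four headline measurements. The snippet below works out the relative reduction for all four from the quoted before/after timings; the helper name is illustrative, and the only inputs are the millisecond figures listed above.

```typescript
// Relative reduction in wall time, as a percentage rounded to one decimal.
function reductionPercent(beforeMs: number, afterMs: number): number {
  return Math.round(((beforeMs - afterMs) / beforeMs) * 1000) / 10;
}

// [action, baseline ms, forced-virt ms] at 500 synthetic nodes.
const headline: Array<[string, number, number]> = [
  ["expand-fanout", 2446, 65],
  ["pan-horizontal", 4367, 384],
  ["click-root", 2496, 39],
  ["hover-column-root", 2381, 328],
];

for (const [action, before, after] of headline) {
  console.log(`${action}: -${reductionPercent(before, after)}%`);
}
// expand-fanout: -97.3%
// pan-horizontal: -91.2%
// click-root: -98.4%
// hover-column-root: -86.2%
```

The first two results round to the -97% and -91% quoted in the commit message.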

alwaysmeticulous · 2026-05-27T16:53:02Z

✅ Meticulous spotted 0 visual differences across 1390 screens tested: view results.

Meticulous evaluated ~10 hours of user flows against your PR.

Expected differences? Click here. Last updated for commit 025f636 (ci(perf): build PR-branch images before lineage perf run). This comment will update as new commits are pushed.

ESLint's `rulesdir/no-hardcoded-colors` rejected hex / rgba literals in two fixtures introduced by the perf benchmark work:
- `DownloadLineageScreenshotButton.test.tsx`: replace the fabricated `{ bgSurface: '#fff' }` stub with the real `lightTheme` import, so the test inherits the project's semantic token table.
- `stubNodeTypes.tsx`: these stubs render in jsdom without a `ThemeProvider`, so swapping the hex literals for theme tokens would just resolve to undefined. The colours were pure decoration (the virtualisation sanity test asserts on mounted DOM-node counts, not visuals), so replace them with neutral CSS keywords (`transparent`, `currentColor`, `inherit`). Preserve the prop-driven styled component pattern on `Tag` by varying padding instead of background.

Surfaced by CI lint on PR #17600.

codecov · 2026-05-27T17:13:49Z

Bundle Report

Changes will increase total bundle size by 3.33kB (0.01%) ⬆️. This is within the configured threshold ✅

Detailed changes

Bundle name	Size	Change
datahub-react-web-esm	23.23MB	3.33kB (0.01%) ⬆️

Affected Assets, Files, and Routes:

view changes for bundle: datahub-react-web-esm

Assets Changed:

Asset Name	Size Change	Total Size	Change (%)
`assets/index-*.js`	3.33kB	8.78MB	0.04%

Files in assets/index-*.js:

./src/app/lineageV2/useOverscanVirt.ts → Total Size: 2.23kB
./src/app/lineageV2/controls/DownloadLineageScreenshotButton.tsx → Total Size: 2.71kB
./src/app/lineageV3/LineageVisualizationContext.tsx → Total Size: 320 bytes
./src/app/lineageV3/LineageVisualization.tsx → Total Size: 4.28kB
./src/app/lineageV3/controls/DownloadLineageScreenshotButton.tsx → Total Size: 2.07kB
./src/app/lineageV2/LineageVisualization.tsx → Total Size: 5.18kB
./src/appConfigContext.tsx → Total Size: 2.9kB
./src/app/lineageV2/perfFlags.ts → Total Size: 2.25kB
./src/app/lineageV2/LineageVisualizationContext.tsx → Total Size: 249 bytes

codecov · 2026-05-27T17:27:36Z

Codecov Report

❌ Patch coverage is 46.84385% with 160 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
...react/src/app/lineageV3/__perf__/syntheticGraph.ts	28.67%	97 Missing ⚠️
...hub-web-react/src/app/lineageV2/useOverscanVirt.ts	14.86%	63 Missing ⚠️


Adds three `workflow_dispatch` inputs to `playwright-e2e-tests.yml`:
- `lineage_perf`: scope the run to `tests/lineage-perf/` with `LINEAGE_PERF=1`. Forces `shard_count=1` (perf benchmarks need stable single-process timing; sharding would split scenarios across runners and invalidate cross-variant comparisons) and bumps the job timeout from 20 m to 60 m to accommodate the synthetic / screenshot / a11y matrices.
- `lineage_perf_screenshot`: opt-in `LINEAGE_PERF_SCREENSHOT=1` screenshot stress matrix (100/500/1000 nodes × baseline/virt).
- `lineage_a11y`: opt-in axe-core audit (`LINEAGE_A11Y=1`, `LINEAGE_A11Y_LARGE=1`).

When `lineage_perf=true`, also uploads `lineage-perf.json`, `lineage-screenshot-stress.tsv`, and the a11y JSON artefacts under the `lineage-perf-results` artifact for later aggregation via `yarn perf:aggregate`.

The default (non-perf) flow is unchanged: a full sharded run of the standard Playwright suite.

Aikido flagged the new `Run Playwright tests` step as a critical template-injection risk because it inlined `${{ matrix.shard }}`, `${{ matrix.shard_count }}`, and `${{ github.event.inputs.lineage_perf }}` directly inside the shell `run:` block. Even though those values come from our own setup job (not untrusted external input), the GitHub Actions security guidance is to always pipe context references through `env:` so they're never evaluated as part of the shell command. Move `lineage_perf`, `matrix.shard`, and `matrix.shard_count` into the step's `env:` and reference them as `$LINEAGE_PERF_INPUT`, `$MATRIX_SHARD`, `$MATRIX_SHARD_COUNT` from the shell. No behaviour change.

The lineage_perf workflow_dispatch path was pulling the published `acryldata/...:quickstart` images, so it benchmarked whatever code happened to be on master at last release, not the PR. That made the screenshot stress test fail (missing CSP fix in application.conf) and produced perf numbers that didn't reflect the virtualization / overscan changes in this branch.

When lineage_perf=true the job now:
1. Derives a tag from GITHUB_REF via docker_helpers.sh.
2. Runs `:docker:buildImagesQuickstart` with the GitHub buildx cache, tagging the built images as `acryldata/<image>:<tag>` locally.
3. Passes the same tag as DATAHUB_VERSION to run-quickstart.sh so compose resolves the PR-built images instead of pulling.

Default (non-perf) runs are unchanged: they still pull `:quickstart`.

Job timeout bumped from 60 to 90 minutes to cover the build (~25–30m cold, faster with cache) on top of the existing 25–30m perf matrix. Build step has its own 45m timeout so a hung build can't consume the whole job.

github-actions Bot added the product (PR or Issue related to the DataHub UI/UX) and devops (PR or Issue related to DataHub backend & deployment) labels May 27, 2026

github-actions Bot deployed to datahub-project-web-react (Preview) May 27, 2026 16:52 View deployment

vercel Bot deployed to Preview May 27, 2026 16:59 View deployment

github-actions Bot deployed to datahub-project-web-react (Preview) May 27, 2026 17:09 View deployment

vercel Bot deployed to Preview May 27, 2026 17:19 View deployment

github-actions Bot deployed to datahub-project-web-react (Preview) May 27, 2026 17:51 View deployment

github-actions Bot deployed to datahub-project-web-react (Preview) May 27, 2026 17:57 View deployment

vercel Bot deployed to Preview May 27, 2026 18:10 View deployment

github-actions Bot deployed to datahub-project-web-react (Preview) May 27, 2026 20:22 View deployment

vercel Bot deployed to Preview May 27, 2026 20:32 View deployment

acrylJonny wants to merge 5 commits into master from datahub-lineage-improvements-foundation
