
perf(ebean-dao): collapse N per-aspect SQL queries into 1 in batchGetUnion #622

Open: rakhiagr wants to merge 1 commit into master from rakagraw/batchgetunion-single-query



@rakhiagr
Contributor

Summary

  • Collapse N per-aspect SQL queries into 1 in batchGetUnion(). Previously it generated 1 SQL per aspect class (e.g., 73 queries for a Dataset get with all aspects); now it issues a single SELECT with all requested a_* columns.
  • URN chunking at MAX_URNS_PER_QUERY=100 keeps the IN clause at a safe size for MySQL. The old implicit limit was 50 URNs per IN clause (keysCount=50, 1 aspect per key).
  • gma_deleted filtering moves from SQL to Java, so the query contains no JSON_EXTRACT. Soft-deleted aspects are returned with their marker; callers (EbeanLocalDAO.toRecordTemplate) filter them out via isSoftDeletedAspect().
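The URN chunking described above can be sketched as follows. This is a minimal illustration, not the PR's code: the class name UrnChunker and the method chunk are hypothetical, and URNs are assumed to be an ordered list of strings. Each chunk would become one SELECT, so the query count is ceil(uniqueUrns / 100).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of URN chunking; not the PR's actual implementation.
final class UrnChunker {
  // Same value as MAX_URNS_PER_QUERY described in the PR.
  static final int MAX_URNS_PER_QUERY = 100;

  private UrnChunker() {}

  /** Splits the input into consecutive chunks of at most maxPerChunk elements, preserving order. */
  static <T> List<List<T>> chunk(List<T> items, int maxPerChunk) {
    if (maxPerChunk <= 0) {
      throw new IllegalArgumentException("maxPerChunk must be positive");
    }
    List<List<T>> chunks = new ArrayList<>();
    for (int start = 0; start < items.size(); start += maxPerChunk) {
      int end = Math.min(start + maxPerChunk, items.size());
      // Copy each slice so the chunks do not depend on the source list.
      chunks.add(new ArrayList<>(items.subList(start, end)));
    }
    return chunks;
  }
}
```

With this sketch, 200 URNs yield 2 chunks and 1000 URNs yield 10, in line with the query counts in the performance table; 101 URNs split into chunks of 100 and 1, and an empty URN set yields no chunks (and therefore no query).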

Changes

| File | Change |
|---|---|
| SQLStatementUtils.java | New createMultiAspectReadSql(): single SELECT with explicit aspect columns and an entity-type validation guard |
| EBeanDAOUtils.java | New readMultiAspectSqlRows(): parses multi-column rows, delegates to existing readSqlRow() |
| EbeanLocalAccess.java | Rewrote batchGetUnion() to collect all columns and URNs, then issue chunked queries (URN-count based) |
| EbeanLocalDAO.java | Added position>0 short-circuit for NEW_SCHEMA_ONLY/DUAL_SCHEMA to avoid duplicate results from the outer pagination loop |
| IEbeanLocalAccess.java | Updated Javadoc to describe the new behavior |
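A rough sketch of the kind of statement createMultiAspectReadSql() builds, assuming the table naming (metadata_entity_<type>), trailing metadata columns, and deleted_ts IS NULL predicate visible in the E2E logs. The class MultiAspectSqlSketch, the a_[a-z0-9_]+ column whitelist, and the simplified URN parser are hypothetical; test-mode tables and includeSoftDeleted handling are omitted. Column names cannot be bound as parameters, so the sketch whitelists them; URN literals are escaped for MySQL here, although bind parameters would be the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch of a multi-aspect SELECT builder; not the PR's createMultiAspectReadSql().
final class MultiAspectSqlSketch {
  // Assumed whitelist: aspect columns are "a_" followed by lowercase letters, digits or underscores.
  private static final Pattern ASPECT_COLUMN = Pattern.compile("a_[a-z0-9_]+");
  // Assumed shape of an entity type, since it is spliced into the table name.
  private static final Pattern ENTITY_TYPE = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

  private MultiAspectSqlSketch() {}

  /** Simplified parse: "urn:li:dataset:(...)" yields "dataset". */
  static String entityType(String urn) {
    String[] parts = urn.split(":", 4);
    if (parts.length < 4 || !"urn".equals(parts[0]) || !ENTITY_TYPE.matcher(parts[2]).matches()) {
      throw new IllegalArgumentException("Unsupported URN: " + urn);
    }
    return parts[2].toLowerCase(Locale.ROOT);
  }

  /** Quotes a MySQL string literal: escape backslashes first, then double single quotes. */
  static String quote(String value) {
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'";
  }

  static String createMultiAspectReadSql(Set<String> aspectColumns, Set<String> urns) {
    if (aspectColumns.isEmpty() || urns.isEmpty()) {
      throw new IllegalArgumentException("aspect columns and URNs must be non-empty");
    }
    // Every URN must resolve to the same entity table.
    Set<String> types = urns.stream().map(MultiAspectSqlSketch::entityType).collect(Collectors.toSet());
    if (types.size() != 1) {
      throw new IllegalArgumentException("Mixed entity types: " + types);
    }
    List<String> columns = new ArrayList<>();
    columns.add("urn");
    for (String column : aspectColumns) {
      // Identifiers cannot be bound as parameters, so reject anything outside the whitelist.
      if (!ASPECT_COLUMN.matcher(column).matches()) {
        throw new IllegalArgumentException("Invalid aspect column: " + column);
      }
      columns.add(column);
    }
    // Trailing metadata columns as they appear in the logged SQL.
    columns.add("lastmodifiedon");
    columns.add("lastmodifiedby");
    columns.add("createdfor");
    String inList = urns.stream().map(MultiAspectSqlSketch::quote).collect(Collectors.joining(", "));
    return "SELECT " + String.join(", ", columns)
        + " FROM metadata_entity_" + types.iterator().next()
        + " WHERE urn IN (" + inList + ") AND deleted_ts IS NULL";
  }
}
```

Called with the columns a_status and a_retentionpolicy and one dataset URN, this produces a statement of the same shape as the single-query SQL in the Test 5 logs; a column such as "a_status; DROP TABLE t" or a mix of dataset and corpuser URNs is rejected with IllegalArgumentException.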

Edge Cases Handled

  • a_urn column excluded (not an aspect)
  • Column-doesn't-exist check per aspect (validator.columnExists)
  • NULL aspect columns (never written) — skipped in parser
  • Soft-deleted aspects (gma_deleted) — returned as markers, filtered by callers
  • Asset-level deletion (deleted_ts) — correctly applied per row
  • Empty URN set after column filtering — returns empty list
  • Mixed entity types in URN set — throws IllegalArgumentException
  • Pagination duplicates — position>0 returns empty for NEW_SCHEMA_ONLY
  • Large URN batches — chunked into 100-URN queries
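The row-parsing rules in the list above (skip the a_urn column, skip NULL columns, keep soft-deleted aspects as markers for callers to filter) can be sketched with plain maps standing in for Ebean's SqlRow. Everything here is hypothetical: AspectCell, parse, and the regex check for "gma_deleted": true are simplified stand-ins for readMultiAspectSqlRows() and isSoftDeletedAspect(), and asset-level deletion (deleted_ts) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of multi-column row parsing; not the PR's readMultiAspectSqlRows().
final class MultiAspectRowParser {
  // Simplified stand-in for isSoftDeletedAspect(): looks for "gma_deleted": true in the JSON text.
  private static final Pattern SOFT_DELETE_MARKER = Pattern.compile("\"gma_deleted\"\\s*:\\s*true");

  /** One non-null aspect cell from a row. */
  static final class AspectCell {
    final String urn;
    final String column;
    final String json;
    final boolean softDeleted;

    AspectCell(String urn, String column, String json, boolean softDeleted) {
      this.urn = urn;
      this.column = column;
      this.json = json;
      this.softDeleted = softDeleted;
    }
  }

  private MultiAspectRowParser() {}

  /** Turns each row (column name to value) into one cell per non-null requested aspect column. */
  static List<AspectCell> parse(List<Map<String, Object>> rows, List<String> aspectColumns) {
    List<AspectCell> cells = new ArrayList<>();
    for (Map<String, Object> row : rows) {
      String urn = String.valueOf(row.get("urn"));
      for (String column : aspectColumns) {
        if ("a_urn".equals(column)) {
          continue; // a_urn holds the URN, not an aspect
        }
        Object value = row.get(column);
        if (value == null) {
          continue; // aspect never written for this URN
        }
        String json = value.toString();
        // Soft-deleted aspects are kept as markers; callers decide whether to drop them.
        cells.add(new AspectCell(urn, column, json, SOFT_DELETE_MARKER.matcher(json).find()));
      }
    }
    return cells;
  }
}
```

A caller-side filter then reduces to keeping cells whose softDeleted flag is false, which mirrors the Test 5 outcome where only status is returned.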

Motivation

Code Yellow investigation: MGA SLO violations were caused by an N+1 query pattern in batchGetUnion, a legacy design from the old row-per-aspect schema. The new entity-table schema stores all aspects as columns on the same row, which makes single-query reads trivial.

Performance examples (with MAX_URNS_PER_QUERY=100):

| Scenario | Old code | New code |
|---|---|---|
| 1 URN × 73 aspects | 73 SQL queries | 1 SQL |
| 10 URNs × 73 aspects | ~146 SQL | 1 SQL |
| 200 URNs × 3 aspects | 36 SQL | 2 SQL |
| 1000 URNs × 3 aspects | 180 SQL | 10 SQL |

Testing Done

Unit tests (all passing)

  • SQLStatementUtilsTest — 7 tests for createMultiAspectReadSql: basic, includeSoftDeleted, testMode, multiple URNs, empty URNs/columns, mixed entity types, URN escaping
  • EBeanDAOUtilsTest — 5 tests for readMultiAspectSqlRows: multi-column, null skip, empty input, empty map, soft-deleted marker
  • EbeanLocalAccessTest — 14 tests: multi-aspect, null aspect, soft-deleted, multi-URN multi-aspect, multi-URN same aspect, single URN, soft-delete filtering by callers, non-existent URN, mixed existing/non-existing, 120 URN chunking, 120 URNs × 2 aspects, boundary at 99/100/101
  • EbeanLocalDAOTest.testGetWithSmallPageSizeNoDuplicatesNewSchema — verifies position>0 short-circuit with setQueryKeysCount(2) and 5 keys
  • EbeanLocalAccessTestWithoutServiceIdentifier.testGetAspectNoSoftDeleteCheck — sibling test updated for new behavior

Full dao-impl/ebean-dao:test run: 2055 tests, 0 failures.
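Why the position>0 short-circuit is needed can be shown with a toy outer pagination loop, modeled loosely on the setQueryKeysCount(2), 5-key test listed above. The names are hypothetical and the real batchGet()/batchGetHelper() signatures differ; the point is only that the new-schema path fetches every key in one unpaged call, so each page after the first would otherwise fetch the same keys again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the outer pagination loop; real batchGet()/batchGetHelper() signatures differ.
final class PositionGuardSketch {
  private PositionGuardSketch() {}

  /** Stand-in for the new-schema path: one unpaged fetch that returns every key. */
  static List<String> fetchAllUnpaged(List<String> keys) {
    return new ArrayList<>(keys);
  }

  /** Page helper; with the guard on, only position 0 performs the fetch. */
  static List<String> helper(List<String> keys, int position, boolean guard) {
    if (guard && position > 0) {
      return Collections.emptyList(); // everything was already fetched at position 0
    }
    return fetchAllUnpaged(keys);
  }

  /** Outer loop that walks the key list in steps of keysCount and concatenates page results. */
  static List<String> batchGet(List<String> keys, int keysCount, boolean guard) {
    List<String> results = new ArrayList<>();
    for (int position = 0; position < keys.size(); position += keysCount) {
      results.addAll(helper(keys, position, guard));
    }
    return results;
  }
}
```

With 5 keys and a page size of 2 the loop visits positions 0, 2 and 4: without the guard it collects 15 results (every key three times), with the guard exactly 5. The reviewer's later question about removing pagination for this path follows directly from this shape.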

E2E verification on QEI

Deployed to qei-ltx1 with the new datahub-gma JAR via the test-gma-changes workflow.

Test 1: Dataset get (all aspects) — verifies single-query path

```bash
grpcurli --dv-auth SELF -f ei-ltx1 localhost:25403 \
  proto.com.linkedin.mg.assets.service.DatasetAssetService.get \
  -d '{"key":{"urn":{"platform":{"platformName":"gridTable"},"datasetName":"test_create_009","origin":"FabricType_EI"}}}'
```

Response: 4 populated aspects (complianceinfo, retentionpolicy, status, draftretentionpolicy)

Logs:
```
[batchGetHelper] NEW_SCHEMA_ONLY: totalKeys=73 keysCount=50 — passing all keys to batchGetUnion
[batchGetUnion] keys=73 uniqueUrns=1 aspectColumns=72 includeSoftDeleted=false isTestMode=false
[batchGetUnion] Executing single query: urns=1 sql=SELECT urn, a_usereditableschemainfo, a_dataprivacyreviewv2, a_retentionpolicy, ... [72 columns] ... lastmodifiedon, lastmodifiedby, createdfor FROM metadata_entity_dataset WHERE urn IN ('urn:li:dataset:(urn:li:dataPlatform:gridTable,test_create_009,EI)') AND deleted_ts IS NULL
[readMultiAspectSqlRows] Parsed 1 rows × 72 columns: normal=4 softDeleted=0 null(skipped)=68
[batchGetUnion] Single query returned 1 DB rows, parsed into 4 aspects
[batchGetHelper] NEW_SCHEMA_ONLY: skipping position=50 (already fetched all on position=0)
```

✅ 1 SQL for 73 aspects (vs 73 in old code), no JSON_EXTRACT, null aspects skipped in Java, position guard prevents duplicate fetching.

Test 2: Dataset get with specific aspects

```bash
grpcurli --dv-auth SELF -f ei-ltx1 localhost:25403 \
  proto.com.linkedin.mg.assets.service.DatasetAssetService.get \
  -d '{"key":{"urn":{...},"aspectTypes":["proto.com.linkedin.common.Status","proto.com.linkedin.dataset.RetentionPolicy"]}}'
```

Response: exactly 2 requested aspects returned.

Test 3: getWithContext

```bash
grpcurli --dv-auth SELF -f ei-ltx1 localhost:25403 \
  proto.com.linkedin.mg.assets.service.DatasetAssetService.getWithContext \
  -d '{"key":{"urn":{...},"aspectTypes":["Status","RetentionPolicy","DraftRetentionPolicy"]}}'
```

Response: 3 aspects + per-aspect etag metadata.

Test 4: filter (with aspect hydration)

```bash
grpcurli --dv-auth SELF -f ei-ltx1 d2://datasetAssetService \
  proto.com.linkedin.mg.assets.service.DatasetAssetService.filter \
  -d '{"aspectTypes":["RetentionPolicy","Status"],"indexFilter":{"criteria":{"items":[{"aspect":"com.linkedin.common.urn.DatasetUrn","pathParams":{"condition":"Condition_EQUAL","path":"/platform/platformName","value":{"stringValue":"gridTable"}}}]}},"paging":{"start":0,"count":3}}'
```

Response: 3 URNs hydrated with requested aspects via single batchGetUnion call.

Test 5: Soft-delete verification (update + delete + get)

```bash
# Create URN with status + draftretentionpolicy
grpcurli --dv-auth SELF -f ei-ltx1 -d '{"value":{"urn":{...},"status":{"removed":false},"draftretentionpolicy":{...}}}' \
  d2://datasetAssetService proto.com.linkedin.mg.assets.service.DatasetAssetService/update

# Soft-delete DraftRetentionPolicy
grpcurli --dv-auth SELF -f ei-ltx1 -d '{"key":{"urn":{...},"aspectTypes":["com.linkedin.dataset.DraftRetentionPolicy"]}}' \
  d2://datasetAssetService proto.com.linkedin.mg.assets.service.DatasetAssetService/delete

# Get both aspects
grpcurli --dv-auth SELF -f ei-ltx1 localhost:25403 \
  proto.com.linkedin.mg.assets.service.DatasetAssetService.get \
  -d '{"key":{"urn":{...},"aspectTypes":["proto.com.linkedin.common.Status","proto.com.linkedin.dataset.DraftRetentionPolicy"]}}'
```

Response: only status returned — draftretentionpolicy correctly excluded.


Logs:
```
[batchGetUnion] Executing single query: urns=1 sql=SELECT urn, a_status, a_draftretentionpolicy, lastmodifiedon, lastmodifiedby, createdfor FROM metadata_entity_dataset WHERE urn IN ('urn:li:dataset:...') AND deleted_ts IS NULL
[readMultiAspectSqlRows] Soft-deleted aspect: urn=urn:li:dataset:(urn:li:dataPlatform:gridTable,testDb.rakagraw_log_test,DEV) column=a_draftretentionpolicy
```

✅ SQL has no JSON_EXTRACT (both columns selected unconditionally). Soft-delete marker detected in Java and filtered by caller.

Test 6: batchGet (multiple URNs)

```bash
grpcurli --dv-auth SELF -f ei-ltx1 d2://datasetAssetService \
  proto.com.linkedin.mg.assets.service.DatasetAssetService.batchGet \
  -d '{"aspectTypes":["Status","RetentionPolicy"],"urns":[{...urn1},{...urn2}]}'
```

Response: both URNs returned with correct aspects in 1 call.

Verification matrix

| Behavior | Log evidence |
|---|---|
| Single SQL with all columns | Executing single query: urns=1 sql=SELECT urn, a_aspect1, a_aspect2, ... |
| All 72 aspect columns hydrated in 1 query | keys=73 uniqueUrns=1 aspectColumns=72 |
| Null aspects skipped in Java | Parsed 1 rows × 72 columns: normal=4 softDeleted=0 null(skipped)=68 |
| Soft-delete filtered in Java (not SQL) | Soft-deleted aspect: urn=... column=a_draftretentionpolicy, and the response excludes it |
| Position guard prevents duplicates | skipping position=50 (already fetched all on position=0) |
| No JSON_EXTRACT in SQL | The logged SQL string contains no JSON_EXTRACT clause |

Checklist

@codecov-commenter

codecov-commenter commented May 11, 2026

Codecov Report

❌ Patch coverage is 98.11321% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 66.84%. Comparing base (b025046) to head (7f04dac).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../java/com/linkedin/metadata/dao/EbeanLocalDAO.java | 88.88% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
```diff
@@             Coverage Diff              @@
##             master     #622      +/-   ##
============================================
+ Coverage     66.82%   66.84%   +0.01%
- Complexity     1880     1891      +11
============================================
  Files           148      148
  Lines          7262     7293      +31
  Branches        879      886       +7
============================================
+ Hits           4853     4875      +22
- Misses         2024     2033       +9
  Partials        385      385
```


rakhiagr force-pushed the rakagraw/batchgetunion-single-query branch from 54b404f to ba38d87 on May 12, 2026 00:00
…Union

batchGetUnion() previously generated 1 SQL query per aspect class. For a
get with 73 aspects on a single URN, that was 73 sequential queries to the
same table/row. This collapses them into a single SELECT with all requested
aspect columns, with URN-count chunking (MAX_URNS_PER_QUERY=100) to keep
the IN clause safe for MySQL.

Performance examples:
- 1 URN × 73 aspects:   73 SQL → 1 SQL
- 10 URNs × 73 aspects: ~146 SQL → 1 SQL
- 200 URNs × 3 aspects: 36 SQL → 2 SQL (chunked)
- 1000 URNs × 3 aspects: 180 SQL → 10 SQL (chunked)

Key changes:
- SQLStatementUtils.createMultiAspectReadSql(): builds a single SELECT
  listing all requested aspect columns explicitly (no SELECT *,
  no JSON_EXTRACT filter). Validates URNs are same entity type.
- EBeanDAOUtils.readMultiAspectSqlRows(): parses multi-column rows,
  delegates per-column to existing readSqlRow() for soft-delete /
  asset-delete handling.
- EbeanLocalAccess.batchGetUnion(): collects all aspect columns and URNs
  upfront, chunks URNs into MAX_URNS_PER_QUERY=100 batches, issues one
  SQL per chunk, combines results.
- EbeanLocalDAO.batchGetHelper(): position>0 short-circuit for
  NEW_SCHEMA_ONLY/DUAL_SCHEMA to avoid duplicate results from the outer
  pagination loop in batchGet().
- gma_deleted filtering moves from SQL to Java (no JSON_EXTRACT).
  Soft-deleted aspects are returned with their marker; callers
  (EbeanLocalDAO.toRecordTemplate) already filter via isSoftDeletedAspect().
- IEbeanLocalAccess Javadoc updated to describe new behavior.

Tests:
- SQLStatementUtilsTest: 7 tests for createMultiAspectReadSql
  (basic, includeSoftDeleted, testMode, multiple URNs, empty URNs/columns
  exceptions, mixed entity types exception, URN escaping)
- EBeanDAOUtilsTest: 5 tests for readMultiAspectSqlRows
  (multi-column, null skip, empty input, empty map, soft-deleted marker)
- EbeanLocalAccessTest: 14 tests covering single/multi URN, single/multi
  aspect, null handling, soft-delete by callers, URN chunking with
  120 URNs, boundary at 99/100/101 URNs
- EbeanLocalDAOTest.testGetWithSmallPageSizeNoDuplicatesNewSchema:
  verifies position>0 short-circuit with setQueryKeysCount(2) and 5 keys
- EbeanLocalAccessTestWithoutServiceIdentifier: sibling test updated

Full dao-impl/ebean-dao:test suite passes (2055 tests).

E2E verified on QEI for Dataset get/getWithContext/filter/batchGet/update/delete.
rakhiagr force-pushed the rakagraw/batchgetunion-single-query branch from ba38d87 to 7f04dac on May 12, 2026 00:06
```java
if (validator.columnExists(isTestMode ? getTestTableName(entityUrn) : getTableName(entityUrn),
    getAspectColumnName(entityUrn.getEntityType(), aspectClass))) {
  keysToQueryMap.computeIfAbsent(aspectClass, unused -> new HashSet<>()).add(entityUrn);
final String tableName = isTestMode ? getTestTableName(entityUrn) : getTableName(entityUrn);
```
Contributor


Do you need to make sure it's the same table name?

Contributor


Actually, here we don't support different URN types, but it seems we did support that previously?

```java
// For DUAL_SCHEMA: compare the accumulated old-schema results (paged) against a single
// unpaged new-schema fetch. Doing the comparison here (rather than per-page in batchGetHelper)
// ensures both sides cover the same key set when keys.size() > keysCount.
boolean nonLatestVersionFlag = keys.stream().anyMatch(key -> key.getVersion() != LATEST_VERSION);
```
Contributor


Hmm, this makes the code super hard to review and maintain. Can we simply use different methods for the old and new schema? Or even: is the old schema still alive? Can we just remove that comparison code?

```java
if (position > 0) {
  return Collections.emptyList();
}
return _localAccess.batchGetUnion(keys, keys.size(), 0, false, false);
```
Contributor


It seems to me that we are forcing it not to use pagination. Then should we just remove that?

```java
 * @param isTestMode whether to use the test table
 * @return SQL string for the multi-aspect read
 */
public static String createMultiAspectReadSql(@Nonnull Set<String> aspectColumnNames,
```
Contributor


How do we avoid SQL injection here?

```java
final boolean isAssetLevelDeleted = assetDeletedTs != null;

if (isSoftDeletedAspect(sqlRow, columnName)) {
  primaryKey = new EbeanMetadataAspect.PrimaryKey(urn, aspectClass.getCanonicalName(), LATEST_VERSION);
```
Contributor


We only store gma_deleted now. I see that previously we used the query to filter them out, but here it seems we will still return the deleted values. I'm not sure what the impact is; can you add some tests to show the behavior? Also cc @jphui to review.



3 participants