perf(ebean-dao): collapse N per-aspect SQL queries into 1 in batchGetUnion#622
perf(ebean-dao): collapse N per-aspect SQL queries into 1 in batchGetUnion#622rakhiagr wants to merge 1 commit into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #622 +/- ##
============================================
+ Coverage 66.82% 66.84% +0.01%
- Complexity 1880 1891 +11
============================================
Files 148 148
Lines 7262 7293 +31
Branches 879 886 +7
============================================
+ Hits 4853 4875 +22
- Misses 2024 2033 +9
Partials 385 385 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
54b404f to
ba38d87
Compare
…Union batchGetUnion() previously generated 1 SQL query per aspect class. For a get with 73 aspects on a single URN, that was 73 sequential queries to the same table/row. This collapses them into a single SELECT with all requested aspect columns, with URN-count chunking (MAX_URNS_PER_QUERY=100) to keep the IN clause safe for MySQL. Performance examples: - 1 URN × 73 aspects: 73 SQL → 1 SQL - 10 URNs × 73 aspects: ~146 SQL → 1 SQL - 200 URNs × 3 aspects: 36 SQL → 2 SQL (chunked) - 1000 URNs × 3 aspects: 180 SQL → 10 SQL (chunked) Key changes: - SQLStatementUtils.createMultiAspectReadSql(): builds a single SELECT listing all requested aspect columns explicitly (no SELECT *, no JSON_EXTRACT filter). Validates URNs are same entity type. - EBeanDAOUtils.readMultiAspectSqlRows(): parses multi-column rows, delegates per-column to existing readSqlRow() for soft-delete / asset-delete handling. - EbeanLocalAccess.batchGetUnion(): collects all aspect columns and URNs upfront, chunks URNs into MAX_URNS_PER_QUERY=100 batches, issues one SQL per chunk, combines results. - EbeanLocalDAO.batchGetHelper(): position>0 short-circuit for NEW_SCHEMA_ONLY/DUAL_SCHEMA to avoid duplicate results from the outer pagination loop in batchGet(). - gma_deleted filtering moves from SQL to Java (no JSON_EXTRACT). Soft-deleted aspects are returned with their marker; callers (EbeanLocalDAO.toRecordTemplate) already filter via isSoftDeletedAspect(). - IEbeanLocalAccess Javadoc updated to describe new behavior. Tests: - SQLStatementUtilsTest: 7 tests for createMultiAspectReadSql (basic, includeSoftDeleted, testMode, multiple URNs, empty URNs/columns exceptions, mixed entity types exception, URN escaping) - EBeanDAOUtilsTest: 5 tests for readMultiAspectSqlRows (multi-column, null skip, empty input, empty map, soft-deleted marker) - EbeanLocalAccessTest: 14 tests covering single/multi URN, single/multi aspect, null handling, soft-delete by callers, URN chunking with 120 URNs, boundary at 99/100/101 URNs - EbeanLocalDAOTest.testGetWithSmallPageSizeNoDuplicatesNewSchema: verifies position>0 short-circuit with setQueryKeysCount(2) and 5 keys - EbeanLocalAccessTestWithoutServiceIdentifier: sibling test updated Full dao-impl/ebean-dao:test suite passes (2055 tests). E2E verified on QEI for Dataset get/getWithContext/filter/batchGet/update/delete.
ba38d87 to
7f04dac
Compare
| if (validator.columnExists(isTestMode ? getTestTableName(entityUrn) : getTableName(entityUrn), | ||
| getAspectColumnName(entityUrn.getEntityType(), aspectClass))) { | ||
| keysToQueryMap.computeIfAbsent(aspectClass, unused -> new HashSet<>()).add(entityUrn); | ||
| final String tableName = isTestMode ? getTestTableName(entityUrn) : getTableName(entityUrn); |
There was a problem hiding this comment.
Do you need to make sure it's the same table name?
There was a problem hiding this comment.
actually here we don't support different urn type, but seems like we do support that previously?
| // For DUAL_SCHEMA: compare the accumulated old-schema results (paged) against a single | ||
| // unpaged new-schema fetch. Doing the comparison here (rather than per-page in batchGetHelper) | ||
| // ensures both sides cover the same key set when keys.size() > keysCount. | ||
| boolean nonLatestVersionFlag = keys.stream().anyMatch(key -> key.getVersion() != LATEST_VERSION); |
There was a problem hiding this comment.
Hmm this makes the code super hard to review and maintain, can we simply use different method for old and new schema, or even, is old schema still alive? can we just remove those comparing code.
| if (position > 0) { | ||
| return Collections.emptyList(); | ||
| } | ||
| return _localAccess.batchGetUnion(keys, keys.size(), 0, false, false); |
There was a problem hiding this comment.
seems to me is that we are forcing to not use pagination. then should we just simply remove that
| * @param isTestMode whether to use the test table | ||
| * @return SQL string for the multi-aspect read | ||
| */ | ||
| public static String createMultiAspectReadSql(@Nonnull Set<String> aspectColumnNames, |
There was a problem hiding this comment.
how too we avoid sql injection here?
| final boolean isAssetLevelDeleted = assetDeletedTs != null; | ||
|
|
||
| if (isSoftDeletedAspect(sqlRow, columnName)) { | ||
| primaryKey = new EbeanMetadataAspect.PrimaryKey(urn, aspectClass.getCanonicalName(), LATEST_VERSION); |
There was a problem hiding this comment.
We only store gam_deleted now, I see previously we use query to filter them out, but here, seems we will still return the value with deleted. not sure what's the impact here, can you add some test to see the behavior? also cc @jphui to review
Summary
batchGetUnion()— previously generated 1 SQL per aspect class (e.g., 73 queries for a Dataset get with all aspects). Now issues a single SELECT with all requesteda_*columns.JSON_EXTRACTin the query. Soft-deleted aspects are returned with their marker; callers (EbeanLocalDAO.toRecordTemplate) filter them out viaisSoftDeletedAspect().Changes
SQLStatementUtils.javacreateMultiAspectReadSql()— single SELECT with explicit aspect columns, entity-type validation guardEBeanDAOUtils.javareadMultiAspectSqlRows()— parses multi-column rows, delegates to existingreadSqlRow()EbeanLocalAccess.javabatchGetUnion()to collect all columns + URNs, issue chunked queries (URN-count based)EbeanLocalDAO.javaposition>0short-circuit for NEW_SCHEMA_ONLY/DUAL_SCHEMA to avoid duplicate results from outer paginationIEbeanLocalAccess.javaEdge Cases Handled
a_urncolumn excluded (not an aspect)validator.columnExists)gma_deleted) — returned as markers, filtered by callersdeleted_ts) — correctly applied per rowposition>0returns empty for NEW_SCHEMA_ONLYMotivation
Code Yellow investigation: MGA SLO violations caused by N+1 query pattern in
batchGetUnion. Legacy design from the old row-per-aspect schema. The new entity table schema has all aspects as columns on the same row, making single-query reads trivial.Performance examples (with MAX_URNS_PER_QUERY=100):
Testing Done
Unit tests (all passing)
SQLStatementUtilsTest— 7 tests forcreateMultiAspectReadSql: basic, includeSoftDeleted, testMode, multiple URNs, empty URNs/columns, mixed entity types, URN escapingEBeanDAOUtilsTest— 5 tests forreadMultiAspectSqlRows: multi-column, null skip, empty input, empty map, soft-deleted markerEbeanLocalAccessTest— 14 tests: multi-aspect, null aspect, soft-deleted, multi-URN multi-aspect, multi-URN same aspect, single URN, soft-delete filtering by callers, non-existent URN, mixed existing/non-existing, 120 URN chunking, 120 URNs × 2 aspects, boundary at 99/100/101EbeanLocalDAOTest.testGetWithSmallPageSizeNoDuplicatesNewSchema— verifiesposition>0short-circuit withsetQueryKeysCount(2)and 5 keysEbeanLocalAccessTestWithoutServiceIdentifier.testGetAspectNoSoftDeleteCheck— sibling test updated for new behaviorFull
dao-impl/ebean-dao:test: 2055 tests, 0 failures.E2E verification on QEI
Deployed to
qei-ltx1with the new datahub-gma JAR via test-gma-changes workflow.Test 1: Dataset get (all aspects) — verifies single-query path
grpcurli --dv-auth SELF -f ei-ltx1 localhost:25403 \ proto.com.linkedin.mg.assets.service.DatasetAssetService.get \ -d '{"key":{"urn":{"platform":{"platformName":"gridTable"},"datasetName":"test_create_009","origin":"FabricType_EI"}}}'Response: 4 populated aspects (complianceinfo, retentionpolicy, status, draftretentionpolicy)
✅ 1 SQL for 73 aspects (vs 73 in old code), no JSON_EXTRACT, null aspects skipped in Java, position guard prevents duplicate fetching.
Test 2: Dataset get with specific aspects
grpcurli --dv-auth SELF -f ei-ltx1 localhost:25403 \ proto.com.linkedin.mg.assets.service.DatasetAssetService.get \ -d '{"key":{"urn":{...},"aspectTypes":["proto.com.linkedin.common.Status","proto.com.linkedin.dataset.RetentionPolicy"]}}'Response: exactly 2 requested aspects returned.
Test 3: getWithContext
grpcurli --dv-auth SELF -f ei-ltx1 localhost:25403 \ proto.com.linkedin.mg.assets.service.DatasetAssetService.getWithContext \ -d '{"key":{"urn":{...},"aspectTypes":["Status","RetentionPolicy","DraftRetentionPolicy"]}}'Response: 3 aspects + per-aspect etag metadata.
Test 4: filter (with aspect hydration)
grpcurli --dv-auth SELF -f ei-ltx1 d2://datasetAssetService \ proto.com.linkedin.mg.assets.service.DatasetAssetService.filter \ -d '{"aspectTypes":["RetentionPolicy","Status"],"indexFilter":{"criteria":{"items":[{"aspect":"com.linkedin.common.urn.DatasetUrn","pathPar ams":{"condition":"Condition_EQUAL","path":"/platform/platformName","value":{"stringValue":"gridTable"}}}]}},"paging":{"start":0,"count":3}}'Response: 3 URNs hydrated with requested aspects via single batchGetUnion call.
Test 5: Soft-delete verification (update + delete + get)
Response: only status returned — draftretentionpolicy correctly excluded.
Response: both URNs returned with correct aspects in 1 call.
Checklist