-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Union
that can keep sort order
#1834
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1834 +/- ##
==========================================
+ Coverage 90.18% 90.22% +0.04%
==========================================
Files 400 400
Lines 38440 38635 +195
Branches 4306 4340 +34
==========================================
+ Hits 34666 34858 +192
+ Misses 2479 2463 -16
- Partials 1295 1314 +19 ☔ View full report in Codecov by Sentry. |
@joka921 Regarding performance comparisons, using this query as a baseline: As you can see, originally it takes almost a minute for 2 billion rows to be processed. Using |
Conformance check passed ✅No test result changes. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A first pass on everything but the tests.
ColumnIndex index1 = _columnOrigins.at(col).at(0); | ||
ColumnIndex index2 = _columnOrigins.at(col).at(1); | ||
if (index1 == NO_COLUMN) { | ||
return true; | ||
} | ||
if (index2 == NO_COLUMN) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could remove all the sort columns after the first NO_COLUM, and precompute an
efficient return value
(If all the columns are equal, do i return true (because there was NO_COLUMN in the left input) or false (because the NO_COLUMN was right, or because all columns were equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure what exactly you want. I can't initially trim targetOrder_
, because we still need the original values elsewhere. Otherwise this information is lost. If you're proposing adding a copy that is tailored towards the comparator, to do better loop unrolling I could certainly do that, but I don't think sort orders that just apply to a single child are common, so I'm not sure if it's worth the added complexity.
co_yield pair; | ||
} | ||
if (!requestLaziness) { | ||
co_yield Result::IdTableVocabPair{std::move(resultTable), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are moving away from coroutines for our C++17 backport,
please consider which types of ranges or state machines (see the end of the Iterators.h
header)
can be used here.
This unfortunately will have some more lines of code, but we will benefit from it.
This change allows the
Union
operation to efficiently merge already sorted partial results if requested, but not by default because that's more expensive to do than just merging everything together.