Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Union that can keep sort order #1834

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

RobinTF
Copy link
Collaborator

@RobinTF RobinTF commented Feb 25, 2025

This change allows the Union operation to efficiently merge already sorted partial results if requested, but not by default because that's more expensive to do than just merging everything together.

Copy link

codecov bot commented Feb 25, 2025

Codecov Report

Attention: Patch coverage is 87.80488% with 25 lines in your changes missing coverage. Please review.

Project coverage is 90.22%. Comparing base (8fe0642) to head (399fb39).

Files with missing lines Patch % Lines
src/engine/Union.cpp 87.91% 3 Missing and 19 partials ⚠️
src/engine/idTable/IdTable.h 83.33% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1834      +/-   ##
==========================================
+ Coverage   90.18%   90.22%   +0.04%     
==========================================
  Files         400      400              
  Lines       38440    38635     +195     
  Branches     4306     4340      +34     
==========================================
+ Hits        34666    34858     +192     
+ Misses       2479     2463      -16     
- Partials     1295     1314      +19     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@RobinTF
Copy link
Collaborator Author

RobinTF commented Feb 27, 2025

@joka921 Regarding performance comparisons, using this query as a baseline:
Query Screenshot

As you can see, originally it takes almost a minute for 2 billion rows to be processed.
Using the original code (without the added templates) it runs way faster (and the memory footprint is also way lower):
Faster version

Using IdTableStatic for the output table, didn't help the performance at all. Using a std::span with a compile-time size (thus allowing the compiler to unroll the loop) for the isSmaller check, helped to shave off another second from the Union operation. Adding the AD_ALWAYS_INLINE macro, helped shaving off additional 2 seconds from the runtime on the test machine.

@sparql-conformance
Copy link

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first pass on everything but the tests.

Comment on lines +365 to +372
ColumnIndex index1 = _columnOrigins.at(col).at(0);
ColumnIndex index2 = _columnOrigins.at(col).at(1);
if (index1 == NO_COLUMN) {
return true;
}
if (index2 == NO_COLUMN) {
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could remove all the sort columns after the first NO_COLUM, and precompute an
efficient return value
(If all the columns are equal, do i return true (because there was NO_COLUMN in the left input) or false (because the NO_COLUMN was right, or because all columns were equal.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure what exactly you want. I can't initially trim targetOrder_, because we still need the original values elsewhere. Otherwise this information is lost. If you're proposing adding a copy that is tailored towards the comparator, to do better loop unrolling I could certainly do that, but I don't think sort orders that just apply to a single child are common, so I'm not sure if it's worth the added complexity.

co_yield pair;
}
if (!requestLaziness) {
co_yield Result::IdTableVocabPair{std::move(resultTable),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are moving away from coroutines for our C++17 backport,
please consider which types of ranges or state machines (see the end of the Iterators.h header)
can be used here.
This unfortunately will have some more lines of code, but we will benefit from it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants