-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use one LocalVocab
for each block during lazy evaluation
#1567
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1567 +/- ##
==========================================
+ Coverage 88.97% 88.98% +0.01%
==========================================
Files 368 368
Lines 33819 33860 +41
Branches 3826 3828 +2
==========================================
+ Hits 30090 30131 +41
Misses 2473 2473
Partials 1256 1256 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a first pass on everything but the tests.
Mostly just refactorings.
Do we really need two structs of [IdTable, LocalVocab] in the Result class?
src/engine/LocalVocab.h
Outdated
void mergeWith(R&& vocabs) { | ||
auto inserter = std::back_inserter(otherWordSets_); | ||
for (const auto& vocab : AD_FWD(vocabs)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be const R& vocabs
and without AD_FWD
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the merge/share routines could skip all empty local vocabs.
That is an easy optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the merge/share routines could skip all empty local vocabs. That is an easy optimization.
Does it though? To determine the size it does iterate over the same data structures just to calculate the size
Kind of yes. Non-lazy results always have a shared pointer to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The group by probably has a subtle lifetime bug (see my comment from previously) else there are only a few suggestions.
Conformance check passed ✅No test result changes. |
|
@@ -533,7 +533,8 @@ Result::Generator GroupBy::computeResultLazily( | |||
// Reuse buffer if not moved out | |||
resultTable = std::move(outputPair.idTable_); | |||
resultTable.clear(); | |||
currentLocalVocab = LocalVocab{}; | |||
// Keep last local vocab for next commit. | |||
currentLocalVocab = std::move(storedLocalVocabs.back()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessarily correct.
If your group is very long (e.g. from 5 IdTables ago) , then you also have to store the corresponding local vocab from exactly that group.
So you have to probably have alocal vocab that is associated with the currentGroupBlock
and updated exactly when you do the commitRow
call (because then you implicitly updated the next group block ,isn't that true?
LocalVocab
part of generator for lazy resultsLocalVocab
for each block during lazy evaluation
Lazy operations now yield pairs of
(IdTable, LocalVocab for that IdTable)
. Previously, there was one (potentially large) local vocab for the complete result, even if it was lazy. The new structure can reduce the RAM usage for lazy results that contain many local vocab entries dramatically. Consider for example queries like the following:Previously all the result strings of the
BIND
operation would be kept in RAM at the same time although the complete query (index scan + bind + export) could be computed lazily.