-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store the index of each block in the block's metadata #1571
Conversation
Signed-off-by: Johannes Kalmbach <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1571 +/- ##
=======================================
Coverage 88.97% 88.98%
=======================================
Files 368 368
Lines 33819 33825 +6
Branches 3826 3827 +1
=======================================
+ Hits 30090 30098 +8
Misses 2473 2473
+ Partials 1256 1254 -2 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Johannes Kalmbach <[email protected]>
Signed-off-by: Johannes Kalmbach <[email protected]>
Signed-off-by: Johannes Kalmbach <[email protected]>
Signed-off-by: Johannes Kalmbach <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have started with this one, it was much easier :-) Anyway, thanks for the work + just one minor comment
src/index/CompressedRelation.cpp
Outdated
@@ -874,13 +874,17 @@ void CompressedRelationWriter::compressAndWriteBlock( | |||
AD_CORRECTNESS_CHECK(lastCol0Id == last[0]); | |||
|
|||
auto [hasDuplicates, graphInfo] = getGraphInfo(block); | |||
// The block indices will be set later, write a recognizable dummy value for | |||
// easier debugging. | |||
static constexpr size_t blockIndexDummy = 4387; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a good reason why we can't know the block index when the block is written? Maybe because the blocks are written in parallel? Anyway, if there is a good reason, we should define a constant with a value like std::numeric_limits<size_t>::max()
and a meaningful name like BLOCK_INDEX_NOT_YET_SET
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the parallelism is the issue here, I have added a comment that explains that and made the constant look more like something that looks intentions.
I don't like max()
as it is equal to -1
which often occurs by accidental bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you could take max() - 1
or max() - 2
and you could certainly add a static constant class member BLOCK_INDEX_NOT_YET_SET
to give this thing a proper name in the code and in the tests.
Signed-off-by: Johannes Kalmbach <[email protected]>
Conformance check passed ✅No test result changes. |
|
This is a preparation for SPARQL UPDATE: When working with a subset of all blocks (e.g. because some prefiltering was applied), then we still have to know the absolute index of that block to find the corresponding
LocatedTriples
. The easiest way to find that index is to explicitly store it in theCompressedBlockMetadata
.