Use the patterns from the permutations and no longer load `ql:has-pattern` into RAM
#1223
Next step: neither write nor read the old subject-to-pattern-matching.
Next step: Prepare a preliminary PR to let Hannah try it out on real world knowledge graphs.
TODO<joka921> update the date as soon as we know on which day we merge.
# Conflicts: # .github/workflows/code-coverage.yml # test/ExceptionHandlingTest.cpp # test/IndexTestHelpers.h
TODO Actually write them during CreatePermutations, and then also retrieve them during the pattern processing.
But all previous unit tests pass again.
TODO<joka921> Objects...
Missing piece (probably): during index building, we need an optional join to handle the `noPattern` case for objects that don't appear as subjects.
join in a batched fashion.
# Conflicts: # src/engine/idTable/CompressedExternalIdTable.h # src/engine/idTable/IdTable.h # src/engine/idTable/IdTableRow.h # src/index/IndexImpl.cpp # test/engine/idTable/CompressedExternalIdTableTest.cpp
…too many changes left.
# Conflicts: # src/index/IndexImpl.cpp
# Conflicts: # test/QueryPlannerTest.cpp
1-1 with Johannes, I am amazed at the amount of work (and that it works, of course)
Awesome, another major milestone taken!
So far, the method `Vocabulary::prefix_range` returned a range of indexes of words in the internal vocabulary that match a given prefix. It is now replaced by the method `Vocabulary::prefixRange`, which returns two ranges, one for the internal and one for the external vocabulary. This can easily be extended to more ranges when needed. Based on this, our efficient implementation of `REGEX` for the case where the regular expression is a prefix now finds items in both the internal and the external vocabulary (previously: only in the internal vocabulary). On the side, this PR identified and fixed a bug in the previous code, where the special predicates starting with `@` would not be found if `prefixes-external` matched `@`. As a consequence, QLever now works as it should even with `"prefixes-external": [""]`, that is, even when everything except the QLever-internal predicates ends up in the external vocabulary. In conjunction with #1223, this now makes it possible to start a QLever server with very little RAM.
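To illustrate the idea of returning one index range per vocabulary, here is a minimal sketch. It is not QLever's actual code: the names `IndexRange`, `prefixRangeInOne`, and `prefixRangesSketch` are hypothetical, and the sketch assumes each vocabulary is a plain `std::vector<std::string>` sorted in byte order, which may differ from the real data structures and collation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Half-open range [begin, end) of indexes into one sorted vocabulary.
// Hypothetical type for illustration only.
struct IndexRange {
  std::size_t begin;
  std::size_t end;
};

// Find all words in the sorted `vocab` that start with `prefix`.
IndexRange prefixRangeInOne(const std::vector<std::string>& vocab,
                            std::string_view prefix) {
  // Assumption of this sketch: the vocabulary is sorted in byte order.
  assert(std::is_sorted(vocab.begin(), vocab.end()));
  // First word that is not smaller than the prefix.
  auto lower = std::lower_bound(
      vocab.begin(), vocab.end(), prefix,
      [](const std::string& word, std::string_view p) {
        return std::string_view(word) < p;
      });
  // From there on, the matching words form a contiguous block.
  auto upper = std::partition_point(
      lower, vocab.end(), [prefix](const std::string& word) {
        return std::string_view(word).substr(0, prefix.size()) == prefix;
      });
  return {static_cast<std::size_t>(lower - vocab.begin()),
          static_cast<std::size_t>(upper - vocab.begin())};
}

// One range for the internal and one for the external vocabulary.
std::pair<IndexRange, IndexRange> prefixRangesSketch(
    const std::vector<std::string>& internalVocab,
    const std::vector<std::string>& externalVocab, std::string_view prefix) {
  return {prefixRangeInOne(internalVocab, prefix),
          prefixRangeInOne(externalVocab, prefix)};
}
```

A caller would scan both ranges when evaluating a prefix filter. Supporting additional vocabularies would then amount to returning more ranges, for example as a vector instead of a pair.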
PRs #1168 and #1177 have added the subject patterns as two additional columns to the OSP&OPS and PSO&POS permutations. PR #1226 has added the triples of the `ql:has-pattern` predicate to the PSO&POS permutations. Now use this information instead of the old patterns, which cost a lot of RAM. We tried a few queries involving patterns, and the speed is very similar to that of the previous implementation.
NOTE: This is an index-breaking change. The old `.index.patterns` file stored the `ql:has-pattern` predicate (for each subject its pattern) and the information which pattern consists of which predicates. Now the `.index.patterns` file only stores the latter information. The file size is therefore significantly reduced and no longer depends on the size of the dataset, but only on the number of distinct patterns (typically few). For example, for Wikidata, the file size decreased from 17 GB to 2.8 GB. For UniProt, the reduction is from 152 GB (which does not fit into the RAM of our standard machines) to something very small (because UniProt is very regular and there are only very few distinct patterns).
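The split between the two kinds of information can be sketched as follows. This is only an illustration under simplifying assumptions, not the actual on-disk format or QLever code: `Pattern`, `PatternTableSketch`, and `buildPatternsSketch` are hypothetical names, and a pattern is modeled as the set of distinct predicates of a subject.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Assumption of this sketch: a pattern is the set of predicates of a subject.
using Pattern = std::set<std::string>;

struct PatternTableSketch {
  // Pattern id -> predicates. Its size depends only on the number of
  // distinct patterns, not on the number of subjects.
  std::vector<Pattern> patterns;
  // Subject -> pattern id. This part grows with the dataset; in the
  // description above it corresponds to what moved into the permutations.
  std::map<std::string, std::size_t> subjectToPattern;
};

// Build both parts from (subject, predicate) pairs.
PatternTableSketch buildPatternsSketch(
    const std::vector<std::pair<std::string, std::string>>& pairs) {
  // Collect the distinct predicates of each subject.
  std::map<std::string, Pattern> predicatesPerSubject;
  for (const auto& [subject, predicate] : pairs) {
    predicatesPerSubject[subject].insert(predicate);
  }
  PatternTableSketch result;
  std::map<Pattern, std::size_t> idOfPattern;
  for (const auto& [subject, pattern] : predicatesPerSubject) {
    // Assign a new id only if this pattern has not been seen before.
    auto [it, isNew] = idOfPattern.try_emplace(pattern, result.patterns.size());
    if (isNew) {
      result.patterns.push_back(pattern);
    }
    result.subjectToPattern[subject] = it->second;
  }
  return result;
}
```

For a very regular dataset, many subjects share the same few patterns, so `patterns` stays small even when `subjectToPattern` has one entry per subject. This matches the observation above that the remaining file no longer scales with the dataset size.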