Use the patterns from the permutations and no longer load `ql:has-pattern` into RAM #1223

joka921 · 2024-01-12T15:07:58Z

PRs #1168 and #1177 have added the subject patterns as two additional columns to the OSP&OPS and PSO&POS permutations. PR #1226 has added the triples of the `ql:has-pattern` predicate to the PSO&POS permutations. This PR now uses this information instead of the old patterns, which cost a lot of RAM. We tried a few queries involving patterns and the speed is very similar to that of the previous implementation.

NOTE: This is an index-breaking change. The old `.index.patterns` file stored both the `ql:has-pattern` predicate (for each subject, its pattern) and the information about which pattern consists of which predicates. Now the `.index.patterns` file stores only the latter. The file size is therefore significantly reduced and no longer depends on the size of the dataset, only on how many distinct patterns there are (typically few). For example, for Wikidata, the file size was reduced from 17 GB to 2.8 GB. For UniProt, the reduction is from 152 GB (which does not fit into the RAM of our standard machines) to something very small (because UniProt is very regular and there are only very few distinct patterns).
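To illustrate why the pattern table stays small, here is a minimal C++ sketch of how patterns can be derived from triples sorted by subject and predicate. It is not QLever's actual code: the names `Triple`, `PatternIndex`, and `buildPatterns` are hypothetical, and plain integers stand in for the real ID types. The `patterns` table corresponds to what the description says the `.index.patterns` file still stores, while `subjectToPattern` corresponds to the per-subject information that now lives in the permutations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not QLever's real types.
using Id = std::uint64_t;
using PatternId = std::uint32_t;
using Pattern = std::vector<Id>;  // sorted, distinct predicate IDs

struct Triple {
  Id subject;
  Id predicate;
  Id object;
};

struct PatternIndex {
  // One entry per distinct pattern. Its size depends only on the number of
  // distinct patterns, not on the number of subjects.
  std::vector<Pattern> patterns;
  // One entry per distinct subject. In the design described above, this
  // mapping is stored as additional columns of the permutations on disk.
  std::vector<std::pair<Id, PatternId>> subjectToPattern;
};

// Assumes `sortedTriples` is sorted by (subject, predicate).
PatternIndex buildPatterns(const std::vector<Triple>& sortedTriples) {
  PatternIndex result;
  std::map<Pattern, PatternId> seen;  // pattern -> its ID
  std::size_t i = 0;
  while (i < sortedTriples.size()) {
    const Id subject = sortedTriples[i].subject;
    Pattern current;
    // Collect the distinct predicates of the current subject.
    for (; i < sortedTriples.size() && sortedTriples[i].subject == subject;
         ++i) {
      const Id predicate = sortedTriples[i].predicate;
      assert(current.empty() || current.back() <= predicate);
      if (current.empty() || current.back() != predicate) {
        current.push_back(predicate);
      }
    }
    // Reuse the ID of an already known pattern, or assign the next free ID.
    auto [it, isNew] = seen.try_emplace(
        current, static_cast<PatternId>(result.patterns.size()));
    if (isNew) {
      result.patterns.push_back(current);
    }
    result.subjectToPattern.emplace_back(subject, it->second);
  }
  return result;
}
```

In this sketch, a very regular dataset with millions of subjects but only a handful of distinct predicate sets produces a `patterns` table with only a handful of entries, which matches the UniProt observation above.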

hannahbast

1-1 with Johannes, I am amazed at the amount of work (and that it works, of course)

hannahbast

Awesome, another major milestone taken!

sonarqubecloud · 2024-01-18T19:36:16Z

Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

16 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

So far, the method `Vocabulary::prefix_range` returned a range of indexes of words in the internal vocabulary that match a given prefix. This is now replaced by a method `Vocabulary::prefixRange`, which returns two ranges, one for the internal and one for the external vocabulary. This can easily be extended to more ranges when needed. Based on this, our efficient implementation for `REGEX` when the regular expression is a prefix now finds items in both the internal and the external vocabulary (so far: only in the internal vocabulary). On the side, identified and fixed a bug in the previous code, where the special predicates starting with `@` would not be found if `prefixes-external` matched `@`. As a consequence, QLever now works as it should even with `"prefixes-external": [""]`, that is, even when everything except the QLever-internal predicates ends up in the external vocabulary. In conjunction with #1223, this now makes it possible to start a QLever server with very little RAM.
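The idea of returning one range per vocabulary can be sketched as follows. This is only an illustration under simplifying assumptions, not the actual signature or implementation of `Vocabulary::prefixRange`: both vocabularies are modeled as byte-wise sorted `std::vector<std::string>`, and the names `IndexRange`, `PrefixRanges`, and `prefixRangeInSorted` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Half-open index range [first, second) into one sorted vocabulary.
using IndexRange = std::pair<std::size_t, std::size_t>;

// Hypothetical result type: one range per vocabulary.
struct PrefixRanges {
  IndexRange internal;
  IndexRange external;
};

// Return the range of words in `words` that start with `prefix`.
// Assumes `words` is sorted byte-wise (the real vocabulary may use a
// different, locale-aware order).
IndexRange prefixRangeInSorted(const std::vector<std::string>& words,
                               std::string_view prefix) {
  assert(std::is_sorted(words.begin(), words.end()));
  // Compare only the first `prefix.size()` characters of each word.
  auto truncated = [](const std::string& word, std::string_view p) {
    return std::string_view(word).substr(0, p.size());
  };
  auto begin = std::lower_bound(
      words.begin(), words.end(), prefix,
      [&](const std::string& word, std::string_view p) {
        return truncated(word, p) < p;
      });
  auto end = std::upper_bound(
      begin, words.end(), prefix,
      [&](std::string_view p, const std::string& word) {
        return p < truncated(word, p);
      });
  return {static_cast<std::size_t>(begin - words.begin()),
          static_cast<std::size_t>(end - words.begin())};
}

// Search both vocabularies and return one range for each of them.
PrefixRanges prefixRange(const std::vector<std::string>& internalVocab,
                         const std::vector<std::string>& externalVocab,
                         std::string_view prefix) {
  return {prefixRangeInSorted(internalVocab, prefix),
          prefixRangeInSorted(externalVocab, prefix)};
}
```

A caller evaluating a prefix `REGEX` would then iterate over both ranges instead of only the internal one. Supporting more vocabularies would amount to adding further members to the result type in this sketch.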

joka921 added 30 commits September 6, 2023 20:38

Not yet working.

095bdd3

The normal pattern trick is working, next do the pattern trick for all entities.

2470c0c

Full pattern trick also works.

ffe16aa

Throwing out the has-predicate-scan, because all the E2E-tests seem to work.

29cc94b

Completely threw out the unneeded code from the has-predicate-scan.

256f17d

Next step: neither write nor read the old subject-to-pattern-matching.

Down with the RAM usage!

1805ee5

Next step: Prepare a preliminary PR to let Hannah try it out on real world knowledge graphs.

Cleaner handling of the special IDs.

98ab8a5

Bump the index format version.

c401367

TODO<joka921> update the date as soon as we know on which day we merge.

Fix the OpenMP bugs.

d02acee

Several improvements from a self-review.

e115e19

Merge branch 'master' into patterns-on-disk

e4e7bd3

A small fix etc.

5ab2a53

Commented out the failing tests to make codecov active.

fcb20fc

Show the memory usage of the failing codecov runner.

5cebbe2

Try to fix the Codecov OOM problems.

b45678b

stupidity

1d4f536

Merge branch 'master' into patterns-on-disk

d849b75

# Conflicts: # .github/workflows/code-coverage.yml # test/ExceptionHandlingTest.cpp # test/IndexTestHelpers.h

Merge in the current master

62aed1e

Prepare a lot of code for the actual storing of the patterns.

065e2c3

TODO Actually write them during CreatePermutations, and then also retrieve them during the pattern processing.

Added functionality (untested yet) to export additional columns.

db08fae

But all previous unit tests pass again.

The subject based patterns already seem to work like a charm.

960e32b

TODO<joka921> Objects...

Stopping for today.

ec1d230

Missing piece (probably) During the index-Building we need an optional join to handle the `noPattern` case for objects that don't appear as subjects.

This might work, but now we first let a DBLP build run.

96e46fe

This seems to work and answer simple queries....

ac1407b

Fix a subtle bug.

bea4c59

Trying to do the join in a batched fashion.

9617343

Trying to do the join in a batched fashion.

09fa62f

Add the ability to store additional columns in the relations.

5aa272f

Before a review.

e98b7cf

Add tests and clean up some code.

2a7b1d2

joka921 added 12 commits January 17, 2024 13:32

A round of reviews with Hannah.

174ea90

Some additional small reviews.

4a9bf23

Merge branch 'additional-permutations' into use-new-patterns

5dc7515

# Conflicts: # src/engine/idTable/CompressedExternalIdTable.h # src/engine/idTable/IdTable.h # src/engine/idTable/IdTableRow.h # src/index/IndexImpl.cpp # test/engine/idTable/CompressedExternalIdTableTest.cpp

The merge is still broken...

038cd0b

We still have to manually figure out the merge afterwards, there are too many changes left.

bd0f86a

Merge branch 'master' into use-new-patterns

130b01e

# Conflicts: # src/index/IndexImpl.cpp

Revert to the old version.

4de17ad

Some refactorings of the CheckUsePatternTrick module.

70aecf3

Several additional things.

c9e3477

A round of self-reviews.

fa24da5

Several additional improvements and self-reviews.

394f379

Merge branch 'master' into use-new-patterns

e59ee45

# Conflicts: # test/QueryPlannerTest.cpp

hannahbast requested changes Jan 18, 2024

hannahbast marked this pull request as ready for review January 18, 2024 17:45

A round of reviews.

b3f7e38

hannahbast approved these changes Jan 18, 2024

hannahbast changed the title ~~Actually use the new pattern implementation (just a draft)~~ Use the new pattern implementation Jan 18, 2024

joka921 added 2 commits January 18, 2024 19:01

Moved underscores from the front to the back

3a9f8a5

Fix the date in the index version.

8dfd2ef

hannahbast approved these changes Jan 18, 2024

Change the date again.

52857c7

hannahbast approved these changes Jan 18, 2024

Rename the patternCreatorNew to PatternCreator again.

6313a71

hannahbast changed the title ~~Use the new pattern implementation~~ Use the patterns from the permutations and no longer load ql:has-pattern into RAM Jan 18, 2024

hannahbast merged commit d7635f0 into ad-freiburg:master Jan 18, 2024
17 of 18 checks passed

joka921 deleted the use-new-patterns branch January 19, 2024 07:39

hannahbast mentioned this pull request Jan 30, 2024

Prefix search now also considers the external vocabulary #1235

Merged
