Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve transitive path computation with GraphBLAS #1252

Closed
wants to merge 41 commits into from

Conversation

JoBuRo
Copy link
Contributor

@JoBuRo JoBuRo commented Jan 30, 2024

This PR changes the implementation of the TransitivePath class.
Previously it used DFS to compute the transitive hull. The new implementation uses sparse, boolean matrices instead. The matrix object and functionalities come from the SuiteSparse:GraphBLAS library [website | github] .
This library is added as new dependency. The new class GrbMatrix wraps the library for this application.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made a first pass over the Matrix class, haven't looked at the Transitive Path and the tests yet.

@@ -41,6 +41,7 @@ jobs:
run: |
brew install llvm@16
brew install conan@2
brew install suite-sparse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently doesn't seem to work,
I will try to figure out how to do this on our Mac.
What we also could do is to compile GraphBLAS alongside QLever, but only if
it is explicitly requested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(of course this would increase build times, I'd have to check if this is feasible, and probably it would be problematic for CI builds.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass over the rest!

bool isContained(Id id) { return idMap_.contains(id); }

size_t addId(Id id) {
if (!idMap_.contains(id)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this nicer and more efficient by using

  1. idMap_.try_emplace().
  2. The fact the nextIndex_ is always equal to idMap_.size() (the explicit data member is not needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and equal to indexMap.size).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'm using the try_emplace as you intended, but I think still need to check if idMap_ contains id. If the id is not contained I need to add to the vector indexMap_.

Comment on lines 64 to 66
ad_utility::HashMap<Id, size_t> idMap_{};

std::vector<Id> indexMap_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make them private and include memory limits via the AllocatorWithMemoryLimit (can be applied everywhere).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried using ad_utility::HashMapWithMemoryLimit<Id, size_t>, but on compile I get:

error: call to deleted constructor of 'allocator_type' (aka 'ad_utility::AllocatorWi thLimit<std::pair<const ValueId, unsigned long>>')

rowIndex++;
table(resultRowIndex, startSideCol) = startNode;
table(resultRowIndex, targetSideCol) = targetNode;
resultRowIndex++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have a look at the possibilities of GraphBlas to check what is the most efficient way to get a set of rows out.

size_t targetIndex) const {
GrbMatrix transformer = GrbMatrix(hull.ncols(), hull.ncols());
transformer.setElement(targetIndex, targetIndex, true);
return hull.multiply(transformer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't actually this the solution to get multiple rows at once if you have multiple start indices? Just with a transformer that has more than one entry?
That way you don't have that many getRow() calls, but a single "getAllRows" followed by an extraction.

Comment on lines 496 to 503
for (size_t i = 0; i < numRows; i++) {
auto startId = startCol[i];
auto targetId = targetCol[i];
auto startIndex = mapping.addId(startId);
auto targetIndex = mapping.addId(targetId);

rowIndices.push_back(startIndex);
colIndices.push_back(targetIndex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since both your input and output format are column based, write a lambda that handles one of the columns, and call it twice. (Maybe this can even be a member function addAll of the mapping.
And in general: If you know the needed size of a vector in advance, you should reserve it.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An idea to increase the visibility of the fallback.

// A small helper function: Insert the `value` to the set at `map[key]`.
// As the sets all have an allocator with memory limit, this construction is a
// little bit more involved, so this can be a separate helper function.
void insertIntoMap(Map& map, Id key, Id value) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much.
I think we should definitely split the two implementations.
I think we can just use basic inheritance:

  1. A class TransitivePathBase that has all the stuff that the class currently has EXCEPT for the override of computeResult() + all the helper functions needed for it.

  2. One class TransitivePathGraphblas and one class TransitivePathFallback that both inherit from the base and implement their repsecive computeResults.

  3. : The base class has a function static shared_pr<TransitivePathBase> makeTransitivePath(..... all the args, bool use fallback). (Trick: only declare it in the header, and implement it in the .cpp` file, in which you can include both implementations.

JoBuRo and others added 15 commits February 1, 2024 14:21
- Renamed nrows, ncols and nvals
- Moved code to cpp file
- Nullptr is now valid for GrbMatrix::matrix_
- Renamed copy() -> clone()
- Functions that create a GrbMatrix now use the unique_ptr directly
- Remove nextIndex_
- Made internal datastructures private
- isContained -> contains
- const Id& -> Id
- Remove C style arrays
- Add checkCancellation
Copy link

codecov bot commented Feb 7, 2024

Codecov Report

Attention: Patch coverage is 67.80000% with 161 lines in your changes are missing coverage. Please review.

Project coverage is 86.94%. Comparing base (bcc05cc) to head (714744a).

❗ Current head 714744a differs from pull request most recent head f6e2429. Consider uploading reports for the commit f6e2429 to get more accurate results

Files Patch % Lines
src/engine/TransitivePath.cpp 51.02% 111 Missing and 9 partials ⚠️
src/engine/GrbMatrix.cpp 79.08% 21 Missing and 20 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1252      +/-   ##
==========================================
- Coverage   88.10%   86.94%   -1.16%     
==========================================
  Files         310      310              
  Lines       28436    28410      -26     
  Branches     3148     3168      +20     
==========================================
- Hits        25053    24701     -352     
- Misses       2231     2552     +321     
- Partials     1152     1157       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Removed unnecessary unique_ptr and make_optionals
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
9.1% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

@joka921
Copy link
Member

joka921 commented Apr 18, 2024

We have now merged the better implementation that uses a binary search,
so we can then close this PR.

@joka921 joka921 closed this Apr 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants