Skip to content

[Fix] makeblascontractable should return blascontractable objects. #210

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lkdvos
Copy link
Collaborator

@lkdvos lkdvos commented Jun 9, 2025

I was doing some benchmarking with the StridedBLAS backend, and found some interesting results in the profiler: somehow I was hitting the Strided.__mul! generic implementation instead of the gemm one from BLAS, and it turns out that for conjugated inputs, the makeblascontractable function doesn't actually return a blascontractable object. The result is a slow fallback implementation instead of the actual BLAS call.

The current fix seems easy enough, although it's not the only option: in principle we could retain the A.op while permuting such that everything fuses and the second stride is 1, but I'm not sure if this would then make a lot of difference performance-wise?

I'm not sure if this is something we should add explicit tests for either. (It's not too hard to check that the output of makeblascontractable satisfies isblascontractable, which is also how I found this "bug").

@lkdvos lkdvos requested a review from Jutho June 9, 2025 19:41
@Jutho
Copy link
Owner

Jutho commented Jun 10, 2025

Wow this is bad indeed, the bug I mean. Strange that this has survived for so long. A test that isblascontractable(makeblascontractable(...)) for a number of different cases would be useful indeed.

Anew = SV(A_, size(A_), strides(A_), 0, A.op)
conjA = A.op === conj
A_ = tensoralloc_add(TC, A, pA, conjA, Val(true), allocator)
Anew = SV(A_)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think conjA is needed in the tensoralloc_add; that information is already in A (not that it matters for AbstractArray types). I think the main bug was that A.op was inserted back into the Anew = SV(...) call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants