Support nullptr value of argument ptr_C for xe_array_epilogue #541
Open
sanchitintel wants to merge 5 commits into intel:main
Author
Hi @rolandschulz, in the main branch, for vanilla GEMM, a This PR fixes that issue. Thanks!
rolandschulz left a comment
Need to check that the API is consistent with Nvidia's. Otherwise LGTM.
Co-authored-by: rolandschulz <roland.schulz@intel.com>
Problem
In a real-world use case of GroupGEMM, we may not have any `C` matrices. When `beta` is 0, passing a bogus `ptr_C` argument to `xe_array_epilogue`, such as `ptr_D` (if they are the same dtype), still results in some wasteful compute, so the default Group GEMM example (`examples/04_bmg_grouped_gemm.cpp`) slows down by ~0.1 TFLOPs.

Solution
The preferable solution is to use `beta=0` and a non-null value for `ptr_C`, such as `ptr_D`, e.g. `static_cast<const ElementC**>((void*)ptr_D.get())`, even when `C` and `D` are not the same dtype, because `C` tiles aren't actually used when `beta` is `0`.

However, with the implementation in this PR, if a user still wants to pass a `nullptr` argument for `ptr_C`, then `C` should not be used, irrespective of the `beta` value.

Given that the aforementioned workaround doesn't result in a discernible performance penalty, I'm not sure if this PR makes sense.
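
For readers unfamiliar with the semantics being discussed, here is a minimal host-side sketch of a linear-combination epilogue, `D = alpha * acc + beta * C`, that skips `C` whenever `ptr_C` is null or `beta` is zero. The function name `linear_combination_epilogue` and its signature are hypothetical and chosen only for illustration; this is not the `xe_array_epilogue` code or any CUTLASS API, and it is not necessarily how the PR implements the check.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a linear-combination epilogue (not the CUTLASS API).
// Computes D[i] = alpha * acc[i] + beta * C[i]; C is skipped when ptr_C is null
// or when beta is zero.
template <class ElementC, class ElementD, class ElementAcc>
void linear_combination_epilogue(const ElementAcc* acc, const ElementC* ptr_C,
                                 ElementD* ptr_D, std::size_t n,
                                 float alpha, float beta) {
  assert(acc != nullptr && ptr_D != nullptr);
  // C is a source operand only if a pointer was supplied and beta is nonzero.
  const bool use_C = (ptr_C != nullptr) && (beta != 0.0f);
  for (std::size_t i = 0; i < n; ++i) {
    float value = alpha * static_cast<float>(acc[i]);
    if (use_C) {
      // Only this branch ever dereferences ptr_C.
      value += beta * static_cast<float>(ptr_C[i]);
    }
    ptr_D[i] = static_cast<ElementD>(value);
  }
}
```

Because a bare `nullptr` gives the compiler nothing to deduce `ElementC` from, a caller of this sketch would spell out the template arguments, for example `linear_combination_epilogue<float, float, float>(acc, nullptr, d, n, alpha, beta)`. With a null `ptr_C`, the result depends only on `alpha` and the accumulator, whatever `beta` is.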