int overflow when using cuda backend #287
Comments
At first glance, it seems that support for 64-bit indices could be added with some small changes here:
and possibly in some other places I missed. I guess these changes could be made dependent on the backend template parameters (similar to the builtin backend). But the real question is, as you noted, whether the matrix would fit into the GPU memory. If you use 64-bit indices, the memory requirements will grow, and the supported problem size will be further reduced. AMGCL will only work if the whole problem fits into the GPU memory. If you want to solve a bigger system, you will need multiple GPUs and MPI (or maybe a vexcl backend with multiple GPUs in the context).
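To make the memory trade-off above concrete, here is a minimal back-of-the-envelope sketch. It is not AMGCL code: `csr_bytes` is a hypothetical helper, and it assumes a plain CSR layout (values, column indices, row pointers) with double-precision values and the same index width for both index arrays.

```cpp
#include <cstdint>

// Approximate storage of one CSR matrix in bytes:
// values + column indices + row pointers.
// index_bytes: 4 for 32-bit indices, 8 for 64-bit indices.
// value_bytes: 8 for double (an assumption; use 4 for float).
constexpr std::uint64_t csr_bytes(std::uint64_t rows, std::uint64_t nnz,
                                  std::uint64_t index_bytes,
                                  std::uint64_t value_bytes = 8) {
    return nnz * value_bytes          // nonzero values
         + nnz * index_bytes          // one column index per nonzero
         + (rows + 1) * index_bytes;  // row pointer array
}
```

Under these assumptions, a system with N = 800 (512,000,000 rows and about 3.584 billion nonzeros) needs roughly 45 GB with 32-bit indices and roughly 61 GB with 64-bit indices for the system matrix alone, before any preconditioner data is allocated.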
Thank you so much for the prompt response. I actually made these changes in cuda.hpp and it works! But yes, at this point the problem is memory, which exceeds 80GB for the problems I need to work with. The interesting part is that the solver takes about 26GB but the preconditioner takes >100GB. I might play around with the preconditioner parameters or try other preconditioners; however, spai0 was the one giving the best performance, if I recall correctly.
If your changes are backward compatible (they won't break the older code using the cuda backend), then a PR is welcome. Otherwise, if you care to share the patch, I'll try to find some time and see if it is possible to make it backward compatible.
I don't think they would break the older code, but the only changes I made were changing the type of ptr and col to int64_t and CUSPARSE_INDEX_32I to CUSPARSE_INDEX_64I. I'm still getting familiar with GitHub, so I'm not sure yet how to share this with you.
Hello,
I'm currently using AMGCL with the cuda backend to solve Poisson's equation (this results in a 7-point stencil, so approximately (N^3)*7 nnz). I have tested the problem with "small" matrix sizes (about N=100) and it works perfectly. However, when I get to sizes of about N=800 (nnz > 3 billion), the solver breaks because of int overflow. Creating the sparse matrix with the std::tuple method works just fine, since col and ptr are of type "ptrdiff_t". However, when doing the initial solve step (Solver solve(A,prm,bprm);) the code breaks because of an indexing error. I have noticed that the cuda backend is set with CUSPARSE_INDEX_32I, but I know cuda supports CUSPARSE_INDEX_64I.
This is my current setup:
and then I solve the system as:
Another thing I'm worried about is that the problem size is too large for the GPU to handle (the GPU has a global memory size of 80GB). Is there a better way to handle these types of problems (maybe through VexCL)? Any help is greatly appreciated. Apologies in advance, as this is my first time posting on GitHub.
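The overflow described in this post can be checked before any GPU work is attempted. The following is a minimal sketch, not part of AMGCL: `approx_nnz_7pt` and `fits_in_int32` are hypothetical helper names, and the nonzero count uses the 7*N^3 approximation from the post (boundary rows actually have fewer entries).

```cpp
#include <cstdint>
#include <limits>

// Approximate number of nonzeros for a 7-point stencil on an N x N x N grid.
// Uses the 7 * N^3 estimate; the exact count is slightly smaller.
constexpr std::uint64_t approx_nnz_7pt(std::uint64_t n) {
    return 7 * n * n * n;
}

// True when a count can be represented by a signed 32-bit index,
// which is what a 32-bit sparse index type can address.
constexpr bool fits_in_int32(std::uint64_t count) {
    return count <=
        static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}
```

With these helpers, N = 100 gives 7,000,000 nonzeros, which fits in a 32-bit index, while N = 800 gives 3,584,000,000, which exceeds the 32-bit signed maximum of 2,147,483,647 and so requires 64-bit indices.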