Parallel Matrix Operations #298
Comments
This would be perfect as a PR! I see no problem adding the `Send + Sync` bounds; this is for a matrix element, which would be quite surprising to see as anything else. We will just have to add the trait bound where necessary, maybe add these bounds to ...
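To make the bound discussion concrete, here is a tiny hypothetical sketch (not sprs code; it assumes a `multi_thread` cargo feature and a rayon dependency, as in sprs) of why the element type picks up `Send + Sync` in the parallel path, and one way of keeping the bound behind the feature gate:

```rust
// Hypothetical illustration only. Sequential build: no thread-safety
// bounds are needed on the element type N.
#[cfg(not(feature = "multi_thread"))]
pub fn scale<N: Copy + std::ops::MulAssign>(data: &mut [N], factor: N) {
    for x in data.iter_mut() {
        *x *= factor;
    }
}

// Parallel build: rayon's par_iter_mut requires N: Send, and sharing
// `factor` across worker threads requires N: Sync, so the bounds only
// appear when the feature is enabled.
#[cfg(feature = "multi_thread")]
pub fn scale<N: Copy + std::ops::MulAssign + Send + Sync>(data: &mut [N], factor: N) {
    use rayon::prelude::*;
    data.par_iter_mut().for_each(|x| *x *= factor);
}
```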
This implementation isn't ideal, but should be better in many sparse by dense products. In certain cases, like when the right-hand side is a dense vector, this is actually still sequential. To optimize, it might be necessary to make a parallel iterator of the ... It will probably also be worth building some more benchmarks to compare the difference, but I'm not sure if ...
I had need of a parallel {CSR matrix} x {dense vector} multiply, so I coded it using rayon. It was a nice speedup. I'm not sufficiently experienced with Rust to know how to go about integrating this into sprs, but then again, the code is so tiny that I think it wouldn't be difficult to just rewrite it in the right way to integrate with what you've already done. Disclaimer: I really don't have a good idea of how to work with and extend traits, so I didn't even try to make this compatible with the existing traits, but I know that is something that needs doing. Disclaimer 2: the code isn't as efficient as it could be: it could assign the results directly to their final locations, but instead it concats a bunch of fragment vectors.
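The commenter's snippet itself isn't shown above, so the following is only a minimal sketch of the same idea under assumptions: raw CSR arrays, `f64` elements, and rayon's parallel range iterator over rows, writing each row's result directly rather than concatenating fragment vectors (the helper name `par_csr_matvec` is made up for illustration):

```rust
use rayon::prelude::*;

/// Hypothetical helper, not part of sprs: y = A * x for a CSR matrix given
/// by its raw (indptr, indices, data) arrays. Every output entry depends
/// only on one row of A, so the row loop parallelizes trivially with rayon.
fn par_csr_matvec(indptr: &[usize], indices: &[usize], data: &[f64], x: &[f64]) -> Vec<f64> {
    let n_rows = indptr.len() - 1;
    (0..n_rows)
        .into_par_iter()
        .map(|row| {
            // Dot product of the sparse row with the dense vector.
            let start = indptr[row];
            let end = indptr[row + 1];
            indices[start..end]
                .iter()
                .zip(&data[start..end])
                .map(|(&col, &val)| val * x[col])
                .sum()
        })
        .collect()
}

fn main() {
    // 2x3 CSR matrix: [[1, 0, 2], [0, 3, 0]]
    let indptr = [0, 2, 3];
    let indices = [0, 2, 1];
    let data = [1.0, 2.0, 3.0];
    let x = [1.0, 1.0, 1.0];
    let y = par_csr_matvec(&indptr, &indices, &data, &x);
    assert_eq!(y, vec![3.0, 3.0]);
}
```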
Currently only sparse by sparse products are parallel in the `smmp` module. Converting the current sparse by dense products using `ndarray::parallel` should be straightforward. Here is an implementation for `par_csr_mulacc_dense_colmaj` that gives a significant speedup on my machine (see the sketch below). The only changes here are the parallel iterator, adding the `rayon` feature for `ndarray`, and adding the `Sync` and `Send` trait bounds to the data types inside the matrices.

My concern is that adding `Send + Sync` will result in these trait requirements being unnecessarily added in many places. Looking at the `impl Mul` for `CsMatBase` and `CsMatBase`, I see that `Sync + Send` is required no matter if `multi_thread` is enabled or not. Is it okay to propagate these trait requirements all the way up to many of the trait `impl`s for `CsMatBase` and then use conditional feature compilation on the lowest-level functions found in the `prod` module? Conditionally compiling at all the higher-level implementations sounds like it would get nasty very quickly, especially as more parallel features get added.
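The implementation referenced above didn't survive the extraction of this page, so what follows is only a rough sketch of the general approach under assumptions: it works on raw CSR arrays rather than sprs's own matrix types (to avoid guessing exact signatures), fixes the element type to `f64` instead of a generic scalar, and parallelizes over the columns of the column-major right-hand side using `ndarray`'s rayon support (`Zip::par_for_each`, available with a recent `ndarray` built with its `rayon` feature). The function name is made up for illustration.

```rust
use ndarray::{Array2, Axis, Zip};

/// Hypothetical sketch, not the sprs API: out += lhs * rhs, where lhs is a
/// CSR matrix given by its raw (indptr, indices, data) arrays and rhs/out
/// are dense. Each column of `out` depends only on the matching column of
/// `rhs`, so the column loop can run in parallel.
fn par_csr_mulacc_dense_colmaj_sketch(
    indptr: &[usize],
    indices: &[usize],
    data: &[f64],
    rhs: &Array2<f64>,
    out: &mut Array2<f64>,
) {
    // Requires ndarray's "rayon" feature for `par_for_each`.
    Zip::from(out.axis_iter_mut(Axis(1)))
        .and(rhs.axis_iter(Axis(1)))
        .par_for_each(|mut out_col, rhs_col| {
            // Plain sequential CSR traversal per column.
            for row in 0..indptr.len() - 1 {
                let mut acc = out_col[row];
                for idx in indptr[row]..indptr[row + 1] {
                    acc += data[idx] * rhs_col[indices[idx]];
                }
                out_col[row] = acc;
            }
        });
}

fn main() {
    // lhs = [[1, 0, 2], [0, 3, 0]] in CSR form; rhs is a 3x2 matrix of ones.
    let indptr = [0, 2, 3];
    let indices = [0, 2, 1];
    let data = [1.0, 2.0, 3.0];
    let rhs = Array2::<f64>::ones((3, 2));
    let mut out = Array2::<f64>::zeros((2, 2));
    par_csr_mulacc_dense_colmaj_sketch(&indptr, &indices, &data, &rhs, &mut out);
    assert_eq!(out[[0, 0]], 3.0);
    assert_eq!(out[[1, 1]], 3.0);
}
```

With a generic scalar `N` in place of `f64`, the closure passed to the parallel iterator would force `N: Send + Sync`, which is exactly the trait-bound propagation question raised above.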
module? Conditionally compiling at all the higher level implementations sounds like it would get nasty very quickly, especially as more parallel features get added.The text was updated successfully, but these errors were encountered: