-
Notifications
You must be signed in to change notification settings - Fork 18
Update dependencies #130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update dependencies #130
Conversation
What's the motivation behind the change between using a wrapper and using directly using ndarray API? |
Not having functions that just transfer the call to ndarray.
I see two reason for having it: we have a custom function that is not in ndarray (resize_if_different), and we might want to include more changes in the future. Specifically, a nice way to implement #2 would be to use range dimension (an array with shape |
@@ -73,28 +59,13 @@ impl<T: Zero + Clone> Array2<T> { | |||
/// a.resize_if_different((8, 9)); | |||
/// assert_eq!(a[(3, 3)], 0.0); | |||
/// ``` | |||
pub fn resize_if_different(&mut self, size: (Ix, Ix)) { | |||
if self.0.shape() != &[size.0, size.1] { | |||
pub fn resize_if_different(&mut self, size: (usize, usize)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really a comment on this PR, but this behavior is not what is expected given the name of the function. I do not expect a resize
operation to overwrite values that are contained in the new size, I expect something in the line of a std::vector.resize()
in C++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you, but I can't think of a better name right now.
Giving this function the std::vector::resize
behaviour could also be a possibility, by round-triping to a Vec
:
let vec = self.into_vec();
vec.resize(size.0 * size.1);
*self = Array::from_vec(vec, size);
I don't know whether we should rename this function or change the behaviour. Do you have any preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing with your solution is that it does not preserve the content of the array (because of the multi-dimensional aspect if the size of an axis that is not the zero axis changes, all the indexes will be offset), which is not a big deal in the current use cases but again suffers form the association with the resize
for the stdlib.
In the cases where we use it, wouldn't it be simpler to just not use it ?
For example here having instead the following seems OK:
self.pairs_cache = Array2::zeros((system.size(), system.size()));
Or, if the time for initializing all the values becomes an issue, have an unsafe function create an uninitialized array, since in both use cases we do not care about the values inside the array and overwrite every element.
I didn't phrase my question as I wanted to : not having functions that just transfer the call to ndarray was already an point for making it |
It was not previously |
Continuing the discussion about hidden I agree that we should not hide the fact that indexing these array is
We don't hide anything to the compiler, but we do hide it from developers. To me, the question is how to balance explicitness and convenience. Current code: struct Ewald {
fourier_phases: Array2<f64>,
rho: Array2<f64>,
// ...
}
// Use recursive definition for computing the factor for all the other values of k.
for k in 2..self.kmax {
for i in 0..natoms {
for j in 0..3 {
self.fourier_phases[(k, i, j)] = self.fourier_phases[(k - 1, i, j)]
* self.fourier_phases[(1, i, j)];
}
}
}
for ikx in 0..self.kmax {
for iky in 0..self.kmax {
for ikz in 0..self.kmax {
self.rho[(ikx, iky, ikz)] = Complex::polar(0.0, 0.0);
for j in 0..natoms {
let phi = self.fourier_phases[(ikx, j, 0)]
* self.fourier_phases[(iky, j, 1)]
* self.fourier_phases[(ikz, j, 2)];
self.rho[(ikx, iky, ikz)] = self.rho[(ikx, iky, ikz)] + system[j].charge * phi;
}
}
}
}
} Case 1: struct Ewald {
fourier_phases: UnsafeArray2<f64>,
rho: UnsafeArray2<f64>,
// ...
}
// Use recursive definition for computing the factor for all the other values of k.
for k in 2..self.kmax {
for i in 0..natoms {
for j in 0..3 {
self.fourier_phases[(k, i, j)] = self.fourier_phases[(k - 1, i, j)]
* self.fourier_phases[(1, i, j)];
}
}
}
for ikx in 0..self.kmax {
for iky in 0..self.kmax {
for ikz in 0..self.kmax {
self.rho[(ikx, iky, ikz)] = Complex::polar(0.0, 0.0);
for j in 0..natoms {
let phi = self.fourier_phases[(ikx, j, 0)]
* self.fourier_phases[(iky, j, 1)]
* self.fourier_phases[(ikz, j, 2)];
self.rho[(ikx, iky, ikz)] = self.rho[(ikx, iky, ikz)] + system[j].charge * phi;
}
}
}
}
} Here the unsafety is a bit more visible, but only in the declaration of the struct, not where we are indexing the arrays. Case 2: struct Ewald {
fourier_phases: Array2<f64>,
rho: Array2<f64>,
// ...
}
// Use recursive definition for computing the factor for all the other values of k.
for k in 2..self.kmax {
for i in 0..natoms {
for j in 0..3 {
unsafe {
self.fourier_phases.at_mut((k, i, j)) = self.fourier_phases.at((k - 1, i, j))
* self.fourier_phases.at((1, i, j));
}
}
}
}
for ikx in 0..self.kmax {
for iky in 0..self.kmax {
for ikz in 0..self.kmax {
self.rho[(ikx, iky, ikz)] = Complex::polar(0.0, 0.0);
for j in 0..natoms {
unsafe {
let phi = self.fourier_phases.at((ikx, j, 0))
* self.fourier_phases.at((iky, j, 1))
* self.fourier_phases.at((ikz, j, 2));
self.rho.at_mut((ikx, iky, ikz)) = self.rho.at((ikx, iky, ikz)) + system[j].charge * phi;
}
}
}
}
}
} For the second proposal, we'll need both an I am not really convinced by either solution. |
Hey, cool that you are using ndarray, I noticed after talking to you about benchmarking. I'm also kind of happy you're using the fact that uget/uget_mut are debug-checked. It is an inevitability that we want to avoid bounds checks in inner loops but at least the debug checking affords us some better guarantees about the program. In ndarray I'd also of course encourage you to write code that doesn't use indices at all, if possible (example of that). ndarray discussion place is at rust-ndarray/ndarray#293 and any suggestions are welcome there. We are looking at another major version soon. It's mostly about “technically breaking” and changing a part here and there, not about breaking existing code. No revolution. If you have the stamina I'd go with encapsulated ndarray so that the versioning story is easier. But at some point you might want to provide the full API to your users. Some notes: Indexing syntax uses derefs if possible. A single Index impl on Array2 is enough to block all index-dereffing. (This may be what you want). I guess you are aware, but the reason .slice() in ndarray is not using indexing is because Here's an idea off the top of my mind for visible but convenient release-unchecked indexing: use lumol::Unchecked as U;
self.rho[U(ikx, iky, ikz)] = self.rho[U(ikx, iky, ikz)] + system[U(j)].charge * phi; |
Hey, thank you for stepping in!
I am a bit hesitant concerning this point, because I found that in a lot of cases, the functional version of the code is less readable than the indexed version. By this I mean that to understand the functional version, you need to know all the functions involved and what they do. And this means that a lot of chemist will not understand the code, because they are only used to Fortran/C code with explicit indexing and looping. Given that one goal for this project is to be easy to modify, I think using explicit indexing is better, even if this means we need to think a bit more about bounds checking.
This looks nice, thanks for the proposition! |
I am heavily in favor of case 2 and having the explicit
|
This is unfortunately not easy do to performantly, in my experience. The problem as I understand it is that the iterator only has one Linear (single dimensional) iteration doesn't have this problem. (This is why regular iterators are often not good enough in ndarray and why we have other solutions like Zip, or special case |
Thanks for the insight ! Also in this case this is the very outer loop, so there is a hope that is does not impact performance too much (I honestly do not know, but trying won't hurt). |
@Luthaf the discussions in this PR, while related to this PR do not originate from it, so if you don't have anything to add to this particular PR I'm OK to merge. |
Yes, let's merge this. Should we continue the discussion here or on a separated issue? |
Maybe create an issue for the unsafe indexing and one for the |
The two big changes here concern the ndarray and TOML crates.
Our ndarray wrapper now implement
Deref<Target=ndarray::Array>
, so that we can directly use ndarray functions. This means that our versioning is tied with ndarray updates (a non-semver compatible update in ndarray means a non-semver compatible release for lumol). This allow to remove functions that are just wrappers of ndarray, and use the ndarray functions directly.The TOML update was easier, we even get better error messages on invalid TOML files.