Update dependencies #130

Luthaf · 2017-03-29T14:31:02Z

The two big changes here concern the ndarray and TOML crates.

Our ndarray wrapper now implement Deref<Target=ndarray::Array>, so that we can directly use ndarray functions. This means that our versioning is tied with ndarray updates (a non-semver compatible update in ndarray means a non-semver compatible release for lumol). This allow to remove functions that are just wrappers of ndarray, and use the ndarray functions directly.

The TOML update was easier, we even get better error messages on invalid TOML files.

antoinewdg · 2017-03-29T19:59:24Z

What's the motivation behind the change between using a wrapper and using directly using ndarray API?
Also, do we still need the wrapper struct?

Luthaf · 2017-03-29T21:45:47Z

What's the motivation behind the change between using a wrapper and using directly using ndarray API?

Not having functions that just transfer the call to ndarray.

Also, do we still need the wrapper struct?

I see two reason for having it: we have a custom function that is not in ndarray (resize_if_different), and we might want to include more changes in the future. Specifically, a nice way to implement #2 would be to use range dimension (an array with shape 3, -8..8, 10 for example) and negative indexing (the loops should run from -kmax to kmax). The wrapper allow to make the changes at one only point in the code.

antoinewdg · 2017-04-01T08:54:40Z

src/core/src/types/arrays.rs

@@ -73,28 +59,13 @@ impl<T: Zero + Clone> Array2<T> {
    /// a.resize_if_different((8, 9));
    /// assert_eq!(a[(3, 3)], 0.0);
    /// ```
-    pub fn resize_if_different(&mut self, size: (Ix, Ix)) {
-        if self.0.shape() != &[size.0, size.1] {
+    pub fn resize_if_different(&mut self, size: (usize, usize)) {


This is not really a comment on this PR, but this behavior is not what is expected given the name of the function. I do not expect a resize operation to overwrite values that are contained in the new size, I expect something in the line of a std::vector.resize() in C++

I agree with you, but I can't think of a better name right now.

Giving this function the std::vector::resize behaviour could also be a possibility, by round-triping to a Vec:

let vec = self.into_vec(); vec.resize(size.0 * size.1); *self = Array::from_vec(vec, size);

I don't know whether we should rename this function or change the behaviour. Do you have any preference?

The thing with your solution is that it does not preserve the content of the array (because of the multi-dimensional aspect if the size of an axis that is not the zero axis changes, all the indexes will be offset), which is not a big deal in the current use cases but again suffers form the association with the resize for the stdlib.

In the cases where we use it, wouldn't it be simpler to just not use it ?
For example here having instead the following seems OK:

self.pairs_cache = Array2::zeros((system.size(), system.size()));

Or, if the time for initializing all the values becomes an issue, have an unsafe function create an uninitialized array, since in both use cases we do not care about the values inside the array and overwrite every element.

antoinewdg · 2017-04-01T08:56:46Z

I didn't phrase my question as I wanted to : not having functions that just transfer the call to ndarray was already an point for making it Deref in the first place, but that's not what happened, what made the balance tip in favor of it now ?

Luthaf · 2017-04-01T11:51:13Z

It was not previously Deref, because I did not knew how this trait worked, nothing more =)

Luthaf · 2017-04-01T12:14:50Z

Continuing the discussion about hidden unsafe that was started here. The Array2 and Array3 use unchecked indexing with unsafe, and @antoinewdg is concerned about hiding unsafe here.$

I agree that we should not hide the fact that indexing these array is unsafe, and we have proposed two possible solutions:

Rename Array2 to UnsafeArray2 and Array3 to UnsafeArray3, but keep the indexing 'safe'.
Remove the unchecked access with Index, and use an unsafe at method

I just really don't want to hide from the compiler that we are making unsafe stuff.
-- #130 (comment)

We don't hide anything to the compiler, but we do hide it from developers. To me, the question is how to balance explicitness and convenience.

Current code:

struct Ewald {
    fourier_phases: Array2<f64>,
    rho: Array2<f64>,
    // ...
}

// Use recursive definition for computing the factor for all the other values of k.
for k in 2..self.kmax {
    for i in 0..natoms {
        for j in 0..3 {
            self.fourier_phases[(k, i, j)] = self.fourier_phases[(k - 1, i, j)]
                                           * self.fourier_phases[(1, i, j)];
        }
    }
}

for ikx in 0..self.kmax {
    for iky in 0..self.kmax {
        for ikz in 0..self.kmax {
            self.rho[(ikx, iky, ikz)] = Complex::polar(0.0, 0.0);
                for j in 0..natoms {
                    let phi = self.fourier_phases[(ikx, j, 0)] 
							* self.fourier_phases[(iky, j, 1)] 
							* self.fourier_phases[(ikz, j, 2)];
                    self.rho[(ikx, iky, ikz)] = self.rho[(ikx, iky, ikz)] + system[j].charge * phi;
                }
            }
        }
    }
}

Case 1:

struct Ewald {
    fourier_phases: UnsafeArray2<f64>,
    rho: UnsafeArray2<f64>,
    // ...
}

// Use recursive definition for computing the factor for all the other values of k.
for k in 2..self.kmax {
    for i in 0..natoms {
        for j in 0..3 {
            self.fourier_phases[(k, i, j)] = self.fourier_phases[(k - 1, i, j)]
                                           * self.fourier_phases[(1, i, j)];
        }
    }
}

for ikx in 0..self.kmax {
    for iky in 0..self.kmax {
        for ikz in 0..self.kmax {
            self.rho[(ikx, iky, ikz)] = Complex::polar(0.0, 0.0);
                for j in 0..natoms {
                    let phi = self.fourier_phases[(ikx, j, 0)] 
							* self.fourier_phases[(iky, j, 1)] 
							* self.fourier_phases[(ikz, j, 2)];
                    self.rho[(ikx, iky, ikz)] = self.rho[(ikx, iky, ikz)] + system[j].charge * phi;
                }
            }
        }
    }
}

Here the unsafety is a bit more visible, but only in the declaration of the struct, not where we are indexing the arrays.

Case 2:

struct Ewald {
    fourier_phases: Array2<f64>,
    rho: Array2<f64>,
    // ...
}

// Use recursive definition for computing the factor for all the other values of k.
for k in 2..self.kmax {
    for i in 0..natoms {
        for j in 0..3 {
            unsafe {
                self.fourier_phases.at_mut((k, i, j)) = self.fourier_phases.at((k - 1, i, j))
                                                      * self.fourier_phases.at((1, i, j));
            }
        }
    }
}

for ikx in 0..self.kmax {
    for iky in 0..self.kmax {
        for ikz in 0..self.kmax {
            self.rho[(ikx, iky, ikz)] = Complex::polar(0.0, 0.0);
                for j in 0..natoms {
                    unsafe {
                        let phi = self.fourier_phases.at((ikx, j, 0)) 
								* self.fourier_phases.at((iky, j, 1)) 
								* self.fourier_phases.at((ikz, j, 2));
                        self.rho.at_mut((ikx, iky, ikz)) = self.rho.at((ikx, iky, ikz)) + system[j].charge * phi;
                    }
                }
            }
        }
    }
}

For the second proposal, we'll need both an at function and an at_mut function. It also introduce a supplementary right drift in this part of the code which is already quite indented.

I am not really convinced by either solution.

bluss · 2017-04-01T12:43:56Z

Hey, cool that you are using ndarray, I noticed after talking to you about benchmarking.

I'm also kind of happy you're using the fact that uget/uget_mut are debug-checked. It is an inevitability that we want to avoid bounds checks in inner loops but at least the debug checking affords us some better guarantees about the program. In ndarray I'd also of course encourage you to write code that doesn't use indices at all, if possible (example of that).

ndarray discussion place is at rust-ndarray/ndarray#293 and any suggestions are welcome there. We are looking at another major version soon. It's mostly about “technically breaking” and changing a part here and there, not about breaking existing code. No revolution.

If you have the stamina I'd go with encapsulated ndarray so that the versioning story is easier. But at some point you might want to provide the full API to your users.

Some notes:

Indexing syntax uses derefs if possible. A single Index impl on Array2 is enough to block all index-dereffing. (This may be what you want).

I guess you are aware, but the reason .slice() in ndarray is not using indexing is because Index/IndexMut traits must return references. Not for example something like ArrayView.

Here's an idea off the top of my mind for visible but convenient release-unchecked indexing:

use lumol::Unchecked as U;

self.rho[U(ikx, iky, ikz)] = self.rho[U(ikx, iky, ikz)] + system[U(j)].charge * phi;

Luthaf · 2017-04-01T13:09:24Z

Hey, thank you for stepping in!

In ndarray I'd also of course encourage you to write code that doesn't use indices at all, if possible (example of that).

I am a bit hesitant concerning this point, because I found that in a lot of cases, the functional version of the code is less readable than the indexed version. By this I mean that to understand the functional version, you need to know all the functions involved and what they do. And this means that a lot of chemist will not understand the code, because they are only used to Fortran/C code with explicit indexing and looping. Given that one goal for this project is to be easy to modify, I think using explicit indexing is better, even if this means we need to think a bit more about bounds checking.

Here's an idea off the top of my mind for visible but convenient release-unchecked indexing:

This looks nice, thanks for the proposition!

antoinewdg · 2017-04-02T10:15:26Z

I am heavily in favor of case 2 and having the explicit unsafe. A few notes on readability:

we can make an at function specifically for Array2 and Array3 so that we do not have to create the tuple, this would remove some parenthesis
if right drift is a concern, we can factor the three level ik loop in a for (ikx, iky, ikz) in ..., we just would need to be careful it doesn't impact performances

bluss · 2017-04-02T12:50:13Z

if right drift is a concern, we can factor the three level ik loop in a for (ikx, iky, ikz) in ..., we just would need to be careful it doesn't impact performances

This is unfortunately not easy do to performantly, in my experience.

The problem as I understand it is that the iterator only has one next method. To simulate going through three levels of loops, it needs to make kind of a state machine. To optimize it like three nested for loops, the compiler must unravel all that somehow (and it doesn't manage).

Linear (single dimensional) iteration doesn't have this problem.

(This is why regular iterators are often not good enough in ndarray and why we have other solutions like Zip, or special case .fold() on our iterators. fold is internal iteration and then we can circumvent the problem.)

antoinewdg · 2017-04-02T13:23:14Z

Thanks for the insight !
If we just generate an array with all the possible indexes (in our case it shouldn't be too big), and then iterate over this array, could this mitigate the problem ?

Also in this case this is the very outer loop, so there is a hope that is does not impact performance too much (I honestly do not know, but trying won't hurt).

antoinewdg · 2017-04-02T17:42:47Z

@Luthaf the discussions in this PR, while related to this PR do not originate from it, so if you don't have anything to add to this particular PR I'm OK to merge.

Luthaf · 2017-04-02T17:55:25Z

Yes, let's merge this. Should we continue the discussion here or on a separated issue?

antoinewdg · 2017-04-02T18:26:25Z

Maybe create an issue for the unsafe indexing and one for the resize_if_different ?

Guillaume Fraux added 2 commits March 29, 2017 15:41

Update lumol-core dependencies

55a8078

Update TOML to 0.3

d5cb582

Luthaf force-pushed the update branch from cebc398 to d5cb582 Compare March 29, 2017 14:36

antoinewdg reviewed Apr 1, 2017

View reviewed changes

Luthaf mentioned this pull request Apr 1, 2017

Improve performance on the Ewald forces #121

Merged

antoinewdg merged commit 805fe93 into master Apr 2, 2017

Luthaf mentioned this pull request Apr 4, 2017

Mark types::Array indexing as unsafe #136

Open

Luthaf deleted the update branch April 22, 2017 09:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update dependencies #130

Update dependencies #130

Luthaf commented Mar 29, 2017

antoinewdg commented Mar 29, 2017

Luthaf commented Mar 29, 2017

antoinewdg Apr 1, 2017

Luthaf Apr 1, 2017

antoinewdg Apr 2, 2017 •

edited

Loading

antoinewdg commented Apr 1, 2017

Luthaf commented Apr 1, 2017

Luthaf commented Apr 1, 2017

bluss commented Apr 1, 2017 •

edited

Loading

Luthaf commented Apr 1, 2017

antoinewdg commented Apr 2, 2017

bluss commented Apr 2, 2017

antoinewdg commented Apr 2, 2017

antoinewdg commented Apr 2, 2017

Luthaf commented Apr 2, 2017

antoinewdg commented Apr 2, 2017

Update dependencies #130

Update dependencies #130

Conversation

Luthaf commented Mar 29, 2017

antoinewdg commented Mar 29, 2017

Luthaf commented Mar 29, 2017

antoinewdg Apr 1, 2017

Choose a reason for hiding this comment

Luthaf Apr 1, 2017

Choose a reason for hiding this comment

antoinewdg Apr 2, 2017 • edited Loading

Choose a reason for hiding this comment

antoinewdg commented Apr 1, 2017

Luthaf commented Apr 1, 2017

Luthaf commented Apr 1, 2017

bluss commented Apr 1, 2017 • edited Loading

Luthaf commented Apr 1, 2017

antoinewdg commented Apr 2, 2017

bluss commented Apr 2, 2017

antoinewdg commented Apr 2, 2017

antoinewdg commented Apr 2, 2017

Luthaf commented Apr 2, 2017

antoinewdg commented Apr 2, 2017

antoinewdg Apr 2, 2017 •

edited

Loading

bluss commented Apr 1, 2017 •

edited

Loading