Added a cumulative sum function to Histogram #29

xd009642 · 2019-02-26T21:45:57Z

This is equivalent to the cumsum for NumPy's histogram object. I've also added a basic test and a doc comment.

LukeMathWalker · 2019-02-26T21:57:37Z

How do you handle grids with 2 or more dimensions? It's not clear in that case how to proceed with a cumulative sum.
This seems to only make sense for 1-dimensional histograms 🤔

xd009642 · 2019-02-26T22:02:38Z

Let me go back to the drawing board codewise. I was however coming from the view of histogram equalisation in image processing where it will flatten the 2D image to get a 1D CDF.
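For context, the flattened CDF used in histogram equalisation can be sketched with the standard library alone. This is only an illustration under assumptions: the function name `flattened_cdf` and the 8-bit, `Vec<Vec<u8>>` image layout are hypothetical and not taken from this PR.

```rust
/// Hypothetical helper: treats a 2D 8-bit image as one flat sequence of
/// pixels, counts each intensity, then turns the counts into a running total.
fn flattened_cdf(image: &[Vec<u8>]) -> [u64; 256] {
    let mut counts = [0u64; 256];
    // Flattening discards the 2D structure: only intensities matter here.
    for &pixel in image.iter().flatten() {
        counts[pixel as usize] += 1;
    }
    // 1D cumulative sum over the intensity bins.
    for i in 1..counts.len() {
        counts[i] += counts[i - 1];
    }
    counts
}
```

The last bin of the result always equals the total number of pixels, which is what makes it usable as an (unnormalised) CDF.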

xd009642 · 2019-02-26T22:09:16Z

Could add an Option<Axis> arg and do the CDF across one axis if present or for the flattened histogram if not present
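A minimal sketch of what such an optional-axis argument could look like, assuming a 2D histogram stored as a row-major flat buffer. The name `cumsum_axis_or_flat` and its signature are hypothetical; the PR itself works on ndarray arrays and uses `Axis`, which is replaced here by a plain `usize` so the example needs no external crate.

```rust
/// Hypothetical helper for a row-major `rows x cols` buffer.
/// `None` accumulates over the flattened data; `Some(axis)` accumulates
/// along that axis only. The result keeps the row-major layout.
fn cumsum_axis_or_flat(data: &[u64], rows: usize, cols: usize, axis: Option<usize>) -> Vec<u64> {
    assert_eq!(data.len(), rows * cols, "shape does not match buffer length");
    let mut out = data.to_vec();
    match axis {
        // Flattened case: one running total over the whole buffer.
        None => {
            for i in 1..out.len() {
                out[i] += out[i - 1];
            }
        }
        // Axis 0: each element adds the element one row above it.
        Some(0) => {
            for i in cols..out.len() {
                out[i] += out[i - cols];
            }
        }
        // Axis 1: running total restarts at the beginning of every row.
        Some(1) => {
            for r in 0..rows {
                for c in 1..cols {
                    out[r * cols + c] += out[r * cols + c - 1];
                }
            }
        }
        Some(other) => panic!("axis {} is out of bounds for a 2D array", other),
    }
    out
}
```

Note that neither variant is the multivariate CDF discussed later in the thread: the flattened form depends on memory order, and the single-axis form accumulates one variable only.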

jturner314 · 2019-02-27T02:07:22Z

Typically, the CDF for multiple variables is defined like this. So, each element in the cumulative sum array would be

// 1D case
        i
c[i] =  ∑  a[m]
       m=0

// 2D case
          i   j
c[i,j] =  ∑   ∑  a[m,n]
         m=0 n=0

// 3D case
            i   j   k
c[i,j,k] =  ∑   ∑   ∑  a[m,n,o]
           m=0 n=0 o=0

// etc.

Edit: Probably the easiest way to implement this for n dimensions is to do the cumulative sum along the first axis, then do the cumulative sum of that along the next axis, then do the cumulative sum of that along the next axis, etc.
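The axis-by-axis approach can be sketched without ndarray by working on a row-major flat buffer. This is an illustration, not the code in this PR: `cumsum_nd` and its `(data, shape)` signature are hypothetical, and the element type is fixed to `u64` for simplicity.

```rust
/// Hypothetical helper: cumulative sum along every axis in turn.
/// `data` is in row-major (C) order and `shape` gives the length of each axis.
fn cumsum_nd(data: &[u64], shape: &[usize]) -> Vec<u64> {
    assert_eq!(
        data.len(),
        shape.iter().product::<usize>(),
        "shape does not match buffer length"
    );
    let mut out = data.to_vec();
    if out.is_empty() {
        return out;
    }
    for axis in 0..shape.len() {
        let len = shape[axis];
        // Distance in the buffer between neighbours along `axis`.
        let stride: usize = shape[axis + 1..].iter().product();
        // The buffer is made of consecutive blocks of `len * stride` elements,
        // each containing `stride` independent lanes along `axis`.
        let block = len * stride;
        for start in (0..out.len()).step_by(block) {
            for offset in 0..stride {
                for i in 1..len {
                    let idx = start + offset + i * stride;
                    out[idx] += out[idx - stride];
                }
            }
        }
    }
    out
}
```

For a 2x3 input `[1, 2, 3, 4, 5, 6]`, the pass along the first axis gives `[1, 2, 3, 5, 7, 9]` and the pass along the second axis then gives `[1, 3, 6, 5, 12, 21]`, so the last element is the sum of all entries, as the 2D formula requires.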

xd009642 · 2019-02-27T22:24:44Z

Okay, I've implemented something I think should work. I'd like to get rid of the to_owned, as that was really only there to defeat the borrow checker, and add a test case to ensure it actually works for multiple dimensions. But logically it looks sound to me, and the previous test still passes.

xd009642 · 2019-02-28T19:04:02Z

Okay I'm happy with this as an initial version if one of you guys wants to have a gander 😄

LukeMathWalker · 2019-03-03T09:32:47Z

Currently on holiday in a sunny place, I'll have a look once I come back :P

LukeMathWalker · 2019-03-08T15:17:29Z

src/histogram/histograms.rs

@@ -78,6 +78,20 @@ impl<A: Ord> Histogram<A> {
    pub fn grid(&self) -> &Grid<A> {
        &self.grid
    }
+
+    /// Returns the cumulative distribution function of a histogram. 
+    /// Equivalent to the numpy histogram function cumsum


I'd like to have a more detailed docstring here: it should include the math formula to compute the value indexed i_1, ..., i_n in the final array and a simple example using a 1d/2d array.

Given that you are mentioning NumPy, it would be good to link directly to the docs for np.cumsum.

LukeMathWalker · 2019-03-08T15:18:18Z

src/histogram/histograms.rs

+        let mut cdf = self.counts.clone();
+        for i in 0..self.ndim() {
+            for j in 1..cdf.shape()[i] { 
+                let temp = cdf.index_axis(Axis(i), j - 1).to_owned();


It's a shame that we have to use to_owned here, due to the borrow checker, even though the two slices are not-overlapping 😞

Actually, this can be avoided using split_at!

So I just started trying to write that, but found myself with two matrices of different sizes, thinking I'd like to get the last element of one along a given axis and the first element of the other, and it felt too complicated. But I might just not be picturing it in the right way 😕

The simplest approach would be to avoid the issue by using .lanes_mut(); something like this [untested]:

let mut cdf = self.counts.clone();
for ax in 0..cdf.ndim() {
    for lane in cdf.lanes_mut(Axis(ax)) {
        for i in 1..lane.len() {
            lane[i] += lane[i-1];
        }
    }
}

If that works @jturner314, it's surely cleaner. Unfortunately, the split_at approach goes pretty much the way you described, @xd009642 🤔 You need to split at j and then grab the last lane of the first part and the first lane of the second part.

I just tested it, and my example does work (with the addition of mut before lane). It does rely on the element type being Copy, though; for general A you'd run into the same issue.

Fwiw, regarding the inconvenience of a solution based on .split_at(), the next release of ndarray will have a multislice! macro that makes it possible to cleanly take multiple disjoint, mutable slices simultaneously. This PR makes me realize that we should add a multislice_axis! macro too (see rust-ndarray/ndarray#593).
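The "split at j, then take the last lane of the first part and the first lane of the second part" idea can be illustrated with plain slices. This sketch uses the standard library's `split_at_mut` on a row-major buffer as a stand-in for splitting an ndarray view; the function name `cumsum_rows_in_place` is hypothetical and this is not the code from the PR.

```rust
/// Hypothetical helper: running sum down the rows of a row-major buffer
/// with `cols` columns, in place and without copying any row.
fn cumsum_rows_in_place(data: &mut [u64], cols: usize) {
    if cols == 0 {
        return;
    }
    assert_eq!(data.len() % cols, 0, "buffer is not a whole number of rows");
    let rows = data.len() / cols;
    for j in 1..rows {
        // Split so that row j-1 is the tail of `head` and row j is the
        // start of `tail`; the two halves are provably disjoint.
        let (head, tail) = data.split_at_mut(j * cols);
        let prev = &head[(j - 1) * cols..];
        let curr = &mut tail[..cols];
        for (c, p) in curr.iter_mut().zip(prev) {
            *c += *p;
        }
    }
}
```

Because the previous row is only read through a shared borrow, no `to_owned` copy is needed, which is the point raised in the review comments above.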

LukeMathWalker · 2019-03-08T15:24:55Z

The overall method is really a np.cumsum equivalent applied to the histogram matrix, so it kinda makes sense to have it as a cumsum method on ArrayBase instead of special-casing it for histograms.
Is this something ndarray might want to host @jturner314 or should we add it to one of ndarray-stats extension traits?

Irrespective of where it ends up, it would be worth implementing it as a free function taking an ArrayBase as input; then we can place it wherever is most convenient API-wise. This should also make it more straightforward to write tests for the actual functionality you are implementing (without having to set up grids and histograms).

I have left some other minor comments @xd009642, but it looks good to me overall!

jturner314 · 2019-03-09T02:17:59Z

Yes, I think it makes sense to add cumsum and cumsum_axis to ndarray. That would also make it easier to justify additional code for optimizing iteration order and dealing with numerical precision issues (for floating point array elements).
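As an illustration of the floating-point concern, one well-known technique is Kahan (compensated) summation, which carries a correction term alongside the running total. The thread does not say which technique ndarray would use, so this is only a sketch under that assumption; `cumsum_kahan` is a hypothetical name.

```rust
/// Hypothetical helper: 1D cumulative sum of `f64` values using Kahan
/// compensated summation to limit the accumulation of rounding error.
fn cumsum_kahan(values: &[f64]) -> Vec<f64> {
    let mut out = Vec::with_capacity(values.len());
    let mut sum = 0.0_f64;
    // Estimate of the low-order bits lost in previous additions.
    let mut compensation = 0.0_f64;
    for &v in values {
        let corrected = v - compensation;
        let next = sum + corrected;
        // (next - sum) is what was actually added; the difference from
        // `corrected` is the rounding error of this step.
        compensation = (next - sum) - corrected;
        sum = next;
        out.push(sum);
    }
    out
}
```

The extra arithmetic per element is the kind of cost that is easier to justify in a general-purpose ndarray method than in a histogram-specific one.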

xd009642 · 2019-03-09T09:00:21Z

Okay I'll transfer my work to ndarray and open a PR there 👍

xd009642 · 2019-03-09T21:41:10Z

@jturner314 I was looking at ndarray PRs and saw rust-ndarray/ndarray#513. It seems that it would probably be better to get that PR finished and merged than to start a new one that's less generic. That said, I can continue with a new one, since that PR seems to have stalled.

LukeMathWalker · 2019-03-12T08:08:27Z

I think it makes sense to file your PR in any case @xd009642 - when rust-ndarray/ndarray#513 lands/resumes activity we will take care of rephrasing cumsum in those terms, if possible.

Added a cumulative sum function to Histogram

1274003

This is equivalent to the cumsum for numpys histogram object.

Made CDF n-dimensional

cd17003

Added 2D CDF test for histogram

6e84659

xd009642 mentioned this pull request Mar 3, 2019

Histogram Equalisation rust-cv/ndarray-vision#2

Closed

LukeMathWalker reviewed Mar 8, 2019

LukeMathWalker closed this Dec 12, 2019
