Skip to content

Added a cumulative sum function to Histogram #29

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 50 additions & 0 deletions src/histogram/histograms.rs
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,20 @@ impl<A: Ord> Histogram<A> {
pub fn grid(&self) -> &Grid<A> {
&self.grid
}

/// Returns the cumulative distribution function of a histogram.
/// Equivalent to the numpy histogram function cumsum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to have a more detailed docstring here: it should include the math formula to compute the value indexed i_1, ..., i_n in the final array and a simple example using a 1d/2d array.

Given that you are mentioning NumPy, it would be good to link directly to the docs for np.cumsum.

pub fn cumulative_sum(&self) -> ArrayD<usize> {
let mut cdf = self.counts.clone();
for i in 0..self.ndim() {
for j in 1..cdf.shape()[i] {
let temp = cdf.index_axis(Axis(i), j - 1).to_owned();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a shame that we have to use to_owned here, due to the borrow checker, even though the two slices are not-overlapping 😞

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this can be avoided using split_at!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I just started trying to write that but found myself with two matrices of different sizes and thinking I'd like to get the last element of one for a given axis and the first of another and it felt too complicated.. But I might just not be picturing it in the right way 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simplest approach would be to avoid the issue by using .lanes_mut(); something like this [untested]:

let mut cdf = self.counts.clone();
for ax in 0..cdf.ndim() {
    for lane in cdf.lanes_mut(Axis(ax)) {
        for i in 1..lane.len() {
            lane[i] += lane[i-1];
        }
    }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that works @jturner314, it's surely cleaner. Unfortunately it kind of goes the way you described @xd009642 using split_at 🤔 You need to split at j and then grab the last lane of the first part and the first lane of the second part.

Copy link
Member

@jturner314 jturner314 Mar 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tested it, and my example does work (with the addition of mut before lane). It does rely on the element type being Copy, though; for general A you'd run into the same issue.

Fwiw, regarding the inconvenience of a solution based on .split_at(), the next release of ndarray will have a multislice! macro that makes it possible to cleanly take multiple disjoint, mutable slices simultaneously. This PR makes me realize that we should add a multislice_axis! macro too (see rust-ndarray/ndarray#593).

let mut ax = cdf.index_axis_mut(Axis(i), j);
ax += &temp;
}
}
cdf
}
}

/// Extension trait for `ArrayBase` providing methods to compute histograms.
Expand Down Expand Up @@ -163,3 +177,39 @@ impl<A, S> HistogramExt<A, S> for ArrayBase<S, Ix2>
histogram
}
}

#[cfg(test)]
mod auto_tests {
use super::*;
use histogram::histograms::HistogramExt;
use histogram::bins::{Bins, Edges};

#[test]
fn histogram_cdf_1d() {
let data = arr2(&[[1], [2], [3], [4], [1], [1], [4], [5], [8], [8], [8]]);
let grid = Grid::from(vec![
Bins::new(
Edges::from(vec![0,1,2,3,4,5,6,7,8,9]))]);

let histogram = data.histogram(grid);
//0, 1 2 3 4 5 6 7 8
let cdf = arr1(&[0, 3, 4, 5, 7, 8, 8, 8, 11]).into_dyn();
assert_eq!(histogram.cumulative_sum(), cdf);
}

#[test]
fn histogram_cdf_2d() {
let data = arr2(&[[0, 2], [4, 4], [1, 1], [3, 3], [0, 2]]);
let grid = Grid::from(vec![
Bins::new(Edges::from(vec![0, 1, 2, 3, 4])),
Bins::new(Edges::from(vec![0, 1, 2, 3, 4]))]);

let histogram = data.histogram(grid);

let cdf = arr2(&[[0, 0, 2, 2],
[0, 1, 3, 3],
[0, 1, 3, 3],
[0, 1, 3, 4]]).into_dyn();
assert_eq!(histogram.cumulative_sum(), cdf);
}
}