Added a cumulative sum function to Histogram #29
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -78,6 +78,20 @@ impl<A: Ord> Histogram<A> { | |
pub fn grid(&self) -> &Grid<A> { | ||
&self.grid | ||
} | ||
|
||
/// Returns the cumulative distribution function of a histogram. | ||
/// Equivalent to the numpy histogram function cumsum | ||
pub fn cumulative_sum(&self) -> ArrayD<usize> { | ||
let mut cdf = self.counts.clone(); | ||
for i in 0..self.ndim() { | ||
for j in 1..cdf.shape()[i] { | ||
let temp = cdf.index_axis(Axis(i), j - 1).to_owned(); | ||
It's a shame that we have to use

Actually, this can be avoided using

So I just started trying to write that but found myself with two matrices of different sizes and thinking I'd like to get the last element of one for a given axis and the first of another and it felt too complicated. But I might just not be picturing it in the right way 😕

The simplest approach would be to avoid the issue by using

let mut cdf = self.counts.clone();
for ax in 0..cdf.ndim() {
    for lane in cdf.lanes_mut(Axis(ax)) {
        for i in 1..lane.len() {
            lane[i] += lane[i-1];
        }
    }
}

If that works @jturner314, it's surely cleaner. Unfortunately it kind of goes the way you described @xd009642 using

I just tested it, and my example does work (with the addition of Fwiw, regarding the inconvenience of a solution based on
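For readers following this thread, the point of the lane-based suggestion is that a running sum only needs each element's already-updated predecessor, so it can be done in place without a temporary copy. The following is a minimal standard-library sketch of that idea, not the ndarray code from the PR: `running_sum` and `cumsum_2d` are hypothetical names, and a rectangular `Vec<Vec<usize>>` is assumed to stand in for a 2-D array.

```rust
/// In-place running sum over one lane (a mutable slice).
/// Hypothetical helper for illustration; not part of ndarray.
fn running_sum(lane: &mut [usize]) {
    for i in 1..lane.len() {
        // Each element adds its already-updated predecessor,
        // so no temporary copy of the lane is needed.
        lane[i] += lane[i - 1];
    }
}

/// Running sum along rows, then along columns, of a rectangular grid.
/// Assumes every row has the same length.
fn cumsum_2d(grid: &mut [Vec<usize>]) {
    // Lanes along the last axis are the rows themselves.
    for row in grid.iter_mut() {
        running_sum(row);
    }
    // Lanes along the first axis are columns; walk them by index.
    for r in 1..grid.len() {
        for c in 0..grid[r].len() {
            let above = grid[r - 1][c];
            grid[r][c] += above;
        }
    }
}

fn main() {
    let mut grid = vec![vec![1, 2], vec![3, 4]];
    cumsum_2d(&mut grid);
    // Rows first: [1, 3], [3, 7]; then columns: [1, 3], [4, 10].
    assert_eq!(grid, vec![vec![1, 3], vec![4, 10]]);
    println!("{:?}", grid);
}
```

The order in which the axes are processed does not change the result, since each pass is a plain sum along one axis.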
||
let mut ax = cdf.index_axis_mut(Axis(i), j); | ||
ax += &temp; | ||
} | ||
} | ||
cdf | ||
} | ||
} | ||
|
||
/// Extension trait for `ArrayBase` providing methods to compute histograms. | ||
|
@@ -163,3 +177,39 @@ impl<A, S> HistogramExt<A, S> for ArrayBase<S, Ix2> | |
histogram | ||
} | ||
} | ||
|
||
#[cfg(test)] | ||
mod auto_tests { | ||
use super::*; | ||
use histogram::histograms::HistogramExt; | ||
use histogram::bins::{Bins, Edges}; | ||
|
||
#[test] | ||
fn histogram_cdf_1d() { | ||
let data = arr2(&[[1], [2], [3], [4], [1], [1], [4], [5], [8], [8], [8]]); | ||
let grid = Grid::from(vec![ | ||
Bins::new( | ||
Edges::from(vec![0,1,2,3,4,5,6,7,8,9]))]); | ||
|
||
let histogram = data.histogram(grid); | ||
//0, 1 2 3 4 5 6 7 8 | ||
let cdf = arr1(&[0, 3, 4, 5, 7, 8, 8, 8, 11]).into_dyn(); | ||
assert_eq!(histogram.cumulative_sum(), cdf); | ||
} | ||
|
||
#[test] | ||
fn histogram_cdf_2d() { | ||
let data = arr2(&[[0, 2], [4, 4], [1, 1], [3, 3], [0, 2]]); | ||
let grid = Grid::from(vec![ | ||
Bins::new(Edges::from(vec![0, 1, 2, 3, 4])), | ||
Bins::new(Edges::from(vec![0, 1, 2, 3, 4]))]); | ||
|
||
let histogram = data.histogram(grid); | ||
|
||
let cdf = arr2(&[[0, 0, 2, 2], | ||
[0, 1, 3, 3], | ||
[0, 1, 3, 3], | ||
[0, 1, 3, 4]]).into_dyn(); | ||
assert_eq!(histogram.cumulative_sum(), cdf); | ||
} | ||
} |
I'd like to have a more detailed docstring here: it should include the math formula to compute the value indexed `i_1, ..., i_n` in the final array and a simple example using a 1d/2d array.

Given that you are mentioning NumPy, it would be good to link directly to the docs for `np.cumsum`.
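As a sketch of what such a docstring could state (an assumption about the intended semantics, inferred from the loop over all axes and the 2-D test in this PR): writing $c$ for the counts array, the value at index $(i_1, \dots, i_n)$ of the result would be

$$C[i_1, \dots, i_n] = \sum_{j_1=0}^{i_1} \cdots \sum_{j_n=0}^{i_n} c[j_1, \dots, j_n].$$

This corresponds to applying `np.cumsum` along each axis in turn; `np.cumsum` called without an `axis` argument instead operates on the flattened array. The example below is a standard-library illustration of the formula on a flat row-major buffer. `cumulative_sum_nd` is a hypothetical name, not the method added in this PR, and the input counts are the ones implied by the `histogram_cdf_2d` test data.

```rust
/// Cumulative sum along every axis of an n-dimensional array stored
/// as a flat row-major slice plus its shape.
/// Hypothetical helper for illustration; not the ndarray-based method.
fn cumulative_sum_nd(counts: &[usize], shape: &[usize]) -> Vec<usize> {
    assert_eq!(counts.len(), shape.iter().product::<usize>());
    let mut cdf = counts.to_vec();
    // Row-major layout: the last axis has stride 1.
    let mut stride = 1;
    for &len in shape.iter().rev() {
        for idx in 0..cdf.len() {
            // Position along the current axis is (idx / stride) % len.
            // Every element except the first along the axis adds its
            // predecessor, which has already been updated in this pass.
            if (idx / stride) % len > 0 {
                cdf[idx] += cdf[idx - stride];
            }
        }
        stride *= len;
    }
    cdf
}

fn main() {
    // Counts for the 2-D test: two points in bin (0, 2), one in (1, 1),
    // one in (3, 3); the point [4, 4] falls outside the grid.
    let counts = [
        0, 0, 2, 0,
        0, 1, 0, 0,
        0, 0, 0, 0,
        0, 0, 0, 1,
    ];
    let cdf = cumulative_sum_nd(&counts, &[4, 4]);
    assert_eq!(
        cdf,
        vec![
            0, 0, 2, 2,
            0, 1, 3, 3,
            0, 1, 3, 3,
            0, 1, 3, 4,
        ]
    );
    println!("{:?}", cdf);
}
```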