-
Notifications
You must be signed in to change notification settings - Fork 28
Added a cumulative sum function to Histogram #29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is equivalent to the cumsum for numpys histogram object.
How do you handle grids with 2 or more dimensions? It's not clear in that case how to proceed with a cumulative sum. |
Let me go back to the drawing board codewise. I was however coming from the view of histogram equalisation in image processing where it will flatten the 2D image to get a 1D CDF. |
Could add an |
Typically, the CDF for multiple variables is defined like this. So, each element in the cumulative sum array would be
Edit: Probably the easiest way to implement this for n dimensions is to do the cumulative sum along the first axis, then do the cumulative sum of that along the next axis, then do the cumulative sum of that along the next axis, etc. |
Okay I've implemented something I think should work. I'd like to get rid of the |
Okay I'm happy with this as an initial version if one of you guys wants to have a gander 😄 |
Currently on holiday in a sunny place, I'll have a look once I come back :P |
@@ -78,6 +78,20 @@ impl<A: Ord> Histogram<A> { | |||
pub fn grid(&self) -> &Grid<A> { | |||
&self.grid | |||
} | |||
|
|||
/// Returns the cumulative distribution function of a histogram. | |||
/// Equivalent to the numpy histogram function cumsum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to have a more detailed docstring here: it should include the math formula to compute the value indexed i_1
, ..., i_n
in the final array and a simple example using a 1d/2d array.
Given that you are mentioning NumPy, it would be good to link directly to the docs for np.cumsum
.
let mut cdf = self.counts.clone(); | ||
for i in 0..self.ndim() { | ||
for j in 1..cdf.shape()[i] { | ||
let temp = cdf.index_axis(Axis(i), j - 1).to_owned(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a shame that we have to use to_owned
here, due to the borrow checker, even though the two slices are not-overlapping 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this can be avoided using split_at
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I just started trying to write that but found myself with two matrices of different sizes and thinking I'd like to get the last element of one for a given axis and the first of another and it felt too complicated.. But I might just not be picturing it in the right way 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The simplest approach would be to avoid the issue by using .lanes_mut()
; something like this [untested]:
let mut cdf = self.counts.clone();
for ax in 0..cdf.ndim() {
for lane in cdf.lanes_mut(Axis(ax)) {
for i in 1..lane.len() {
lane[i] += lane[i-1];
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that works @jturner314, it's surely cleaner. Unfortunately it kind of goes the way you described @xd009642 using split_at
🤔 You need to split at j
and then grab the last lane of the first part and the first lane of the second part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tested it, and my example does work (with the addition of mut
before lane
). It does rely on the element type being Copy
, though; for general A
you'd run into the same issue.
Fwiw, regarding the inconvenience of a solution based on .split_at()
, the next release of ndarray
will have a multislice!
macro that makes it possible to cleanly take multiple disjoint, mutable slices simultaneously. This PR makes me realize that we should add a multislice_axis!
macro too (see rust-ndarray/ndarray#593).
The overall method is really a Irrespectively of where it is going to end up, it would be worth it to implement it as a free function taking an I have left some other minor comments @xd009642, but it looks good to me overall! |
Yes, I think it makes sense to add |
Okay I'll transfer my work to ndarray and open a PR there 👍 |
@jturner314 I was looking at ndarray PRs and saw this one rust-ndarray/ndarray#513 it seems that it would probably be better to get that PR finished and merged in than start a new one that's less generic. Although I can continue with one since that one seems to have stalled |
I think it makes sense to file your PR in any case @xd009642 - when rust-ndarray/ndarray#513 lands/resumes activity we will take care of rephrasing |
This is equivalent to the cumsum for numpys histogram object. I've also added a basic test and a doc comment.