-
Notifications
You must be signed in to change notification settings - Fork 28
Central moments #23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Central moments #23
Conversation
Once pairwise summation gets merged in ndarray, this will actually be a corrected pairwise two-pass algorithm for central moments. |
Added another method, Useful to get an efficient kurtosis and skewness computation, given that their formulas involve central moments of different order. |
Added kurtosis and skewness computation. |
Fixed an issue in the kurtosis' test (I was comparing to SciPy's kurtosis with |
In reality, this probably isn't necessary because I'd expect someone to terminate their program when trying to calculate a moment of order greater than `i32::MAX` (because it would take so long), but it's nice to have an explicit check anyway.
`A: Float` requires `A: Copy`.
I think this is a bit easier to understand at first glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! It's especially nice that you found a numerically stable formula instead of using the naive calculation. I've created LukeMathWalker#3 with some minor changes and added a couple of comments. Everything else looks good.
src/summary_statistics/means.rs
Outdated
@@ -88,16 +256,114 @@ mod tests { | |||
// Computed using SciPy | |||
let expected_geometric_mean = 0.4345897639796527; | |||
|
|||
abs_diff_eq!(a.mean().unwrap(), expected_mean, epsilon = f64::EPSILON); | |||
abs_diff_eq!( | |||
assert_abs_diff_eq!(a.mean().unwrap(), expected_mean, epsilon = 1e-6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This epsilon
is fairly large. Is it due to ndarray
not having pairwise summation yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tuned all epsilons to be as close as possible to 1e-12
, stopping when the assertion fails.
It seems our mean estimate agrees with NumPy's with a delta of 1e-9
- I'd say that the lack of pairwise summation is a big suspect. It would be nice to have something like decimal in Rust to get a sense of the real error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tuned all epsilons to be as close as possible to
1e-12
, stopping when the assertion fails.
Okay, thanks. That's helpful to get a better understanding of the precision.
It would be nice to have something like decimal in Rust to get a sense of the real error.
For this particular case, decimal
gives the following for the standard and harmonic mean:
>>> from decimal import *
>>> getcontext().prec = 100
>>> data = [
... Decimal('0.99889651'), Decimal('0.0150731'), Decimal('0.28492482'), Decimal('0.83819218'), Decimal('0.48413156'), Decimal('0.80710412'), Decimal('0.41762936'),
... Decimal('0.22879429'), Decimal('0.43997224'), Decimal('0.23831807'), Decimal('0.02416466'), Decimal('0.6269962'), Decimal('0.47420614'), Decimal('0.56275487'),
... Decimal('0.78995021'), Decimal('0.16060581'), Decimal('0.64635041'), Decimal('0.34876609'), Decimal('0.78543249'), Decimal('0.19938356'), Decimal('0.34429457'),
... Decimal('0.88072369'), Decimal('0.17638164'), Decimal('0.60819363'), Decimal('0.250392'), Decimal('0.69912532'), Decimal('0.78855523'), Decimal('0.79140914'),
... Decimal('0.85084218'), Decimal('0.31839879'), Decimal('0.63381769'), Decimal('0.22421048'), Decimal('0.70760302'), Decimal('0.99216018'), Decimal('0.80199153'),
... Decimal('0.19239188'), Decimal('0.61356023'), Decimal('0.31505352'), Decimal('0.06120481'), Decimal('0.66417377'), Decimal('0.63608897'), Decimal('0.84959691'),
... Decimal('0.43599069'), Decimal('0.77867775'), Decimal('0.88267754'), Decimal('0.83003623'), Decimal('0.67016118'), Decimal('0.67547638'), Decimal('0.65220036'),
... Decimal('0.68043427')
... ]
>>> sum(data) / len(data)
Decimal('0.5475494054')
>>> 1 / (sum(1/x for x in data) / len(data))
Decimal('0.2179009364951495638060724591285330338229747648318613758290505954023583698639515519076072129091188061')
In Rust, we could add a dev-dependency on rug
to compute the correct result to the necessary precision, but it's not necessary for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's reassuring: using the results from decimal
it turns out our computation is correct up to f64::EPSILON
in this specific case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely look into using something like rug
to keep in check our numerical error. On the side, comparing with SciPy/NumPy reassures me that the algorithm implementation is roughly correct, even if due to the respective numerical error we do not agree on the smaller digits.
Miscellaneous small improvements to central moments
I have changed the return type to match what we decided in #25 - it should be ready to be merged now @jturner314 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. The only remaining potential change is making order
have type u32
or even u16
(since I can't imagine anyone wanting to calculate the 65537th order central moment). usize
is fine, though.
I agree - I changed it to |
src/summary_statistics/means.rs
Outdated
{ | ||
let n_elements = | ||
A::from_usize(a.len()).expect("Converting number of elements to `A` must not fail"); | ||
let order = order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that order
is u16
, we can change this to let order = order as i32
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, it can't fail anymore. I'll merge once the CI has finished 👍
Computation of central moments of arbitrary order.
Would it be worth to add another method that takes the mean as a parameter to avoid re-computing it if the user already has its value @jturner314?