Skip to content

Central moments #23

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 47 commits into from
Apr 6, 2019
Merged
Show file tree
Hide file tree
Changes from 27 commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
62e8b02
Added definition of central order moment
LukeMathWalker Jan 15, 2019
89d6b8c
Added stubbed implementation
LukeMathWalker Jan 15, 2019
9a5b565
Add behaviour to return None if the array is empty
LukeMathWalker Jan 15, 2019
d9826e5
Handle edge cases
LukeMathWalker Jan 15, 2019
316a0e5
Added test
LukeMathWalker Jan 15, 2019
00782d8
First implementation completed
LukeMathWalker Jan 15, 2019
1d634ac
Added test for second order moment (variance)
LukeMathWalker Jan 15, 2019
05bf683
Implemented test and renamed to central_moment
LukeMathWalker Jan 15, 2019
5921dd2
Using Horner's method for evaluating polynomials
LukeMathWalker Jan 15, 2019
c04e295
Add algorithm notes to the docs
LukeMathWalker Jan 15, 2019
9cddde5
Added algorithmic complexity to the docs
LukeMathWalker Jan 15, 2019
336a51a
Added two optimizations
LukeMathWalker Jan 15, 2019
7f012af
Added signature for bulk central moment computation
LukeMathWalker Jan 18, 2019
cfd4dbe
Fixed trait signature
LukeMathWalker Jan 18, 2019
2517b4d
Indent
LukeMathWalker Jan 18, 2019
b5a520c
Factored out logic from central_moment
LukeMathWalker Jan 18, 2019
7eb268f
Implemented central_moments
LukeMathWalker Jan 18, 2019
2e61f2e
Test implementation correctness
LukeMathWalker Jan 18, 2019
9d37f15
Added skewness and kurtosis
LukeMathWalker Jan 18, 2019
4829ca6
Added docs
LukeMathWalker Jan 18, 2019
c748c8e
Added tests for kurtosis and skewness
LukeMathWalker Jan 18, 2019
4bf0035
Fixed assertions
LukeMathWalker Jan 22, 2019
da49a28
No need to get to order 4 for skewness
LukeMathWalker Jan 22, 2019
4a193bb
Fixed kurtosis test
LukeMathWalker Jan 23, 2019
5fe1fe8
Enriched docs
LukeMathWalker Jan 23, 2019
36428c7
Fmt
LukeMathWalker Mar 26, 2019
cff93fe
Merge master
LukeMathWalker Mar 26, 2019
25f3f0f
Avoid computing mean for p=0,1 central moments
jturner314 Mar 31, 2019
d9ba4bb
Replace map + clone with mapv
jturner314 Mar 31, 2019
4a05288
Check for moment order overflowing `i32`
jturner314 Mar 31, 2019
3aaab13
Take advantage of IterBinomial from num-integer
jturner314 Mar 31, 2019
7efba82
Remove unnecessary explicit clones
jturner314 Mar 31, 2019
0b0b0ba
Replace .map() with ?
jturner314 Mar 31, 2019
f992453
Test more cases in central moment tests
jturner314 Mar 31, 2019
a8af880
Rename central moment tests
jturner314 Mar 31, 2019
a701f33
Push diff to the lowest possible exponent
LukeMathWalker Apr 1, 2019
94f0950
Merge pull request #3 from jturner314/central-moments
LukeMathWalker Apr 1, 2019
595af7b
Merge branch 'master' into central-moments
LukeMathWalker Apr 2, 2019
1796a48
Return a Result instead of Option, for consistency
LukeMathWalker Apr 2, 2019
e21ca2b
Match impl with trait definition
LukeMathWalker Apr 2, 2019
8762490
Match test to trait+impl
LukeMathWalker Apr 2, 2019
76b0077
Fmt
LukeMathWalker Apr 2, 2019
74cc782
Converted order to u16
LukeMathWalker Apr 5, 2019
31a3909
Fmt
LukeMathWalker Apr 5, 2019
1f9bcf1
Fix typo. Change casting to use as.
LukeMathWalker Apr 6, 2019
fb65893
Fix typo. Change casting to use as.
LukeMathWalker Apr 6, 2019
c3a166e
Fix typo. Change casting to use as.
LukeMathWalker Apr 6, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
280 changes: 273 additions & 7 deletions src/summary_statistics/means.rs
Original file line number Diff line number Diff line change
Expand Up @@ -35,14 +35,182 @@ where
{
self.map(|x| x.ln()).mean().map(|x| x.exp())
}

fn kurtosis(&self) -> Option<A>
where
A: Float + FromPrimitive,
{
let central_moments = self.central_moments(4);
central_moments.map(|moments| moments[4] / moments[2].powi(2))
}

fn skewness(&self) -> Option<A>
where
A: Float + FromPrimitive,
{
let central_moments = self.central_moments(3);
central_moments.map(|moments| moments[3] / moments[2].sqrt().powi(3))
}

fn central_moment(&self, order: usize) -> Option<A>
where
A: Float + FromPrimitive,
{
let mean = self.mean();
match mean {
None => None,
Some(mean) => match order {
0 => Some(A::one()),
1 => Some(A::zero()),
n => {
let shifted_array = self.map(|x| x.clone() - mean);
let shifted_moments = moments(shifted_array, n);
let correction_term = -shifted_moments[1].clone();

let coefficients = central_moment_coefficients(&shifted_moments);
Some(horner_method(coefficients, correction_term))
}
},
}
}

fn central_moments(&self, order: usize) -> Option<Vec<A>>
where
A: Float + FromPrimitive,
{
let mean = self.mean();
match mean {
None => None,
Some(mean) => {
match order {
0 => Some(vec![A::one()]),
1 => Some(vec![A::one(), A::zero()]),
n => {
// We only perform this operations once, and then reuse their
// result to compute all the required moments
let shifted_array = self.map(|x| x.clone() - mean);
let shifted_moments = moments(shifted_array, n);
let correction_term = -shifted_moments[1].clone();

let mut central_moments = vec![A::one(), A::zero()];
for k in 2..=n {
let coefficients = central_moment_coefficients(&shifted_moments[..=k]);
let central_moment = horner_method(coefficients, correction_term);
central_moments.push(central_moment)
}
Some(central_moments)
}
}
}
}
}
}

/// Returns a vector containing all moments of the array elements up to
/// *order*, where the *p*-th moment is defined as:
///
/// ```text
/// 1 n
/// ― ∑ xᵢᵖ
/// n i=1
/// ```
///
/// The returned moments are ordered by power magnitude: 0th moment, 1st moment, etc.
///
/// **Panics** if `A::from_usize()` fails to convert the number of elements in the array.
fn moments<A, S, D>(a: ArrayBase<S, D>, order: usize) -> Vec<A>
where
A: Float + FromPrimitive,
S: Data<Elem = A>,
D: Dimension,
{
let n_elements =
A::from_usize(a.len()).expect("Converting number of elements to `A` must not fail");

// When k=0, we are raising each element to the 0th power
// No need to waste CPU cycles going through the array
let mut moments = vec![A::one()];

if order >= 1 {
// When k=1, we don't need to raise elements to the 1th power (identity)
moments.push(a.sum() / n_elements)
}

for k in 2..=order {
moments.push(a.map(|x| x.powi(k as i32)).sum() / n_elements)
}
moments
}

/// Returns the coefficients in the polynomial expression to compute the *p*th
/// central moment as a function of the sample mean.
///
/// It takes as input all moments up to order *p*, ordered by power magnitude - *p* is
/// inferred to be the length of the *moments* array.
fn central_moment_coefficients<A>(moments: &[A]) -> Vec<A>
where
A: Float + FromPrimitive,
{
let order = moments.len();
moments
.iter()
.rev()
.enumerate()
.map(|(k, moment)| A::from_usize(binomial_coefficient(order, k)).unwrap() * *moment)
.collect()
}

/// Returns the binomial coefficient "n over k".
///
/// **Panics** if k > n.
fn binomial_coefficient(n: usize, k: usize) -> usize {
if k > n {
panic!(
"Tried to compute the binomial coefficient of {0} over {1}, \
but {1} is strictly greater than {0}!"
)
}
// BC(n, k) = BC(n, n-k)
let k = if k > n - k { n - k } else { k };
let mut result = 1;
for i in 0..k {
result = result * (n - i);
result = result / (i + 1);
}
result
}

/// Uses [Horner's method] to evaluate a polynomial with a single indeterminate.
///
/// Coefficients are expected to be sorted by ascending order
/// with respect to the indeterminate's exponent.
///
/// If the array is empty, `A::zero()` is returned.
///
/// Horner's method can evaluate a polynomial of order *n* with a single indeterminate
/// using only *n-1* multiplications and *n-1* sums - in terms of number of operations,
/// this is an optimal algorithm for polynomial evaluation.
///
/// [Horner's method]: https://en.wikipedia.org/wiki/Horner%27s_method
fn horner_method<A>(coefficients: Vec<A>, indeterminate: A) -> A
where
A: Float,
{
let mut result = A::zero();
for coefficient in coefficients.into_iter().rev() {
result = coefficient + indeterminate * result
}
result
}

#[cfg(test)]
mod tests {
use super::SummaryStatisticsExt;
use approx::abs_diff_eq;
use ndarray::{array, Array1};
use approx::assert_abs_diff_eq;
use ndarray::{array, Array, Array1};
use ndarray_rand::RandomExt;
use noisy_float::types::N64;
use rand::distributions::Uniform;
use std::f64;

#[test]
Expand Down Expand Up @@ -88,16 +256,114 @@ mod tests {
// Computed using SciPy
let expected_geometric_mean = 0.4345897639796527;

abs_diff_eq!(a.mean().unwrap(), expected_mean, epsilon = f64::EPSILON);
abs_diff_eq!(
assert_abs_diff_eq!(a.mean().unwrap(), expected_mean, epsilon = 1e-6);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This epsilon is fairly large. Is it due to ndarray not having pairwise summation yet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tuned all epsilons to be as close as possible to 1e-12, stopping when the assertion fails.
It seems our mean estimate agrees with NumPy's with a delta of 1e-9 - I'd say that the lack of pairwise summation is a big suspect. It would be nice to have something like decimal in Rust to get a sense of the real error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tuned all epsilons to be as close as possible to 1e-12, stopping when the assertion fails.

Okay, thanks. That's helpful to get a better understanding of the precision.

It would be nice to have something like decimal in Rust to get a sense of the real error.

For this particular case, decimal gives the following for the standard and harmonic mean:

>>> from decimal import *
>>> getcontext().prec = 100
>>> data = [
...     Decimal('0.99889651'), Decimal('0.0150731'), Decimal('0.28492482'), Decimal('0.83819218'), Decimal('0.48413156'), Decimal('0.80710412'), Decimal('0.41762936'),
...     Decimal('0.22879429'), Decimal('0.43997224'), Decimal('0.23831807'), Decimal('0.02416466'), Decimal('0.6269962'), Decimal('0.47420614'), Decimal('0.56275487'),
...     Decimal('0.78995021'), Decimal('0.16060581'), Decimal('0.64635041'), Decimal('0.34876609'), Decimal('0.78543249'), Decimal('0.19938356'), Decimal('0.34429457'),
...     Decimal('0.88072369'), Decimal('0.17638164'), Decimal('0.60819363'), Decimal('0.250392'), Decimal('0.69912532'), Decimal('0.78855523'), Decimal('0.79140914'),
...     Decimal('0.85084218'), Decimal('0.31839879'), Decimal('0.63381769'), Decimal('0.22421048'), Decimal('0.70760302'), Decimal('0.99216018'), Decimal('0.80199153'),
...     Decimal('0.19239188'), Decimal('0.61356023'), Decimal('0.31505352'), Decimal('0.06120481'), Decimal('0.66417377'), Decimal('0.63608897'), Decimal('0.84959691'),
...     Decimal('0.43599069'), Decimal('0.77867775'), Decimal('0.88267754'), Decimal('0.83003623'), Decimal('0.67016118'), Decimal('0.67547638'), Decimal('0.65220036'),
...     Decimal('0.68043427')
... ]
>>> sum(data) / len(data)
Decimal('0.5475494054')
>>> 1 / (sum(1/x for x in data) / len(data))
Decimal('0.2179009364951495638060724591285330338229747648318613758290505954023583698639515519076072129091188061')

In Rust, we could add a dev-dependency on rug to compute the correct result to the necessary precision, but it's not necessary for this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's reassuring: using the results from decimal it turns out our computation is correct up to f64::EPSILON in this specific case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely look into using something like rug to keep in check our numerical error. On the side, comparing with SciPy/NumPy reassures me that the algorithm implementation is roughly correct, even if due to the respective numerical error we do not agree on the smaller digits.

assert_abs_diff_eq!(
a.harmonic_mean().unwrap(),
expected_harmonic_mean,
epsilon = f64::EPSILON
epsilon = 1e-6
);
abs_diff_eq!(
assert_abs_diff_eq!(
a.geometric_mean().unwrap(),
expected_geometric_mean,
epsilon = f64::EPSILON
epsilon = 1e-6
);
}

#[test]
fn test_central_order_moment_with_empty_array_of_floats() {
let a: Array1<f64> = array![];
assert!(a.central_moment(1).is_none());
assert!(a.central_moments(1).is_none());
}

#[test]
fn test_zeroth_central_order_moment_is_one() {
let n = 50;
let bound: f64 = 200.;
let a = Array::random(n, Uniform::new(-bound.abs(), bound.abs()));
assert_eq!(a.central_moment(0).unwrap(), 1.);
}

#[test]
fn test_first_central_order_moment_is_zero() {
let n = 50;
let bound: f64 = 200.;
let a = Array::random(n, Uniform::new(-bound.abs(), bound.abs()));
assert_eq!(a.central_moment(1).unwrap(), 0.);
}

#[test]
fn test_central_order_moments() {
let a: Array1<f64> = array![
0.07820559, 0.5026185, 0.80935324, 0.39384033, 0.9483038, 0.62516215, 0.90772261,
0.87329831, 0.60267392, 0.2960298, 0.02810356, 0.31911966, 0.86705506, 0.96884832,
0.2222465, 0.42162446, 0.99909868, 0.47619762, 0.91696979, 0.9972741, 0.09891734,
0.76934818, 0.77566862, 0.7692585, 0.2235759, 0.44821286, 0.79732186, 0.04804275,
0.87863238, 0.1111003, 0.6653943, 0.44386445, 0.2133176, 0.39397086, 0.4374617,
0.95896624, 0.57850146, 0.29301706, 0.02329879, 0.2123203, 0.62005503, 0.996492,
0.5342986, 0.97822099, 0.5028445, 0.6693834, 0.14256682, 0.52724704, 0.73482372,
0.1809703,
];
// Computed using scipy.stats.moment
let expected_moments = vec![
1.,
0.,
0.09339920262960291,
-0.0026849636727735186,
0.015403769257729755,
-0.001204176487006564,
0.002976822584939186,
];
for (order, expected_moment) in expected_moments.iter().enumerate() {
assert_abs_diff_eq!(
a.central_moment(order).unwrap(),
expected_moment,
epsilon = 1e-6
);
}
}

#[test]
fn test_bulk_central_order_moments() {
// Test that the bulk method is coherent with the non-bulk method
let n = 50;
let bound: f64 = 200.;
let a = Array::random(n, Uniform::new(-bound.abs(), bound.abs()));
let order = 10;
let central_moments = a.central_moments(order).unwrap();
for i in 0..=order {
assert_eq!(a.central_moment(i).unwrap(), central_moments[i]);
}
}

#[test]
fn test_kurtosis_and_skewness_is_none_with_empty_array_of_floats() {
let a: Array1<f64> = array![];
assert!(a.skewness().is_none());
assert!(a.kurtosis().is_none());
}

#[test]
fn test_kurtosis_and_skewness() {
let a: Array1<f64> = array![
0.33310096, 0.98757449, 0.9789796, 0.96738114, 0.43545674, 0.06746873, 0.23706562,
0.04241815, 0.38961714, 0.52421271, 0.93430327, 0.33911604, 0.05112372, 0.5013455,
0.05291507, 0.62511183, 0.20749633, 0.22132433, 0.14734804, 0.51960608, 0.00449208,
0.4093339, 0.2237519, 0.28070469, 0.7887231, 0.92224523, 0.43454188, 0.18335111,
0.08646856, 0.87979847, 0.25483457, 0.99975627, 0.52712442, 0.41163279, 0.85162594,
0.52618733, 0.75815023, 0.30640695, 0.14205781, 0.59695813, 0.851331, 0.39524328,
0.73965373, 0.4007615, 0.02133069, 0.92899207, 0.79878191, 0.38947334, 0.22042183,
0.77768353,
];
// Computed using scipy.stats.kurtosis(a, fisher=False)
let expected_kurtosis = 1.821933711687523;
// Computed using scipy.stats.skew
let expected_skewness = 0.2604785422878771;

let kurtosis = a.kurtosis().unwrap();
let skewness = a.skewness().unwrap();

assert_abs_diff_eq!(kurtosis, expected_kurtosis, epsilon = 1e-6);
assert_abs_diff_eq!(skewness, expected_skewness, epsilon = 1e-6);
}
}
77 changes: 77 additions & 0 deletions src/summary_statistics/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,83 @@ where
fn geometric_mean(&self) -> Option<A>
where
A: Float + FromPrimitive;

/// Returns the [kurtosis] `Kurt[X]` of all elements in the array:
///
/// ```text
/// Kurt[X] = μ₄ / σ⁴
/// ```
///
/// where μ₄ is the fourth central moment and σ is the standard deviation of
/// the elements in the array.
///
/// This is sometimes referred to as _Pearson's kurtosis_. Fisher's kurtosis can be
/// computed by subtracting 3 from Pearson's kurtosis.
///
/// If the array is empty, `None` is returned.
///
/// **Panics** if `A::from_usize()` fails to convert the number of elements in the array.
///
/// [kurtosis]: https://en.wikipedia.org/wiki/Kurtosis
fn kurtosis(&self) -> Option<A>
where
A: Float + FromPrimitive;

/// Returns the [Pearson's moment coefficient of skewness] γ₁ of all elements in the array:
///
/// ```text
/// γ₁ = μ₃ / σ³
/// ```
///
/// where μ₃ is the third central moment and σ is the standard deviation of
/// the elements in the array.
///
/// If the array is empty, `None` is returned.
///
/// **Panics** if `A::from_usize()` fails to convert the number of elements in the array.
///
/// [Pearson's moment coefficient of skewness]: https://en.wikipedia.org/wiki/Skewness
fn skewness(&self) -> Option<A>
where
A: Float + FromPrimitive;

/// Returns the *p*-th [central moment] of all elements in the array, μₚ:
///
/// ```text
/// 1 n
/// μₚ = ― ∑ (xᵢ-x̅)ᵖ
/// n i=1
/// ```
///
/// If the array is empty, `None` is returned.
///
/// The *p*-th central moment is computed using a corrected two-pass algorithm (see Section 3.5
/// in [Pébay et al., 2016]). Complexity is *O(np)* when *n >> p*, *p > 1*.
///
/// **Panics** if `A::from_usize()` fails to convert the number of elements in the array.
///
/// [central moment]: https://en.wikipedia.org/wiki/Central_moment
/// [Pébay et al., 2016]: https://www.osti.gov/pages/servlets/purl/1427275
fn central_moment(&self, order: usize) -> Option<A>
where
A: Float + FromPrimitive;

/// Returns the first *p* [central moments] of all elements in the array, see [central moment]
/// for more details.
///
/// If the array is empty, `None` is returned.
///
/// This method reuses the intermediate steps for the *k*-th moment to compute the *(k+1)*-th,
/// being thus more efficient than repeated calls to [central moment] if the computation
/// of central moments of multiple orders is required.
///
/// **Panics** if `A::from_usize()` fails to convert the number of elements in the array.
///
/// [central moments]: https://en.wikipedia.org/wiki/Central_moment
/// [central moment]: #tymethod.central_moment
fn central_moments(&self, order: usize) -> Option<Vec<A>>
where
A: Float + FromPrimitive;
}

mod means;