Implement scalar_min and scalar_max for A: Ord #512
Comments
I haven't found a neat way yet to implement this both for |
Suggested API:

/// Return a minimum element from `self`, ignoring unorderable elements
/// (such as `f64::NAN`).
///
/// **Panics** if `self` is empty.
pub fn scalar_min(&self) -> &A where A: PartialOrd;

/// Return a minimum element from `self`, or `None` if `self` is empty or
/// contains unorderable elements.
pub fn scalar_partial_min(&self) -> Option<&A> where A: PartialOrd;

This does not leverage the additional strictness of |
It would be nice to make the API naming convention consistent with what we are working on here! - rust-ndarray/ndarray-stats#1 |
I like the convention |
I've been thinking about the min/max problem for a while. Originally, I planned on providing three methods for each of min and max. After some more thought, though, I think just two would be better:

/// Finds the elementwise minimum of the array.
///
/// Returns `None` if any of the pairwise orderings tested by the function
/// are undefined. (For example, this occurs if there are any
/// floating-point NaN values in the array.)
///
/// Additionally, returns `None` if the array is empty.
fn min(&self) -> Option<&A>
where
    A: PartialOrd;

/// Finds the elementwise minimum of the array, skipping NaN values.
///
/// **Warning** This method will return a NaN value if none of the values
/// in the array are non-NaN values. Note that the NaN value might not be
/// in the array.
fn min_skipnan(&self) -> &A
where
    A: MaybeNan,
    A::NotNan: Ord;

It is tempting to keep that third method

/// Finds the elementwise minimum of the array.
///
/// **Panics** if the array is empty.
fn min(&self) -> &A
where
    A: Ord;

because it returns

One thing that's always bothered me is that a lot of methods in
The reasoning behind this rule is that it tries to limit the distance in the code between the root cause of a panic and the place the panic actually occurs. In other words, when calling a method, the programmer generally has a good idea of the possible argument values because they are usually created near the call. In contrast, the array may have been created far away and undergone a complex series of operations before reaching this call. The mental overhead required to meet a constraint like "the length of this axis must be nonzero" or "the array must not be empty" can be fairly large. Of course, the programmer could manually add a check in their own code to avoid the panic, but in that case it would be more convenient for the array method to return a

Additionally, I have more trouble remembering constraints like "the array must not be empty" unless I've recently read the docs, because they aren't directly related to the arguments passed into the method. By returning an

AFAICT, all of the currently existing methods on

Anyway, back on topic, that third method shown above violates the rule and IMO should be avoided.

One note of interest:
It turns out that it's impossible to implement this with only the That's why for this type of method I defined a |
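To make the `Option`-versus-panic trade-off discussed above concrete, here is a minimal sketch on plain slices rather than on ndarray's array types. Both function names (`min_or_panic`, `min_checked`) are hypothetical and exist only for illustration; they are not part of ndarray or ndarray-stats.

```rust
/// Hypothetical panicking variant: the "must not be empty" constraint
/// lives only in the docs, and a violation surfaces as a panic here,
/// possibly far from where the slice was built.
fn min_or_panic<A: Ord>(xs: &[A]) -> &A {
    xs.iter().min().expect("slice must not be empty")
}

/// Hypothetical `Option`-returning variant: the empty case is part of
/// the return type, so the caller has to decide how to handle it.
fn min_checked<A: Ord>(xs: &[A]) -> Option<&A> {
    xs.iter().min()
}
```

With the second form, a caller who knows the input is non-empty can still write `min_checked(xs).unwrap()`, which keeps the potential panic visible at the call site.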
Regarding the last issue, a first valid element can be found by I also like your point on Options vs panic!s. If |
Loop unrolling alone works fine, but my attempts to do an early return upon finding a |
Good point! I didn't think about comparing values to themselves. Unfortunately, while that may work for

use std::cmp::{Ordering, PartialEq, PartialOrd};
use std::f64::NAN;

pub struct MyF64(pub f64);

impl PartialEq for MyF64 {
    // This implementation is symmetric and transitive, so it meets the
    // requirements of `PartialEq`.
    fn eq(&self, other: &MyF64) -> bool {
        if self.0.is_nan() && other.0.is_nan() {
            true
        } else {
            self.0.eq(&other.0)
        }
    }
}

impl PartialOrd for MyF64 {
    // This implementation is antisymmetric and transitive, so it meets the
    // requirements of `PartialOrd`.
    fn partial_cmp(&self, other: &MyF64) -> Option<Ordering> {
        if self.0.is_nan() && other.0.is_nan() {
            Some(Ordering::Equal)
        } else {
            self.0.partial_cmp(&other.0)
        }
    }
}

fn main() {
    let a = [MyF64(1.), MyF64(NAN), MyF64(NAN)];
    for i in 0..3 {
        for j in 0..3 {
            println!(
                "a[{}].partial_cmp(&a[{}]) == {:?}",
                i, j, a[i].partial_cmp(&a[j]),
            );
        }
    }
}

The output of
So, given the result of all possible comparisons in this example, we can divide the array into two disjoint subsets, {
If That's how I decided to do it in my initial implementation for For the purpose of

/// Finds the elementwise minimum of the array.
///
/// Returns `None` if any of the pairwise orderings tested by the function
/// are undefined. (For example, this occurs if there are any
/// floating-point NaN values in the array.)
///
/// Additionally, returns `None` if the array is empty.
fn min(&self) -> Option<&A>
where
    A: PartialOrd,
{
    self.fold(self.first(), |acc, elem| match elem.partial_cmp(acc?)? {
        Ordering::Less => Some(elem),
        _ => acc,
    })
}

(and a similar By the way, I remembered that a related discussion is #461. |
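The fold-based `min` above can be tried without ndarray by porting it to slices. The following is a sketch under that assumption: `partial_min` is a hypothetical free function, and `xs.first()` on a slice stands in for the array's `first` method.

```rust
use std::cmp::Ordering;

/// Slice analogue of the fold-based `min` (hypothetical helper).
/// Returns `None` for an empty slice or as soon as any tested
/// comparison is undefined.
fn partial_min<A: PartialOrd>(xs: &[A]) -> Option<&A> {
    xs.iter().fold(xs.first(), |acc, elem| {
        // `acc?` propagates an earlier `None`; the second `?` bails out
        // when `partial_cmp` itself returns `None`.
        match elem.partial_cmp(acc?)? {
            Ordering::Less => Some(elem),
            _ => acc,
        }
    })
}
```

Because the fold starts from the first element, that element is compared with itself on the first step. For `f64` this means a slice containing only NaN also yields `None`, since `NaN.partial_cmp(&NaN)` is `None`.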
You raise a good point. The specification of I would argue for the following. It is both the most semantically correct, and arguably the most intuitive for users without a background in order theory:
It might be tempting in case 2 to search for the maximal value of the biggest totally ordered subset. There is, however, no way to determine whether that is what the user wants, and it would likely introduce a significant performance hit. If you agree, I will try to put this in clear wording in the docstrings and write tests to verify this behavior. |
That proposal for
The behavior is only different when there are elements that are equal to themselves but have undefined ordering with respect to some other elements. The most plausible example I can think of for this case is an enum where the ordering is defined only between values of the same variant. For example, in my

use std::cmp::{Ordering, PartialOrd};

#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    String(String),
    Bytes(Vec<u8>),
    Integer(num::BigInt),
    Float(f64),
    // more variants...
}

impl PartialOrd for Value {
    fn partial_cmp(&self, rhs: &Value) -> Option<Ordering> {
        use Value::*;
        match (self, rhs) {
            (String(l), String(r)) => l.partial_cmp(r),
            (Bytes(l), Bytes(r)) => l.partial_cmp(r),
            (Integer(l), Integer(r)) => l.partial_cmp(r),
            (Float(l), Float(r)) => l.partial_cmp(r),
            // more variants...
            _ => None, // differing variants
        }
    }
}

Then, I might want to find the maximum of an array of I'm having trouble coming up with a real-world case where I'd want behavior 2 instead. This reasoning makes me prefer option 1 because I think it's the least surprising and most conservative behavior. Does anyone have a real-world example where behavior 2 would be preferable? It's tempting to have For |
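As a rough illustration of the enum case, the sketch below pairs a cut-down version of `Value` with a slice-based maximum that returns `None` whenever a tested comparison is undefined, matching the `min` doc comment earlier in the thread. The two-variant `Value`, the `Text` variant name, and the `partial_max` helper are assumptions made for this example, not code from the thread.

```rust
use std::cmp::Ordering;

/// Simplified stand-in for the enum above: ordering is defined only
/// between values of the same variant.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Integer(i64),
    Text(String),
}

impl PartialOrd for Value {
    fn partial_cmp(&self, rhs: &Value) -> Option<Ordering> {
        match (self, rhs) {
            (Value::Integer(l), Value::Integer(r)) => l.partial_cmp(r),
            (Value::Text(l), Value::Text(r)) => l.partial_cmp(r),
            _ => None, // differing variants
        }
    }
}

/// Hypothetical slice-based maximum: `None` if the slice is empty or
/// any tested comparison is undefined.
fn partial_max<A: PartialOrd>(xs: &[A]) -> Option<&A> {
    xs.iter().fold(xs.first(), |acc, elem| match elem.partial_cmp(acc?)? {
        Ordering::Greater => Some(elem),
        _ => acc,
    })
}
```

Under these assumptions, a slice holding only `Integer` values yields its largest element, while a slice that mixes `Integer` and `Text` yields `None` rather than a maximum taken from one variant.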
…fix docs (#13)
* Remove min/max for A: Ord
  The problem with the old methods was that they panicked when the array was empty, which was very problematic. (See rust-ndarray/ndarray#512 for discussion.) The old `min_partialord` and `max_partialord` have been renamed to `min`/`max`.
* Document axis_len == 0 panic for quantile_axis_mut
* Make quantile_mut return None for empty arrays
* Fix panic docs for histogram strategies
This functionality is now provided by the |
Suggested by @jturner314:
I'd just do it in terms of `fold` with something like this (taking advantage of the `first` method from PR #507):
We don't need to manually unroll this because the compiler does a good job automatically (checked with Compiler Explorer using the `-O` compiler option).
The desired behavior for floating-point types depends on the use-case because of NaN. One option is
which ignores NaN values. (It returns NaN only if there are no non-NaN values.) The compiler does a decent job automatically unrolling this, so we don't need to manually unroll in this case either.
Originally posted by @jturner314 in #505 (comment)
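The code from the quoted suggestion did not survive on this page. One way to get the NaN-ignoring behavior it describes, sketched here on an `f64` slice, is to fold with `f64::min` starting from NaN. The function name `fold_min_ignoring_nan` is hypothetical, and this is not necessarily the snippet that was originally posted.

```rust
/// Hypothetical helper: minimum of a slice of `f64`, ignoring NaN.
/// `f64::min` returns the other operand when one operand is NaN, so
/// starting the fold from NaN means the result is NaN only when the
/// slice contains no non-NaN values.
fn fold_min_ignoring_nan(xs: &[f64]) -> f64 {
    xs.iter().fold(f64::NAN, |acc, &x| acc.min(x))
}
```

Note that for an empty slice, or one holding only NaN, the returned NaN comes from the initial accumulator rather than from the input.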