diff --git a/docs/comparison-analysis.md b/docs/comparison-analysis.md
index bd1b69239..45a5e45d0 100644
--- a/docs/comparison-analysis.md
+++ b/docs/comparison-analysis.md
@@ -68,7 +68,3 @@ The actual algorithm for determining relevance of a comparison summary may chang
 * High relevance: any number of very large or large changes, a small amount of medium changes, or a large number of small or very small changes.
 * Medium relevance: any number of very large or large changes, any medium change, or smaller but still substantial number of small or very small changes.
 * Low relevance: if it doesn't fit into the above two categories, it ends in this category.
-
-### "Dodgy" Test Cases
-
-"Dodgy" test cases are test cases that tend to produce unreliable results (i.e., noise). A test case is considered "dodgy" if its significance threshold is sufficiently far enough away from 0.
diff --git a/docs/glossary.md b/docs/glossary.md
index 7eed20869..fd430b460 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -38,7 +38,6 @@ The following is a glossary of domain specific terminology. Although benchmarks
 * **significant test result comparison**: a test result comparison above the significance threshold. Significant test result comparisons can be thought of as being "statistically significant".
 * **relevant test result comparison**: a test result comparison can be significant but still not be relevant (i.e., worth paying attention to). Relevance is a factor of the test result comparison's significance and magnitude. Comparisons are considered relevant if they are significant and have at least a small magnitude .
 * **test result comparison magnitude**: how "large" the delta is between the two test result's under comparison. This is determined by the average of two factors: the absolute size of the change (i.e., a change of 5% is larger than a change of 1%) and the amount above the significance threshold (i.e., a change that is 5x the significance threshold is larger than a change 1.5x the significance threshold).
-* **dodgy test case**: a test case for which the significance threshold is significantly large indicating a high amount of variability in the test and thus making it necessary to be somewhat skeptical of any results too close to the significance threshold.
 
 ## Other
 
diff --git a/site/src/api.rs b/site/src/api.rs
index e5788dc4c..e1c343364 100644
--- a/site/src/api.rs
+++ b/site/src/api.rs
@@ -191,7 +191,6 @@ pub mod comparison {
         pub scenario: String,
         pub is_significant: bool,
         pub significance_factor: Option,
-        pub is_dodgy: bool,
         pub magnitude: String,
         pub statistics: (f64, f64),
     }
diff --git a/site/src/comparison.rs b/site/src/comparison.rs
index a40050bd3..5b32f3dc1 100644
--- a/site/src/comparison.rs
+++ b/site/src/comparison.rs
@@ -117,7 +117,6 @@ pub async fn handle_compare(
             benchmark: comparison.benchmark.to_string(),
             profile: comparison.profile.to_string(),
             scenario: comparison.scenario.to_string(),
-            is_dodgy: comparison.is_dodgy(),
             is_significant: comparison.is_significant(),
             significance_factor: comparison.significance_factor(),
             magnitude: comparison.magnitude().display().to_owned(),
@@ -953,13 +952,6 @@ impl HistoricalData {
             .windows(2)
             .map(|window| (window[0] - window[1]).abs())
     }
-
-    /// Whether we can trust this benchmark or not
-    fn is_dodgy(&self) -> bool {
-        // If changes are judged significant only exceeding 0.2%, then the
-        // benchmark as a whole is dodgy.
-        self.significance_threshold() * 100.0 > 0.2
-    }
 }
 
 /// Gets the previous commit
@@ -1096,13 +1088,6 @@ impl TestResultComparison {
         from_u8((as_u8(over_threshold) + as_u8(absolute_magnitude)) / 2)
     }
 
-    fn is_dodgy(&self) -> bool {
-        self.historical_data
-            .as_ref()
-            .map(|v| v.is_dodgy())
-            .unwrap_or(false)
-    }
-
     fn relative_change(&self) -> f64 {
         let (a, b) = self.results;
         (b - a) / a
diff --git a/site/static/compare.html b/site/static/compare.html
index 5a74d9be5..6540a58eb 100644
--- a/site/static/compare.html
+++ b/site/static/compare.html
@@ -810,7 +810,6 @@ Comparing {{stat}} between {
     magnitude: c.magnitude,
     isSignificant: c.is_significant,
     significanceFactor: c.significance_factor,
-    isDodgy: c.is_dodgy,
     datumA,
     datumB,
     percent,
@@ -1049,7 +1048,7 @@ Comparing {{stat}} between {
-    {{ testCase.percent.toFixed(2) }}%{{testCase.isDodgy ? "?" : ""}}
+    {{ testCase.percent.toFixed(2) }}%