Skip to content

Remove the concept of dodginess #1275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 5, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 0 additions & 4 deletions docs/comparison-analysis.md
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,3 @@ The actual algorithm for determining relevance of a comparison summary may chang
* High relevance: any number of very large or large changes, a small amount of medium changes, or a large number of small or very small changes.
* Medium relevance: any number of very large or large changes, any medium change, or smaller but still substantial number of small or very small changes.
* Low relevance: if it doesn't fit into the above two categories, it ends in this category.

### "Dodgy" Test Cases

"Dodgy" test cases are test cases that tend to produce unreliable results (i.e., noise). A test case is considered "dodgy" if its significance threshold is sufficiently far enough away from 0.
1 change: 0 additions & 1 deletion docs/glossary.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,6 @@ The following is a glossary of domain specific terminology. Although benchmarks
* **significant test result comparison**: a test result comparison above the significance threshold. Significant test result comparisons can be thought of as being "statistically significant".
* **relevant test result comparison**: a test result comparison can be significant but still not be relevant (i.e., worth paying attention to). Relevance is a factor of the test result comparison's significance and magnitude. Comparisons are considered relevant if they are significant and have at least a small magnitude .
* **test result comparison magnitude**: how "large" the delta is between the two test result's under comparison. This is determined by the average of two factors: the absolute size of the change (i.e., a change of 5% is larger than a change of 1%) and the amount above the significance threshold (i.e., a change that is 5x the significance threshold is larger than a change 1.5x the significance threshold).
* **dodgy test case**: a test case for which the significance threshold is significantly large indicating a high amount of variability in the test and thus making it necessary to be somewhat skeptical of any results too close to the significance threshold.

## Other

Expand Down
1 change: 0 additions & 1 deletion site/src/api.rs
Original file line number Diff line number Diff line change
Expand Up @@ -191,7 +191,6 @@ pub mod comparison {
pub scenario: String,
pub is_significant: bool,
pub significance_factor: Option<f64>,
pub is_dodgy: bool,
pub magnitude: String,
pub statistics: (f64, f64),
}
Expand Down
15 changes: 0 additions & 15 deletions site/src/comparison.rs
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,6 @@ pub async fn handle_compare(
benchmark: comparison.benchmark.to_string(),
profile: comparison.profile.to_string(),
scenario: comparison.scenario.to_string(),
is_dodgy: comparison.is_dodgy(),
is_significant: comparison.is_significant(),
significance_factor: comparison.significance_factor(),
magnitude: comparison.magnitude().display().to_owned(),
Expand Down Expand Up @@ -953,13 +952,6 @@ impl HistoricalData {
.windows(2)
.map(|window| (window[0] - window[1]).abs())
}

/// Whether we can trust this benchmark or not
fn is_dodgy(&self) -> bool {
// If changes are judged significant only exceeding 0.2%, then the
// benchmark as a whole is dodgy.
self.significance_threshold() * 100.0 > 0.2
}
}

/// Gets the previous commit
Expand Down Expand Up @@ -1096,13 +1088,6 @@ impl TestResultComparison {
from_u8((as_u8(over_threshold) + as_u8(absolute_magnitude)) / 2)
}

fn is_dodgy(&self) -> bool {
self.historical_data
.as_ref()
.map(|v| v.is_dodgy())
.unwrap_or(false)
}

fn relative_change(&self) -> f64 {
let (a, b) = self.results;
(b - a) / a
Expand Down
3 changes: 1 addition & 2 deletions site/static/compare.html
Original file line number Diff line number Diff line change
Expand Up @@ -810,7 +810,6 @@ <h2>Comparing <span id="stat-header">{{stat}}</span> between <span id="before">{
magnitude: c.magnitude,
isSignificant: c.is_significant,
significanceFactor: c.significance_factor,
isDodgy: c.is_dodgy,
datumA,
datumB,
percent,
Expand Down Expand Up @@ -1049,7 +1048,7 @@ <h2>Comparing <span id="stat-header">{{stat}}</span> between <span id="before">{
<td>
<a v-bind:href="percentLink(commitB, commitA, testCase)">
<span v-bind:class="percentClass(testCase.percent)">
{{ testCase.percent.toFixed(2) }}%{{testCase.isDodgy ? "?" : ""}}
{{ testCase.percent.toFixed(2) }}%
</span>
</a>
</td>
Expand Down