-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve dudect test with cropped time analysis #283
Conversation
This implementation seems quite different from the original one in dudect. In the original version, they maintain the time data after cropping in different percentile scales as other data sets (refer to original update_statistics()). For example, if there are 100 different crop scales, it will have 101 (100+1 raw data that didn't crop) arrays storing data under different scales, and extract the max t-value in these 101 arrays. |
Thank you for your feedback! You’re right that the previous implementation did not fully align with the original approach in dudect. Instead of maintaining separate data sets for different percentile scales, it weakened the influence of outliers rather than excluding them. I’ve now updated the code to correctly implement NUM_PERCENTILES + 1 independent tests, ensuring that each cropping scale maintains its own dataset. Please review the updated version, and let me know if you have any further suggestions. Thanks! |
Consider the changes below: --- a/dudect/fixture.c
+++ b/dudect/fixture.c
@@ -67,11 +67,11 @@ static int64_t percentile(const int64_t *a_sorted, double which, size_t size)
return a_sorted[pos];
}
-static int cmp(const int64_t *a, const int64_t *b)
+/* leverages the fact that comparison expressions return 1 or 0. */
+static int cmp(const void *aa, const void *bb)
{
- if (*a == *b)
- return 0;
- return (*a - *b) > 0 ? 1 : -1;
+ int64_t a = *(const int64_t *) aa, b = *(const int64_t *) bb;
+ return (a > b) - (a < b);
}
/* This function is used to set different thresholds for cropping measurements.
@@ -82,8 +82,7 @@ static int cmp(const int64_t *a, const int64_t *b)
*/
static void prepare_percentiles(int64_t *exec_times, int64_t *percentiles)
{
- qsort(exec_times, N_MEASURES, sizeof(int64_t),
- (int (*)(const void *, const void *)) cmp);
+ qsort(exec_times, N_MEASURES, sizeof(int64_t), cmp);
for (size_t i = 0; i < NUM_PERCENTILES; i++) {
percentiles[i] = percentile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure that Change-Ids appear in the commit messages.
Your implementation is solid—it not only removes branching but also optimizes performance. I analyzed the compiled machine code from common C compilers, and in most cases, your approach results in a more efficient execution. I have added your implementation in commit ba691d0. Thanks for the review! |
I defer to @Andrushika for confirmation. |
The purpose of the branchless comparison function was not security mitigation against side-channel attacks, but rather improvement for computational efficiency. See Branchless Programming. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few minor parts that could be improved. Please take a look at the suggestions.
dudect/fixture.c
Outdated
t_context_t *t = max_test(); | ||
double max_t = fabs(t_compute(t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the only use of *t
is to calculate max_t
, I recommend reducing the redundant code:
double max_t = max_test();
Also, remember to change the return value and types of max_test()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also be used to calculate number_traces_max_t
to determine whether the measurements is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, it was my fault, sorry for misunderstanding.
dudect/fixture.c
Outdated
t_context_t *t = max_test(); | ||
double max_t = fabs(t_compute(t)); | ||
double number_traces_max_t = t->n[0] + t->n[1]; | ||
double max_tau = max_t / sqrt(number_traces_max_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to calculate max_t
and max_tau
after checking whether number_traces_max_t < ENOUGH_MEASURE
, to avoid unnecessary calculation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can modify max_test
to return the number_traces_max_t
and pass double *max_t
as an argument to retrieve the max_t
values, preventing unnecessary recalculations.
However, this may slightly reduce the readibility. Do you think it's worth it?
/* This function returns the number of measurements of the test with the
* maximum t value. And sets max_t to the maximum t value. */
static double *max_test_size(double *max_t)
{
...
}
static bool report(void)
{
double *max_t;
double number_traces_max_t = max_test_size(*max_t);
...
}
@jserv I've reviewed the changes and everything looks good. It's ready for merge! |
320a2b9
to
7e95512
Compare
Sorry for the commits. Due to two previous commits where the commit hook failed to add the Change-Id properly, running make test results in errors for all test cases. To resolve this issue, I am recommitting them with the correct Change-Id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Squash git commits and refine commit messages.
760d361
to
0c8b186
Compare
This commit introduces several improvements to the dudect test, including cropped time analysis and performance optimizations. - Remove outliers caused by context switches, interrupts, or system activity using a percentile-based threshold. - Store measurements in multiple t-test contexts to track t-tests in different percentile thresholds. - Fix integer overflow and improve the efficiency in the 'cmp()' function by using a branch-free comparison '(a > b) - (a < b)'. - Optimize the calculation of 'max_t' and 'max_tau' by deferring computations until necessary, reducing unnecessary calculation when measurements are insufficient. Change-Id: Icb910a8b9d93305da8e478e7e79cd25891c9e72e
0c8b186
to
83077d6
Compare
I've finished squashing the commits. Let me know if anything needs further refinement. |
Instead of leaving messages, you can simply press the review button on GitHub next time. |
Thank @BennyWang1007 for contributing! |
The original dudect collect all execution times and perform t-tests, which may be affected by outliers. The outliers could be caused by context switches, interrupts, or other system activities. This patch introduces percentile-based cropping to remove outliers.
The patch adds a new function "prepare_percentiles()" to compute thresholds using an complementary exponential decay scale. The function is called before the test starts.
The patch modifies "update_statistics()" to perform t-tests on cropped execution times by filtering out the outliers.