Reduce CI and Local Flakiness #2496
Codecov Report
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2496      +/-   ##
============================================
- Coverage     78.65%   76.63%   -2.03%
  Complexity      267      267
============================================
  Files           110      136      +26
  Lines         13376    17549    +4173
  Branches          0     1020    +1020
============================================
+ Hits          10521    13448    +2927
- Misses         2855     3568     +713
- Partials          0      533     +533
Flags with carried forward coverage won't be shown.
... and 26 files with indirect coverage changes
ea093e8 to eabcc94
Benchmarks
Benchmark execution time: 2024-02-14 15:21:08
Comparing candidate commit 2254c3d in PR branch
Found 2 performance improvements and 0 performance regressions! Performance is the same for 37 metrics, 51 unstable metrics.
scenario:PHPRedisBench/benchRedisBaseline
scenario:PHPRedisBench/benchRedisOverhead
32bb0c9 to 8336f9f
6c6403d to ade048c
Regarding web tests:
Regarding randomized:
Converted back to draft while I try to understand the flakiness of the randomized tests.
f39a061 to b4a397a
431aa28 to 67c1c89
Signed-off-by: Bob Weinand <[email protected]>
Notes: I think avoiding commands that perform I/O should prevent, or at the very least reduce, any variability associated with disk reads and writes. What's more, we're not using AOF, so a bgRewriteAOF doesn't make sense. Fundamentally, we don't care about the commands executed, only about the overhead associated with traceMethodNoArgs and traceMethodAsCommand.
Signed-off-by: Bob Weinand <[email protected]>
This reverts commit 53d5e95.
…t_hook_panic`" This reverts commit e232021.
This reverts commit da61eea.
1c20447 to b2367da
Thanks for all the work and investigation around the individual CI runs :-)
Description
TL;DR: Flakiness is not meant to disappear with this PR (and it won't). Randomized tests are still flaky and were already the greatest source of flakiness. The build pipeline should be significantly improved, while the build_packages pipeline will be improved to a lesser extent because of some residual, unresolved flakiness in the randomized tests. Introducing a retry mechanism when sending traces reduces local flakiness considerably.

Note: Not all numbers can be determined for now, considering there were CI failures associated with unintentional memory leaks, syntax errors, and the like. Knowing this, and including these outliers, the pipeline success rate is above 80% (compared to ~65% on master). This PR will reduce flakiness to an extent.
Let's have it run on the master branch to get some numbers for next week and identify actionable issues.
Sources of Improvements
The introduction of DD_TRACE_AGENT_RETRIES (default: 0) reduced flakiness locally and, to a lesser extent, on the CI.
curl_easy_perform would sometimes time out
curl_easy_perform doesn't fix the issue
DD_TRACE_SHUTDOWN_TIMEOUT is greatly increased so that it doesn't conflict with DD_TRACE_BGS_TIMEOUT_VAL
integration-test_integrations_phpredis5-8.[0|1|2|3]-nts
randomized_tests...
xlarge resource class, which could lead to a smaller relative overload of the tested instance and, therefore, fewer instances of missing points
Waiting for elasticsearch. This probably happens because of the race condition explained above (see error)
integration-test_integrations-X.X
context deadline exceeded
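
For readers unfamiliar with the idea, the retry-on-timeout behaviour that DD_TRACE_AGENT_RETRIES enables can be pictured roughly as below. This is a minimal Python sketch, not the tracer's actual implementation (which performs the request through curl_easy_perform in native code). The names send_with_retries and make_flaky_send are hypothetical, TimeoutError stands in for a request timeout, and reading the setting as "number of extra attempts after the first" is an assumption.

```python
import os


def send_with_retries(send, payload, retries=None):
    """Call send(payload); on timeout, retry up to `retries` extra times."""
    if retries is None:
        # Assumption: the setting is a non-negative integer (default 0, as in the PR).
        retries = int(os.environ.get("DD_TRACE_AGENT_RETRIES", "0"))
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        try:
            return send(payload)
        except TimeoutError:
            # Re-raise once every allowed attempt has timed out.
            if attempt == attempts:
                raise


def make_flaky_send(failures):
    """Hypothetical stand-in for the agent request: times out `failures` times, then succeeds."""
    state = {"calls": 0}

    def send(payload):
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TimeoutError("simulated agent timeout")
        return len(payload)

    return send, state
```

With retries set to 0 the first timeout propagates to the caller, which matches the default of leaving retries disabled; any positive value trades a longer worst-case send time for fewer lost payloads.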
Unresolved
randomized_tests...
Reviewer checklist