Skip flaky tests and open a critical issue to fix them once identified. #1824
-
Thank you for opening this discussion, @sarayourfriend! I like the process you outline, and setting the issue priority to critical seems appropriate. With the visual regression tests, there can be flakiness that is difficult to identify and/or fix. One such case is when the license icons on the single result page are sometimes rendered higher (or lower) than expected. I hope that replacing the separate SVG files with a sprite will help fix some of the flakiness caused by the items not loading correctly. I would also note that these issues might require attention from several members of the team to figure out a solution.
-
Now I understand the reason for marking them this way. Pinging @zackkrida and @dhruvkb as they should be aware of this conversation as well.
-
Sounds like there's consensus on the excellent workflow proposed by @sarayourfriend. In light of that, we need to update the documentation with the following:

@sarayourfriend, is this something you would be able to do?
-
Flaky Playwright tests are not just a nuisance in PRs, where they often fail repeatedly. They also cause issues with continuous deployment. For example, this workflow run on `main` failed to deploy the staging frontend because the Playwright tests failed: https://github.com/WordPress/openverse/actions/runs/4735389053/jobs/8405588480

This means the staging frontend sometimes, arbitrarily, does not get deployed after a PR is merged. As a result, people are unable to test their changes in staging, which makes production deployments riskier because the changes haven't been verified in a live environment.
Each of these failures also creates a new, avoidable alert for the MSR to look into, which makes the alerts even noisier and the MSR rotation more tedious and unpleasant.
I'd like to propose a new process for handling flaky tests of any kind: Playwright, Python, or otherwise¹. When a flaky test is identified:

1. Skip the flaky test, leaving a comment or annotation that points to the tracking issue.
2. Open a critical-priority issue to fix the skipped test.
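To make the first step concrete, here is a minimal sketch of how a flaky Playwright test might be skipped while it awaits a fix, assuming Playwright's `test.fixme()` annotation; the test title, selector, route, and issue URL are illustrative placeholders, not real Openverse references.

```ts
import { test, expect } from "@playwright/test";

// Flaky: the license icons occasionally render offset from their expected position.
// Skipped per the flaky-test process; tracked in a (hypothetical) critical issue:
// https://github.com/WordPress/openverse/issues/<tracking-issue-number>
test.fixme(
  "single result page renders license icons in the expected position",
  async ({ page }) => {
    await page.goto("/image/example-id");
    const icons = page.locator(".license-icons");
    await expect(icons).toBeVisible();
  }
);
```

The same idea applies on the Python side, e.g. `pytest.mark.skip(reason="...")` with the reason pointing at the tracking issue, so the skip and the critical issue always reference each other.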
Why should the issue be critical? Assuming every test we write tests something of value, it should always be effective at testing that thing. If we are removing a test only because it does not consistently test what we want it to test, that means we've identified a gap in our testing. Until that gap is filled, we cannot be confident that our tests sufficiently cover the application. To reiterate: this works under the assumption that every test we write is useful and tests something that must work in a particular way. If that is not the case for a particular flaky test, then we should remove it altogether and replace it with tests that do test something useful, if indeed there is anything useful to test.
Footnotes

1. I believe we may have discussed something along these lines in the past but did not actually follow through on implementing it. Almost every large project I've worked on in the past had some version of this policy.