Skip flaky tests and open a critical issue to fix them once identified. #1824
-
Thank you for opening this discussion, @sarayourfriend! I like the process you outline, and setting the issue priority to critical seems appropriate. With the visual regression tests, there can be flakiness that is difficult to identify and/or fix. One such case is when the license icons on the single result page are sometimes rendered higher (or lower) than expected. I hope that replacing the separate SVG files with a sprite will help fix some of the flakiness caused by the items not loading correctly. I would also note that these issues might require attention from several members of the team to figure out a solution.
-
Now I understand the reason for marking them this way. Pinging @zackkrida and @dhruvkb as they should be aware of this conversation as well.
-
Sounds like there's consensus on the excellent workflow proposed by @sarayourfriend. In light of that, we need to update the documentation with the following:

@sarayourfriend, is this something you would be able to do?
-
Flaky Playwright tests are not just a nuisance in PRs, where they often fail repeatedly. They also cause issues with continuous deployment. For example, this workflow run on `main` failed to deploy the staging frontend because the Playwright tests failed: https://github.com/WordPress/openverse/actions/runs/4735389053/jobs/8405588480

This means the staging frontend sometimes, arbitrarily, does not get deployed after a PR is merged. As a result, people are unable to test their changes in staging, which makes production deployments riskier because the changes haven't been verified in a live environment.
Each of these failures also creates a new, avoidable alert for the MSR to look into, which makes the alerts even noisier and the MSR rotation more tedious and unpleasant.
I'd like to propose a new process for handling flaky tests of any kind: Playwright, Python, or otherwise¹. When a flaky test is identified:

1. Skip the flaky test, leaving a comment or annotation that points to the tracking issue.
2. Open a critical-priority issue to fix the skipped test.
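To make the first step concrete, here is a minimal sketch of how a flaky Playwright test might be skipped while it awaits a fix, assuming Playwright's `test.fixme()` annotation; the test title, selector, route, and issue URL are illustrative placeholders, not real Openverse references.

```ts
import { test, expect } from "@playwright/test";

// Flaky: the license icons occasionally render offset from their expected position.
// Skipped per the flaky-test process; tracked in a (hypothetical) critical issue:
// https://github.com/WordPress/openverse/issues/<tracking-issue-number>
test.fixme(
  "single result page renders license icons in the expected position",
  async ({ page }) => {
    await page.goto("/image/example-id");
    const icons = page.locator(".license-icons");
    await expect(icons).toBeVisible();
  }
);
```

The same idea applies on the Python side, e.g. `pytest.mark.skip(reason="...")` with the reason pointing at the tracking issue, so the skip and the critical issue always reference each other.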
Why should the issue be critical? Assuming every test we write tests something of value, it should always be effective at testing that thing. If we are removing a test only because it does not consistently test what we want it to test, that means we've identified a gap in our testing. Until that gap is filled, we cannot be confident that our tests sufficiently cover the application. To reiterate: this works under the assumption that every test we write is useful and tests something that must work in a particular way. If that is not the case for a particular flaky test, then we should remove it altogether and replace it with tests that do test something useful, if indeed there is anything useful to test.
Footnotes

1. I believe we may have discussed something along these lines in the past but did not actually follow through on implementing it. Almost every large project I've worked on in the past had some version of this policy.