feat: better support for visual regression testing #8161

aslushnikov · 2021-08-12T10:16:51Z

Playwright Test has a built-in toMatchSnapshot() method to power Visual Regression Testing (VRT).

However, VRT is still challenging due to differences between host environments. There are several measures we can take right away to drastically improve the experience in @playwright/test:

support for a Docker test fixture to run browsers inside a Docker image
support for blurring when matching snapshots, to counteract antialiasing
a better UI for reviewing snapshot diffs

Interesting context:

migration from backstopjs to @playwright/test

florianbepunkt · 2021-08-12T10:36:03Z

I think https://github.com/americanexpress/jest-image-snapshot provides a nice suite of options for various VRT scenarios. Test scenarios vary widely, depending on the context (testing components, whole pages, text-heavy or not, etc).

Besides blurring, which helps a lot with antialiasing, it would be nice if multiple image comparison methods (e.g. SSM) were possible. Alternative image comparison algorithms could be left to userland, if they can be plugged into toMatchSnapshot via a common interface.

aslushnikov · 2021-08-12T14:24:36Z

Besides blurring, which helps a lot with antialiasing, it would be nice if multiple image comparison methods (e.g. SSM) were possible.

@florianbepunkt What's SSM? Is it structural similarity measurement (SSIM)?

florianbepunkt · 2021-08-12T14:28:16Z

@aslushnikov Yes, typo.
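
For readers unfamiliar with SSIM, here is a minimal sketch of a global SSIM score, assuming two equally sized 8-bit grayscale buffers. Real SSIM implementations compute the score over a sliding (usually Gaussian) window and average the local results; this single-window version and the name `globalSsim` are illustrative only, not part of Playwright.

```typescript
// Global SSIM: one window covering the whole image (a simplification).
const DYNAMIC_RANGE = 255; // 8-bit pixel values
const C1 = (0.01 * DYNAMIC_RANGE) ** 2; // stabilises the luminance term
const C2 = (0.03 * DYNAMIC_RANGE) ** 2; // stabilises the contrast/structure term

function mean(xs: ArrayLike<number>): number {
  let sum = 0;
  for (let i = 0; i < xs.length; i++) sum += xs[i];
  return sum / xs.length;
}

function globalSsim(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("images must be non-empty and the same size");
  }
  const muA = mean(a);
  const muB = mean(b);
  // Population variances and covariance of the pixel values.
  let varA = 0;
  let varB = 0;
  let cov = 0;
  for (let i = 0; i < a.length; i++) {
    const da = a[i] - muA;
    const db = b[i] - muB;
    varA += da * da;
    varB += db * db;
    cov += da * db;
  }
  varA /= a.length;
  varB /= a.length;
  cov /= a.length;
  return (
    ((2 * muA * muB + C1) * (2 * cov + C2)) /
    ((muA * muA + muB * muB + C1) * (varA + varB + C2))
  );
}
```

Identical images score 1; the further luminance, contrast or structure drift apart, the lower the score.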

kevinmpowell · 2021-08-12T14:51:23Z

Solid integration with Storybook would be beneficial for the work I do. Chromatic and Percy do this really well.

Also a UI for reviewing the diffs would be great.

aslushnikov · 2021-08-12T14:53:22Z

Also a UI for reviewing the diffs would be great.

@kevinmpowell Which one do you find most handy? Is it a "slider" diff?

kevinmpowell · 2021-08-12T14:57:21Z

I actually prefer the pixel highlighting (like Playwright already does), but with all the failing tests organized in a UI, so I can see what failed without having to poke around three different images.

Also being able to A/B toggle the baseline and the test image is nice in some cases.

kevinmpowell · 2021-08-12T15:09:01Z

Slider is rarely useful for me. An onion-skin (transparency overlay) would be more useful.
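
An onion-skin view is essentially an alpha blend of the two screenshots. A minimal sketch, assuming both images are already decoded into same-sized RGBA byte buffers; `onionSkin` is a hypothetical helper, not a Playwright API:

```typescript
// Blend `actual` over `expected` with the given opacity (0 = baseline only,
// 1 = new screenshot only). Both buffers are RGBA, 4 bytes per pixel.
function onionSkin(
  expected: Uint8ClampedArray,
  actual: Uint8ClampedArray,
  opacity: number,
): Uint8ClampedArray {
  if (expected.length !== actual.length) throw new Error("images must be the same size");
  if (opacity < 0 || opacity > 1) throw new RangeError("opacity must be in [0, 1]");
  const out = new Uint8ClampedArray(expected.length);
  for (let i = 0; i < expected.length; i += 4) {
    for (let c = 0; c < 3; c++) {
      // Linear interpolation of each colour channel.
      out[i + c] = Math.round(opacity * actual[i + c] + (1 - opacity) * expected[i + c]);
    }
    out[i + 3] = 255; // the overlay itself is fully opaque
  }
  return out;
}
```

A review UI would typically bind `opacity` to a control so the reviewer can fade between the baseline and the new screenshot.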

AlexNetman · 2021-08-16T15:28:45Z

@aslushnikov Why isn't toMatchSnapshot() in the documentation?
It can't be found in the API list,
and the article that existed for 1.13 (https://playwright.dev/python/docs/1.13.0/test-snapshots)
is no longer available for 1.14.

Thanks for thinking about visual regression testing. That's important!

florianbepunkt · 2021-08-16T19:11:19Z

On a related note: it would be great if tests could run cross-platform. Currently the OS platform name is baked into the snapshot filename, so our CI tests sometimes fail due to a name mismatch. #7575
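
For reference, later Playwright Test releases added a `snapshotPathTemplate` config option that controls where snapshot files are stored. A sketch, assuming a Playwright version recent enough to have it (it appeared around 1.28), that leaves the platform out of the path so one baseline is shared across operating systems:

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // No {platform} token here, so the same baseline file is used on every OS.
  // Only safe when rendering is genuinely identical across hosts,
  // e.g. because every run uses the same Docker image.
  snapshotPathTemplate: '{testDir}/__screenshots__/{testFilePath}/{arg}{ext}',
});
```

This only helps if rendering really is identical across hosts; otherwise a shared baseline just turns the filename mismatch into a pixel mismatch.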

lo1tuma · 2021-08-18T15:12:38Z

support for blurring when matching snapshots, to counteract antialiasing

It would be nice if we could choose whether such image filters are applied before the snapshot is saved or only during the comparison. I would prefer the first option, as it keeps the diff small when creating new snapshots, even for images that change randomly or are flaky.
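
The two options differ only in where the filter runs. A minimal sketch of the comparison-time variant, assuming 8-bit grayscale buffers, a 3x3 box blur as a stand-in for whatever filter Playwright might adopt, and a per-pixel `tolerance` chosen for illustration (all helper names are hypothetical):

```typescript
// 3x3 box blur over a row-major grayscale image.
// Edge pixels average only the neighbours that exist.
function boxBlur(src: ArrayLike<number>, width: number, height: number): Float64Array {
  const out = new Float64Array(width * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let sum = 0;
      let count = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          const nx = x + dx;
          const ny = y + dy;
          if (nx >= 0 && nx < width && ny >= 0 && ny < height) {
            sum += src[ny * width + nx];
            count++;
          }
        }
      }
      out[y * width + x] = sum / count;
    }
  }
  return out;
}

// Blur both images only for the comparison; the stored baseline is untouched.
function countDiffsAfterBlur(
  expected: ArrayLike<number>,
  actual: ArrayLike<number>,
  width: number,
  height: number,
  tolerance: number,
): number {
  const a = boxBlur(expected, width, height);
  const b = boxBlur(actual, width, height);
  let diffs = 0;
  for (let i = 0; i < a.length; i++) {
    if (Math.abs(a[i] - b[i]) > tolerance) diffs++;
  }
  return diffs;
}
```

The save-time variant lo1tuma prefers would instead run the blur on the screenshot before writing the baseline (and on every new screenshot before comparing), so the stored files themselves are smoothed.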

ts-23 · 2021-08-31T10:21:26Z

Please allow an auto-generated filename when toMatchSnapshot has no name input, similar to how toMatchSnapshot works in Jest.

Auto-generate the filename when no name is specified for toMatchSnapshot
Set the default toMatchSnapshot file extension in playwright.config.ts

E.g.

// foo.spec.ts
toMatchSnapshot() => foo.spec.ts.snap (default extension customizable in playwright.config.ts)

When you have a lot of screenshot assertions in one file, this avoids having to pass a filename to every one of them.
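
A minimal sketch of how such names could be derived, assuming a scheme of slugified test title plus a per-test counter; `createSnapshotNamer` is hypothetical. Later Playwright releases let `toHaveScreenshot()` be called without a name and generate one from the test title and a counter, roughly along these lines.

```typescript
// Returns a function that yields a fresh snapshot name on each call:
// "<slugified test title>-<counter><ext>".
function createSnapshotNamer(testTitle: string, ext = ".png"): () => string {
  const slug = testTitle
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into "-"
    .replace(/^-+|-+$/g, ""); // trim leading/trailing hyphens
  let counter = 0;
  return () => `${slug}-${++counter}${ext}`;
}
```

Because the counter lives per test, unnamed assertions within one test never collide, and adding a name stays optional.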

sergioariveros · 2021-09-06T07:02:12Z

Thanks for thinking about this. The blur feature is something that will help us; we had something similar with Puppeteer that helped us compare animated pages. In addition, it would be really useful to be able to ignore specific parts of the screen, especially areas with more dynamic data (videos/images).
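
Ignoring regions usually means painting them a fixed colour in both images before diffing. A minimal sketch, assuming RGBA buffers and pixel rectangles; `maskRegions` and `MaskRect` are hypothetical. Later Playwright versions expose a similar idea through the `mask` option of `toHaveScreenshot()`, which takes locators rather than rectangles.

```typescript
interface MaskRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Paint each rectangle a solid colour (magenta by default) in place,
// clipping rectangles that extend past the image edges.
function maskRegions(
  rgba: Uint8ClampedArray,
  imageWidth: number,
  rects: MaskRect[],
  color: [number, number, number] = [255, 0, 255],
): void {
  const imageHeight = rgba.length / 4 / imageWidth;
  for (const r of rects) {
    const x0 = Math.max(0, r.x);
    const y0 = Math.max(0, r.y);
    const x1 = Math.min(imageWidth, r.x + r.width);
    const y1 = Math.min(imageHeight, r.y + r.height);
    for (let y = y0; y < y1; y++) {
      for (let x = x0; x < x1; x++) {
        const i = (y * imageWidth + x) * 4;
        rgba[i] = color[0];
        rgba[i + 1] = color[1];
        rgba[i + 2] = color[2];
        rgba[i + 3] = 255;
      }
    }
  }
}
```

Applying the same mask to the baseline and the new screenshot makes those regions compare equal on every run.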

Doug-Bowen · 2021-11-08T21:06:01Z

Blur would help us greatly. Also, the slider view would be incredible as well.

z0n · 2021-11-17T10:07:36Z

We're also really interested in these improvements. We had to disable visual tests for now because they fail randomly when a few pixels are off, even after increasing the threshold. Hopefully blur will help here.

damaon · 2021-12-06T12:08:27Z

I suggest solving the biggest pain point first: how to store these snapshots in a git repo without it blowing up in size (i.e. storing only the latest snapshot). Git LFS kind of works, but it's painful. Maybe something else would work better? For reference: americanexpress/jest-image-snapshot#92

It would be great if these snapshot directories were automatically marked in git so that only the latest revision is stored.

z0n · 2021-12-06T14:54:03Z

We're using Git LFS; what's your issue with it? Once we had it set up for everyone (we're using Mac, Windows and Linux), it worked fine. We store all images (*.png) in the repo using Git LFS, so there's no extra work involved when adding snapshots to new tests either.

The only issue I have is comparing the image diffs in VS Code when committing new images as the old image is not shown in the diff view. The diff is working fine in the GitLab merge request view though so that's not a big issue.

aslushnikov · 2022-11-30T13:14:29Z

@gselsidi thank you for the sample!

gselsidi · 2022-11-30T19:27:27Z

I'll try to get some more as they come along, but I noticed the above occurs when taking snapshots of individual elements as opposed to the whole page. For the whole page, I'm able to use a .0001 pixel ratio.

gselsidi · 2022-12-12T22:11:19Z

Linking this here in case it applies:

#19417

nikicat · 2023-01-02T21:48:16Z

I had different screenshots with antialiased fonts between my ArchLinux laptop and Ubuntu 20.04 in Docker (it's used by default by GitHub Actions). The following Chromium flags helped me to get identical screenshots:

--font-render-hinting=none
--disable-skia-runtime-opts
--disable-font-subpixel-positioning
--disable-lcd-text

phungleson · 2023-02-06T00:36:21Z

We have similar issues with WebKit on Mac around emojis. Is there any further information we could provide to make debugging/fixing the issue easier?

It looks like mask can't be configured at the PlaywrightTestConfig level?

thekp · 2023-02-12T06:36:17Z

I had different screenshots with antialiased fonts between my ArchLinux laptop and Ubuntu 20.04 in Docker (it's used by default by GitHub Actions). The following Chromium flags helped me to get identical screenshots:
--font-render-hinting=none
--disable-skia-runtime-opts
--disable-font-subpixel-positioning
--disable-lcd-text

@nikicat Where exactly did you add these flags in your playwright project?

nikicat · 2023-02-12T11:34:31Z

@thekp I pass them to playwright.chromium.launch(args=[...]) here
(the chromium_flags() function is overridden inside a test).

draganpazin · 2023-04-05T13:01:14Z

We take screenshots of an add-in in the web version of MS Office 365 Excel. In some cases, the add-in is 1px larger than in the baseline. It seems we cannot control this: MS Office decides it, and it is not deterministic. The image diff is negligible and we could ignore it, but because the image sizes don't match, toMatchSnapshot fails.
Currently we do not have a good workaround for that problem.

I would vote for toMatchSnapshot being able to compare images of different sizes.
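
One way a comparator could tolerate a small size change is to pad both images to a common size before diffing, so the extra row or column simply counts as differing pixels that a maxDiffPixels-style budget can absorb. A sketch under those assumptions (RGBA buffers, top-left anchoring, hypothetical helper names), not how Playwright behaves today:

```typescript
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8ClampedArray; // RGBA, 4 bytes per pixel
}

// Copy an image onto a larger transparent canvas, anchored at the top-left.
function padTo(img: RgbaImage, width: number, height: number): RgbaImage {
  if (width < img.width || height < img.height) throw new Error("target must not be smaller");
  const data = new Uint8ClampedArray(width * height * 4); // all zero = transparent black
  for (let y = 0; y < img.height; y++) {
    const srcStart = y * img.width * 4;
    data.set(img.data.subarray(srcStart, srcStart + img.width * 4), y * width * 4);
  }
  return { width, height, data };
}

// Pad both images to the larger of their sizes and count differing pixels.
function diffAllowingSizeChange(a: RgbaImage, b: RgbaImage): number {
  const width = Math.max(a.width, b.width);
  const height = Math.max(a.height, b.height);
  const pa = padTo(a, width, height).data;
  const pb = padTo(b, width, height).data;
  let diff = 0;
  for (let i = 0; i < pa.length; i += 4) {
    if (pa[i] !== pb[i] || pa[i + 1] !== pb[i + 1] || pa[i + 2] !== pb[i + 2] || pa[i + 3] !== pb[i + 3]) {
      diff++;
    }
  }
  return diff;
}
```

With this approach, a 1px size change costs one row or column of pixels instead of failing the comparison outright.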

ayroblu · 2023-04-06T05:56:30Z

One of the things we noticed is that our focus is somewhat different between runs (imagine loading a page with a text input, sometimes the text input is focused, sometimes it isn't).

I'm wondering whether there's anything we can do to improve reliability apart from manually blurring and focusing.

matthias-ccri · 2023-04-25T18:43:55Z

My discrepancy was resolved by passing the --disable-remote-fonts flag to chromium.

    projects: [
        {
            name: 'chromium',
            use: {
                ...devices['Desktop Chrome'],
                launchOptions: {
                    args: [
                        // Configure text rendering so there's no difference between headless and headed (when debugging).
                        '--font-render-hinting=none',
                        '--disable-skia-runtime-opts',
                        '--disable-system-font-check',
                        '--disable-font-subpixel-positioning',
                        '--disable-lcd-text',
                        '--disable-remote-fonts',
                    ],
                },
            },
        },
    ],

GuilleDF · 2023-05-22T09:01:08Z

Hi, posting a flaky screenshot due to font rendering:

The baseline creation and the test run were both done on the mcr.microsoft.com/playwright:v1.28.0-focal docker image, on mobile safari (device preset is iPad (gen 7)).

Expected

Actual

Diff

mscottford · 2023-06-05T15:33:12Z

I wonder if some of the font rendering discrepancies might be because of local fonts being used instead of web fonts. For example, I have the Rambla font installed locally on my Mac, but my site also pulls that font in via CSS. In that case, consistently running the tests in an environment that does not have those fonts installed locally might address the problem. This might mean replacing the "expected" image with one from an environment that doesn't have the font installed.

deviantintegral · 2023-06-21T21:20:25Z

I've noticed an issue with webkit image rendering where it doesn't seem to be consistent. Look at the image of the flowers in this picture:

And this one:

Exactly 30 pixels differ, and interestingly, whenever it fails, it's always 30 pixels.

If you flip between the two images, one of them appears more aliased, or slightly blurred. The image is a lossy WebP image, so I suppose its rendering might not be consistent?

Does anyone know if this is expected, for example WebKit rendering the image in stages? We're already waiting on the image's complete property, so both JavaScript and Playwright consider the image loaded.

gselsidi · 2023-06-21T21:39:31Z

There are a few tickets covering the visual stuff; if I remember correctly, there is an issue with WebKit rendering that is outside of Playwright's control.

In the end, I just gave up on 100% pixel perfection and allowed a percentage of variance.

Even 95% accuracy is a feat on its own; in practice, you'll probably get to 99.9%, which is good enough.

But yes, it would be cool to get to 100%, so that if things degrade in the future, we degrade from 100% rather than from a starting point of 95%.

Also, make sure you use Docker if you aren't always running against the exact same physical machine.

deviantintegral · 2023-06-21T22:15:04Z

There are a few tickets covering the visual stuff; if I remember correctly, there is an issue with WebKit rendering that is outside of Playwright's control.
In the end, I just gave up on 100% pixel perfection and allowed a percentage of variance.

Nods. Yeah, that's what I figured. I'm currently working around it by setting maxDiffPixels when the browserName is webkit. Hopefully we can maintain 100% pixel coverage in Chrome and Firefox.
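
The comment doesn't show the code, but a per-project `expect` override is one way to express "maxDiffPixels only for WebKit". A sketch, assuming the standard three-browser project setup; 30 is taken from the diff count reported above, not a recommended value:

```typescript
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    {
      name: 'webkit',
      use: { ...devices['Desktop Safari'] },
      // Only the WebKit project gets a small pixel budget;
      // Chromium and Firefox keep strict comparisons.
      expect: { toHaveScreenshot: { maxDiffPixels: 30 } },
    },
  ],
});
```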

Also, make sure you use Docker if you aren't always running against the exact same physical machine.

Good reminder. We're doing that with https://github.com/deviantintegral/ddev-playwright and the above screenshots are from running tests in a loop until they fail, all in the same environment.

pastelsky · 2023-07-27T05:00:01Z

There's a separate (related) issue about adding Docker support, #20954, so that visual regression tests can run in a consistent environment and environment-related differences are eliminated.

It would be helpful if folks here upvoted it there, if that's something you need.

mfucci-medable · 2023-11-08T02:08:25Z

I am encountering the same issue with chromium (and webkit at an even higher frequency, too high so we disabled it).

Version: Playwright 1.38.1 (but the issue is reproducible as well in 1.39.0)
Env: running in ubuntu:jammy on an Apple M1 Pro (but the issue happens in our Linux CI pipeline as well, running in docker makes it pixel perfect between local and CI)
What happens: about 5% of the time, one letter is randomly positioned incorrectly, always the same letter. On other screenshots, it might be 2-3 letters, sometimes in the middle of a word.
More info:

No network call, the css is inlined before the HTML.

Using chromium arguments (no improvements before / after enabling those arguments):

        '--font-render-hinting=none',
        '--disable-skia-runtime-opts',
        '--disable-system-font-check',
        '--disable-font-subpixel-positioning',
        '--disable-lcd-text',
        '--disable-remote-fonts',

My guess: this issue never happens on other screenshots that we are taking using exactly the same configuration, so it has to do with something in the HTML / CSS (that I am probably not allowed to share here)...

Actual / expected / diff (triggering here on the pseudo-locale test but might happen as well on the en-US version):

viktor-avd · 2024-01-26T14:40:37Z

From maintainers

Hey folks! If you have examples of PNG screenshots that are taken on the same browser and same OS yet are different due to anti-aliasing issues, could you please attach the "expected", "actual" and "diff" images here?

This information will help with our experiments with fighting browser rendering non-determinism.

Hi, this appeared with the latest version; nothing like this happened before with the same code and configuration.

Playwright version: 1.41.1
Docker image: mcr.microsoft.com/playwright:v1.41.1-jammy
Chrome without args (same result with the following args):

'--disable-skia-runtime-opts',
'--disable-system-font-check',
'--disable-remote-fonts',
'--font-render-hinting=none',
'--disable-font-subpixel-positioning',

Example 1:

expected

actual

diff

Example 2:

expected

actual

diff

deviantintegral · 2024-01-27T22:26:00Z

An update from our experiences above: We found that increasing maxDiffPixels (or maxDiffPixelRatio) to a level that could avoid false failures also led to too many regressions slipping through visual comparisons. However, the threshold option as documented https://playwright.dev/docs/api/class-pageassertions#page-assertions-to-have-screenshot-2 worked for us. Once we increased that from the default 0.2 to 0.3, we've had no false failures or missed regressions.
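
The difference between the two knobs is easy to see in a toy comparator. A minimal sketch, assuming RGBA buffers and a plain normalized RGB distance (Playwright documents `threshold` as a perceived colour difference in YIQ space, so its exact numbers differ); `screenshotsMatch` is hypothetical:

```typescript
type Pixels = Uint8ClampedArray; // RGBA, 4 bytes per pixel

// Per-pixel colour distance in [0, 1]: Euclidean RGB distance divided by the
// largest possible distance (a simplified stand-in for a perceptual metric).
function pixelDistance(a: Pixels, b: Pixels, i: number): number {
  const dr = a[i] - b[i];
  const dg = a[i + 1] - b[i + 1];
  const db = a[i + 2] - b[i + 2];
  return Math.sqrt(dr * dr + dg * dg + db * db) / Math.sqrt(3 * 255 * 255);
}

// threshold: how different one pixel may be before it counts as "changed".
// maxDiffPixels: how many changed pixels the whole image may contain.
function screenshotsMatch(expected: Pixels, actual: Pixels, threshold: number, maxDiffPixels: number): boolean {
  if (expected.length !== actual.length) return false;
  let changed = 0;
  for (let i = 0; i < expected.length; i += 4) {
    if (pixelDistance(expected, actual, i) > threshold) changed++;
  }
  return changed <= maxDiffPixels;
}
```

Raising `maxDiffPixels` lets any kind of change through as long as few enough pixels are affected, including a small but real regression, whereas raising `threshold` only forgives faint colour shifts. In a config file, the change described above would be `expect: { toHaveScreenshot: { threshold: 0.3 } }`.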

ferrata · 2024-12-10T00:10:52Z

Hey folks! Any updates on this feature? 😅

I am facing the same behavior after switching to a recent Playwright version (1.49.0).

Diff

Actual

Expected

FWIW, it was working correctly on the Playwright version 1.41.2 with the PLAYWRIGHT_CHROMIUM_USE_HEADLESS_NEW variable on.

Thanks!

UPDATE:

Sorry, folks, I am an idiot 😀
It all works great with the new chromium channel as described in the 1.49 release note:

import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'], channel: 'chromium' },
    },
  ],
});

So it all works for me now!

aslushnikov mentioned this issue Aug 12, 2021

[Question]: React: Flaky screenshots (pixel shift) #7548 (Closed)

aslushnikov added P3-collecting-feedback v1.15 labels Aug 12, 2021

aslushnikov self-assigned this Aug 12, 2021

dgozman added v1.16 and removed v1.15 labels Sep 14, 2021

aslushnikov added v1.17 and removed v1.16 labels Oct 21, 2021

aslushnikov mentioned this issue Nov 2, 2021

[Feature] seamless docker integration #7439 (Closed)

mxschmitt added v1.18 and removed v1.17 labels Nov 4, 2021

mxschmitt mentioned this issue Nov 8, 2021

[Question] (Pixels different win/linux) Run Windows type Browser instance on Linux eg #10120 (Closed)

dgozman added feature-visual-regression-testing and removed P3-collecting-feedback labels Dec 6, 2021

mscottford mentioned this issue Mar 29, 2023

[Feature] For toHaveScreenshot report: align failure report file names with snapshot filename #22064 (Open)

aslushnikov mentioned this issue Apr 30, 2023

[BUG]Visual Regression Test screenshots differ from each pipeline run #22620 (Closed)

edumserrano mentioned this issue Jun 7, 2023

[BUG] Issues with VRT tests #23559 (Closed)

yury-s mentioned this issue Dec 20, 2023

[BUG] Error: Screenshot comparison failed #28741 (Closed)

This was referenced Mar 4, 2024

Integrate new flexbox classes into all pages GoogleChrome/webstatus.dev#68 (Merged)

Jcscottiii sidebar tests GoogleChrome/webstatus.dev#72 (Closed)

colinbowen mentioned this issue Apr 3, 2024

Ayr 883/visual regression viewports nationalarchives/da-ayr-beta-webapp#313 (Merged)

ljmotta mentioned this issue Apr 23, 2024

Containerize Playwright end-to-end tests to resolve screenshot comparison issues caused by OS differences apache/incubator-kie-issues#1114 (Closed)

scurker mentioned this issue Oct 10, 2024

chore: undo strict pixel matching dequelabs/cauldron#1714 (Merged)
