feat: better support for visual regression testing #8161

Open
1 of 3 tasks
aslushnikov opened this issue Aug 12, 2021 · 65 comments

Comments

@aslushnikov
Collaborator

aslushnikov commented Aug 12, 2021

Playwright Test has a built-in toMatchSnapshot() method to power Visual Regression Testing (VRT).

However, VRT is still challenging due to differences between host environments. There are a number of measures we can take right away to drastically improve the experience in @playwright/test:

  • support for docker test fixture to run browsers inside docker image.
  • support for blur in matching snapshot to counteract antialiasing
  • better UI for reviewing snapshot diffs
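The blur idea can be sketched as a simple box blur applied to both images before they are compared. This is a minimal sketch under stated assumptions, not a Playwright API: it assumes raw RGBA pixel buffers (4 bytes per pixel, laid out like the browser's `ImageData.data`), and the helper name `boxBlurRGBA` is hypothetical. Averaging each pixel with its neighbors spreads a one-pixel antialiasing difference over a 3x3 area and lowers its amplitude, so a per-pixel color tolerance is more likely to absorb it.

```typescript
// Hypothetical helper (not a Playwright API): box blur over an RGBA buffer laid
// out like ImageData.data, i.e. 4 bytes (R, G, B, A) per pixel, row by row.
function boxBlurRGBA(
  src: Uint8ClampedArray,
  width: number,
  height: number,
  radius = 1,
): Uint8ClampedArray {
  if (src.length !== width * height * 4) {
    throw new Error("buffer size does not match width * height * 4");
  }
  const out = new Uint8ClampedArray(src.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const sums = [0, 0, 0, 0];
      let count = 0;
      // Average every pixel in the (2r+1) x (2r+1) window, clipped at the image edges.
      for (let dy = -radius; dy <= radius; dy++) {
        for (let dx = -radius; dx <= radius; dx++) {
          const nx = x + dx;
          const ny = y + dy;
          if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
          const i = (ny * width + nx) * 4;
          for (let c = 0; c < 4; c++) sums[c] += src[i + c];
          count++;
        }
      }
      const o = (y * width + x) * 4;
      for (let c = 0; c < 4; c++) out[o + c] = Math.round(sums[c] / count);
    }
  }
  return out;
}

// Usage: a single bright pixel (a stand-in for an antialiasing artifact) gets spread out.
const w = 3, h = 3;
const spike = new Uint8ClampedArray(w * h * 4);
spike[(1 * w + 1) * 4] = 255; // red channel of the center pixel
const softened = boxBlurRGBA(spike, w, h);
console.log(softened[(1 * w + 1) * 4]); // 28 (255 / 9, rounded)
```

The trade-off is that blurring both baseline and actual image also softens genuine one-pixel regressions, so the radius should stay small.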

Interesting context:

@florianbepunkt

I think https://github.com/americanexpress/jest-image-snapshot provides a nice suite of options for various VRT scenarios. Test scenarios vary widely, depending on the context (testing components, whole pages, text-heavy or not, etc).

Besides blurring, which helps a lot with antialiasing, it would be nice if multiple image comparisons (e.g. SSM) were possible. Alternative image comparison algorithms could be left to userland if they can be plugged into toMatchSnapshot via a common interface.

@aslushnikov
Collaborator Author

Besides blurring, which helps a lot with antialiasing, it would be nice if multiple image comparisons (e.g. SSM) were possible.

@florianbepunkt What's SSM? Is it structural similarity measurement (SSIM)?

@florianbepunkt

@aslushnikov Yes, typo.
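For reference, SSIM compares means, variances and covariance of two images rather than individual pixel values, which makes it less sensitive to tiny uniform shifts. A minimal sketch, assuming two equally sized grayscale images given as flat number arrays; `globalSSIM` is a hypothetical name, and it computes a single global window, whereas the usual formulation averages SSIM over many small sliding windows.

```typescript
// Hypothetical helper: single-window ("global") SSIM between two equally sized
// grayscale images given as flat arrays of intensities in [0, L].
//   SSIM = ((2*muA*muB + C1) * (2*cov + C2)) / ((muA^2 + muB^2 + C1) * (varA + varB + C2))
// with the usual constants C1 = (0.01 * L)^2 and C2 = (0.03 * L)^2.
function globalSSIM(a: ArrayLike<number>, b: ArrayLike<number>, L = 255): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("images must be non-empty and the same size");
  }
  const n = a.length;
  let muA = 0, muB = 0;
  for (let i = 0; i < n; i++) {
    muA += a[i];
    muB += b[i];
  }
  muA /= n;
  muB /= n;
  // Population (1/n) variances and covariance around the means.
  let varA = 0, varB = 0, cov = 0;
  for (let i = 0; i < n; i++) {
    const da = a[i] - muA;
    const db = b[i] - muB;
    varA += da * da;
    varB += db * db;
    cov += da * db;
  }
  varA /= n;
  varB /= n;
  cov /= n;
  const c1 = (0.01 * L) ** 2;
  const c2 = (0.03 * L) ** 2;
  return ((2 * muA * muB + c1) * (2 * cov + c2)) /
    ((muA ** 2 + muB ** 2 + c1) * (varA + varB + c2));
}

const base = [10, 200, 30, 180];
console.log(globalSSIM(base, base));                   // identical images: 1
console.log(globalSSIM(base, base.map((v) => v + 1)));   // uniform brightness shift: just below 1
console.log(globalSSIM(base, base.map((v) => 255 - v))); // inverted structure: negative
```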

@aslushnikov aslushnikov self-assigned this Aug 12, 2021
@kevinmpowell

Solid integration with Storybook would be beneficial for the work I do. Chromatic and Percy do this really well.

Also a UI for reviewing the diffs would be great.

@aslushnikov
Collaborator Author

aslushnikov commented Aug 12, 2021

Also a UI for reviewing the diffs would be great.

@kevinmpowell What's the one that you find most handy? Is it a "slider" diff like here:

[image]

@kevinmpowell

I actually prefer the pixel highlighting (like Playwright already does), but organize all the failing tests in a UI so I can see what failed without having to poke around three different images.

Also being able to A/B toggle the baseline and the test image is nice in some cases.

@kevinmpowell

Slider is rarely useful for me. An onion-skin (transparency overlay) would be more useful.
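An onion-skin view is an alpha blend of the two screenshots, and setting the opacity to exactly 0 or 1 gives the A/B toggle mentioned above. A minimal sketch, assuming two equally sized RGBA buffers; `onionSkin` is a hypothetical helper, not part of Playwright or its HTML report.

```typescript
// Hypothetical helper: onion-skin overlay that alpha-blends the actual screenshot
// over the baseline. Both inputs are equally sized RGBA buffers (4 bytes per pixel).
// opacity = 0 shows only the baseline, 1 only the actual image (an A/B toggle).
function onionSkin(
  baseline: Uint8ClampedArray,
  actual: Uint8ClampedArray,
  opacity = 0.5,
): Uint8ClampedArray {
  if (baseline.length !== actual.length) throw new Error("images must have the same size");
  if (opacity < 0 || opacity > 1) throw new Error("opacity must be between 0 and 1");
  const out = new Uint8ClampedArray(baseline.length);
  for (let i = 0; i < baseline.length; i += 4) {
    // Linear blend of the R, G and B channels.
    for (let c = 0; c < 3; c++) {
      out[i + c] = Math.round(actual[i + c] * opacity + baseline[i + c] * (1 - opacity));
    }
    out[i + 3] = 255; // the composite is always drawn fully opaque
  }
  return out;
}
```

A review UI could bind `opacity` to a slider, which covers onion-skin and A/B toggling with one control.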

@AlexNetman

@aslushnikov Why isn't toMatchSnapshot() available in the documentation?
It cannot be found in the API list.
And the article that was in 1.13 https://playwright.dev/python/docs/1.13.0/test-snapshots
is not available for 1.14 anymore.

Thanks for thinking about visual regression testing. That's important!

@florianbepunkt

On a related note: it would be great if tests could be run cross-platform. Currently the OS platform name is baked into the snapshot filename, so our CI tests sometimes fail due to a name mismatch. #7575
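Newer Playwright releases added a `snapshotPathTemplate` config option that controls where snapshot files are stored; leaving the `{platform}` token out of the template makes all operating systems share one baseline. A config sketch, assuming a recent `@playwright/test`; the `__screenshots__` directory name is an arbitrary choice. Sharing baselines across platforms only works if rendering is identical, for example when every run uses the same Docker image.

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // No {platform} token in the template, so every OS reads and writes the same
  // baseline file for a given test file and snapshot name.
  snapshotPathTemplate: '{testDir}/__screenshots__/{testFilePath}/{arg}{ext}',
});
```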

@lo1tuma

lo1tuma commented Aug 18, 2021

support for blur in matching snapshot to counteract antialiasing

It would be nice if we could choose whether such image filters are applied before the snapshot is saved or only when doing the comparison. I would prefer the first option, as it keeps the diff small when creating new snapshots, even for images that change randomly / are flaky.

@ts-23

ts-23 commented Aug 31, 2021

Please allow an auto-generated filename when toMatchSnapshot is called without a name, similar to how toMatchSnapshot works in Jest.

  • Auto-generate the filename when no name is specified for toMatchSnapshot
  • Set the default toMatchSnapshot file extension in playwright.config.ts

E.g.

// foo.spec.ts
toMatchSnapshot() => foo.spec.ts.snap (default extension customizable in playwright.config.ts)

When you have a lot of screenshot assertions in one file, this would avoid writing a lot of filename arguments:
[image]
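Recent Playwright versions let `toHaveScreenshot()` be called without a name and generate one automatically. The sketch below only illustrates the general idea of a title-plus-counter scheme; `makeSnapshotNamer` is hypothetical and is not how Playwright actually builds its file names.

```typescript
// Hypothetical helper (not Playwright's naming scheme): derive snapshot file names
// from the test title plus a per-test counter, so assertions need no name argument.
function makeSnapshotNamer(testTitle: string, ext = ".png"): () => string {
  // Lower-case the title, collapse every run of non-alphanumerics into one dash,
  // and trim dashes from both ends.
  const slug = testTitle
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  let counter = 0;
  return () => `${slug}-${++counter}${ext}`;
}

const nextName = makeSnapshotNamer("Checkout page: empty cart");
console.log(nextName()); // checkout-page-empty-cart-1.png
console.log(nextName()); // checkout-page-empty-cart-2.png
```

A counter keeps names stable only as long as the order of assertions within a test does not change, which is the main weakness of any automatic scheme.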

@sergioariveros

Thanks for thinking about this. The blur feature is something that will help us; we had something similar before with Puppeteer that helped us compare animated pages. In addition, it would be really useful to be able to ignore specific parts of the screen, especially those with more dynamic data (videos/images).
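Playwright's screenshot assertions accept a `mask` option (a list of locators whose elements are covered with a solid-color box), which addresses this for dynamic content such as videos. The underlying idea can be sketched at the pixel level as a diff that skips ignored rectangles; `countDiffPixelsIgnoring` and the `Rect` shape are hypothetical, and the sketch assumes equally sized RGBA buffers.

```typescript
// A rectangle in pixel coordinates; (x, y) is the top-left corner.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: count pixels that differ between two equally sized RGBA
// buffers, skipping pixels inside any of the ignored rectangles.
function countDiffPixelsIgnoring(
  a: Uint8ClampedArray,
  b: Uint8ClampedArray,
  width: number,
  height: number,
  ignore: Rect[] = [],
): number {
  if (a.length !== b.length || a.length !== width * height * 4) {
    throw new Error("buffers must both be width * height * 4 bytes");
  }
  const isIgnored = (x: number, y: number): boolean =>
    ignore.some((r) => x >= r.x && x < r.x + r.width && y >= r.y && y < r.y + r.height);
  let diff = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (isIgnored(x, y)) continue;
      const i = (y * width + x) * 4;
      // Exact comparison of all four channels; a real comparator would use a tolerance.
      if (a[i] !== b[i] || a[i + 1] !== b[i + 1] || a[i + 2] !== b[i + 2] || a[i + 3] !== b[i + 3]) {
        diff++;
      }
    }
  }
  return diff;
}

// Usage: two changes, one of them inside a "video" region that we choose to ignore.
const before = new Uint8ClampedArray(4 * 4 * 4);
const after = before.slice();
after[0] = 255;               // pixel (0, 0) changed
after[(3 * 4 + 3) * 4] = 255; // pixel (3, 3) changed, inside the ignored region
const videoRegion: Rect = { x: 2, y: 2, width: 2, height: 2 };
console.log(countDiffPixelsIgnoring(before, after, 4, 4));                // 2
console.log(countDiffPixelsIgnoring(before, after, 4, 4, [videoRegion])); // 1
```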

@Doug-Bowen

Blur would help us greatly. Also, the slider view would be incredible as well.

@z0n

z0n commented Nov 17, 2021

We're also really interested in these improvements. We had to disable visual tests for now because they fail randomly when a few pixels are off, even after increasing the threshold. Hopefully blur will help here.

@damaon

damaon commented Dec 6, 2021

I suggest solving the biggest pain point, which is how to store this stuff in a git repo so it doesn't blow up in size (storing only the last snapshot). Git LFS kinda works, but it's painful. Maybe something else would work better? For reference: americanexpress/jest-image-snapshot#92

It would be great if these snapshot dirs were automatically marked in git to store only the last revision.

@z0n

z0n commented Dec 6, 2021

We're using Git LFS, what's your issue with it? Once we had it set up for everyone (we're using Mac, Windows and Linux), it worked fine. We're storing all images in the repo using Git LFS (*.png) so there's no work involved when adding snapshots to new tests either.

The only issue I have is comparing the image diffs in VS Code when committing new images as the old image is not shown in the diff view. The diff is working fine in the GitLab merge request view though so that's not a big issue.

@aslushnikov
Collaborator Author

@gselsidi thank you for the sample!

@gselsidi

I'll try to get some more as they come along, but I noticed the above occurs when taking snapshots of individual elements as opposed to the whole page. For the whole page I'm able to use a 0.0001 pixel ratio.
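One plausible explanation: `maxDiffPixelRatio` is relative to the total number of pixels in the screenshot, so the same ratio allows far fewer differing pixels on a small element than on a full page. Illustrative arithmetic, assuming a 1280x720 page and a 200x50 element (both sizes are made up for the example) and assuming the budget is rounded down:

```typescript
// Approximate pixel budget implied by a maxDiffPixelRatio-style setting:
// the ratio is applied to the screenshot's total pixel count (rounding down is an assumption).
function allowedDiffPixels(width: number, height: number, ratio: number): number {
  return Math.floor(width * height * ratio);
}

console.log(allowedDiffPixels(1280, 720, 0.0001)); // 92 -> full page
console.log(allowedDiffPixels(200, 50, 0.0001));   // 1  -> small element
```

With a single pixel of budget, one antialiased glyph edge is enough to fail an element screenshot, while the full page absorbs the same noise; an absolute `maxDiffPixels` behaves the same regardless of element size.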

@gselsidi

Linking this here in case it applies:

#19417

@nikicat

nikicat commented Jan 2, 2023

I had different screenshots with antialiased fonts between my ArchLinux laptop and Ubuntu 20.04 in Docker (it's used by default by GitHub Actions). The following Chromium flags helped me to get identical screenshots:

--font-render-hinting=none
--disable-skia-runtime-opts
--disable-font-subpixel-positioning
--disable-lcd-text

@phungleson

We have similar issues with WebKit on Mac around emojis. Is there any further information we can provide to make debugging/fixing the issue easier?

It looks like mask is not available to configure at PlaywrightTestConfig level?

@thekp

thekp commented Feb 12, 2023

I had different screenshots with antialiased fonts between my ArchLinux laptop and Ubuntu 20.04 in Docker (it's used by default by GitHub Actions). The following Chromium flags helped me to get identical screenshots:
--font-render-hinting=none
--disable-skia-runtime-opts
--disable-font-subpixel-positioning
--disable-lcd-text

@nikicat Where exactly did you add these flags in your playwright project?

@nikicat

nikicat commented Feb 12, 2023

@thekp I pass them to playwright.chromium.launch(args=[...]) here
(the chromium_flags() function is overridden inside a test).

@draganpazin

We take screenshots of an add-in in the web version of MS Office 365 Excel. In some cases, the add-in is 1px bigger than in the original. It seems we cannot control it: MS Office decides this, and it is not deterministic. The image diff is negligible and we could ignore it, but since the image sizes do not match, toMatchSnapshot fails.
Currently we do not have a good workaround for that problem.

I would vote for toMatchSnapshot being able to compare images of different sizes.
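Until the assertion itself supports this, one workaround is to pad both screenshots to a common size before diffing them with your own comparator, so the extra row or column just shows up as a strip of differing pixels that a small pixel budget can absorb. A minimal sketch, assuming raw RGBA buffers; `padToSize` is hypothetical and the magenta fill is an arbitrary choice that makes the padding easy to spot in a diff.

```typescript
type RGBA = [number, number, number, number];

// Hypothetical helper: copy a w x h RGBA image into the top-left corner of a
// larger targetW x targetH canvas filled with a marker color, so two screenshots
// that differ slightly in size can be compared pixel by pixel.
function padToSize(
  src: Uint8ClampedArray,
  w: number,
  h: number,
  targetW: number,
  targetH: number,
  fill: RGBA = [255, 0, 255, 255],
): Uint8ClampedArray {
  if (src.length !== w * h * 4) throw new Error("buffer size does not match w * h * 4");
  if (targetW < w || targetH < h) throw new Error("target must be at least as large as the source");
  const out = new Uint8ClampedArray(targetW * targetH * 4);
  // Fill the whole canvas with the marker color first.
  for (let i = 0; i < out.length; i += 4) out.set(fill, i);
  // Then copy the source row by row; rows are w pixels wide in src, targetW in out.
  for (let y = 0; y < h; y++) {
    out.set(src.subarray(y * w * 4, (y + 1) * w * 4), y * targetW * 4);
  }
  return out;
}

// Usage: a 1x1 red image padded to 2x2; the three new pixels get the fill color.
const red = new Uint8ClampedArray([255, 0, 0, 255]);
const padded = padToSize(red, 1, 1, 2, 2);
console.log(Array.from(padded.subarray(0, 8)).join(",")); // 255,0,0,255,255,0,255,255
```

In practice both images would be padded to the larger width and height of the pair before being handed to the pixel comparison.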

@ayroblu
Contributor

ayroblu commented Apr 6, 2023

One thing we noticed is that focus differs between runs (imagine loading a page with a text input: sometimes the text input is focused, sometimes it isn't).

Is there anything we can do to improve reliability apart from manually blurring and focusing?

@matthias-ccri

matthias-ccri commented Apr 25, 2023

My discrepancy was resolved by passing the --disable-remote-fonts flag to chromium.

import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      name: 'chromium',
      use: {
        ...devices['Desktop Chrome'],
        launchOptions: {
          args: [
            // Configure text rendering so there's no difference between headless and headed (when debugging).
            '--font-render-hinting=none',
            '--disable-skia-runtime-opts',
            '--disable-system-font-check',
            '--disable-font-subpixel-positioning',
            '--disable-lcd-text',
            '--disable-remote-fonts',
          ],
        },
      },
    },
  ],
});

@GuilleDF

GuilleDF commented May 22, 2023

Hi, posting a flaky screenshot due to font rendering:

The baseline creation and the test run were both done on the mcr.microsoft.com/playwright:v1.28.0-focal docker image, on mobile safari (device preset is iPad (gen 7)).

Expected
Actual
Diff

@mscottford

mscottford commented Jun 5, 2023 via email

@deviantintegral

I've noticed an issue with webkit image rendering where it doesn't seem to be consistent. Look at the image of the flowers in this picture:
a-basic-page-with-embed-images-ID-2008-1-expected

And this one:

a-basic-page-with-embed-images-ID-2008-1-actual

Exactly 30 pixels differ, and interestingly, when it fails, it's always 30 pixels.

a-basic-page-with-embed-images-ID-2008-1-diff

If you flip between the two images, one of them appears more aliased or slightly blurred. The image is a lossy WebP image, so I suppose its rendering might not be consistent?

Does anyone know if this is expected, e.g. WebKit rendering the image in stages? We're already waiting on the complete property, so both JavaScript and Playwright consider the image loaded.

@gselsidi


There are a few tickets covering all the visual stuff; if I remember correctly, there is an issue with WebKit rendering that is outside of Playwright's control.

In the end I just gave up on 100% pixel perfection and allowed a % of variance.

Even 95% accuracy is a feat on its own; in reality you'll probably get to 99.9%, which is good enough.

But yeah, it would be cool to get to 100%, so that if things degrade in the future, we degrade from 100% as opposed to from a starting point of 95%.

Also, make sure you use Docker if you're not always running on the exact same physical machine.

@deviantintegral

There are a few tickets covering all the visual stuff; if I remember correctly, there is an issue with WebKit rendering that is outside of Playwright's control.
In the end I just gave up on 100% pixel perfection and allowed a % of variance.

Nods. Yeah, that's what I figured. I'm currently working around it by setting maxDiffPixels when browserName is webkit. Hopefully we can keep 100% pixel matching in Chrome and Firefox.

also make sure you use docker if not always running against the actual same physical machine

Good reminder. We're doing that with https://github.com/deviantintegral/ddev-playwright and the above screenshots are from running tests in a loop until they fail, all in the same environment.
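Instead of checking `browserName` in each test, a WebKit-only tolerance can also live in the config, since Playwright projects accept their own `expect` options. A config sketch, assuming a recent `@playwright/test`; the value 30 simply mirrors the diff size reported above and is not a recommendation.

```typescript
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
    {
      name: 'webkit',
      use: { ...devices['Desktop Safari'] },
      // Project-level expect options apply only to tests run by this project,
      // so only WebKit screenshots get the extra pixel budget.
      expect: { toHaveScreenshot: { maxDiffPixels: 30 } },
    },
  ],
});
```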

@pastelsky
Contributor

pastelsky commented Jul 27, 2023

There's a separate (related) issue regarding adding support for docker at #20954 so that visual regression tests can run in a consistent environment and environment-related differences are negated.

It would be helpful to receive upvotes there from folks here if that's something you need.

@mfucci-medable

I am encountering the same issue with Chromium (and with WebKit at an even higher frequency, so high that we disabled it).

Version: Playwright 1.38.1 (but the issue is reproducible as well in 1.39.0)
Env: running in ubuntu:jammy on an Apple M1 Pro (but the issue happens in our Linux CI pipeline as well; running in Docker makes it pixel perfect between local and CI)
What happens: about 5% of the time, one letter is randomly positioned incorrectly, always the same letter. On other screenshots, it might be 2-3 letters, sometimes in the middle of a word.
More info:

  • No network call, the css is inlined before the HTML.

  • Using chromium arguments (no improvements before / after enabling those arguments):

            '--font-render-hinting=none',
            '--disable-skia-runtime-opts',
            '--disable-system-font-check',
            '--disable-font-subpixel-positioning',
            '--disable-lcd-text',
            '--disable-remote-fonts',
    

My guess: this issue never happens on other screenshots that we are taking using exactly the same configuration, so it has to do with something in the HTML / CSS (that I am probably not allowed to share here)...

Actual / expected / diff (triggering here on the pseudo-locale test but might happen as well on the en-US version):

actual
expected
diff

@viktor-avd

viktor-avd commented Jan 26, 2024

From maintainers

Hey folks! if you have examples of PNG screenshots that are taken on the same browser and same OS yet are different due to anti-aliasing issues, could you please attach the "expected", "actual" and "diff" images here?

This information will help with our experiments with fighting browser rendering non-determinism.

Hi, this appeared with the latest version; nothing like this happened before with the same code and configuration.

Playwright version: 1.41.1
Docker image: mcr.microsoft.com/playwright:v1.41.1-jammy
Same result in Chrome without extra args and with the following args:

'--disable-skia-runtime-opts',
'--disable-system-font-check',
'--disable-remote-fonts',
'--font-render-hinting=none',
'--disable-font-subpixel-positioning',

Example 1:

expected

actual

diff

Example 2:

expected
actual-rule

actual
expected-rule

diff
diff-rule

@deviantintegral

An update on our experiences above: we found that increasing maxDiffPixels (or maxDiffPixelRatio) to a level that avoided false failures also let too many regressions slip through visual comparisons. However, the threshold option, as documented at https://playwright.dev/docs/api/class-pageassertions#page-assertions-to-have-screenshot-2, worked for us. Once we increased it from the default 0.2 to 0.3, we've had no false failures or missed regressions.
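The two knobs act at different levels: `threshold` decides whether a single pixel counts as different (Playwright's docs describe it as an acceptable perceived color difference in the YIQ color space, from 0 strict to 1 lax, default 0.2), while `maxDiffPixels` / `maxDiffPixelRatio` cap how many such pixels may differ. The sketch below uses the YIQ distance formula from the pixelmatch library, assumes opaque pixels, and omits pixelmatch's antialiasing detection; it is an assumption that Playwright's default comparator behaves exactly like this.

```typescript
type RGB = [number, number, number];

// Perceptual color distance in YIQ space with pixelmatch-style weights;
// assumes opaque pixels with 0-255 channels.
function yiqDelta([r1, g1, b1]: RGB, [r2, g2, b2]: RGB): number {
  const dr = r1 - r2, dg = g1 - g2, db = b1 - b2;
  const y = dr * 0.29889531 + dg * 0.58662247 + db * 0.11448223;
  const i = dr * 0.59597799 - dg * 0.2741761 - db * 0.32180189;
  const q = dr * 0.21147017 - dg * 0.52261711 + db * 0.31114694;
  return 0.5053 * y * y + 0.299 * i * i + 0.1957 * q * q;
}

// A pixel counts as "different" only if its delta exceeds 35215 * threshold^2,
// where 35215 is the largest possible delta between two colors.
function pixelsDiffer(a: RGB, b: RGB, threshold: number): boolean {
  return yiqDelta(a, b) > 35215 * threshold * threshold;
}

// maxDiffPixels would then be compared against this count.
function countDiffPixels(a: RGB[], b: RGB[], threshold = 0.2): number {
  if (a.length !== b.length) throw new Error("images must have the same size");
  let n = 0;
  for (let k = 0; k < a.length; k++) {
    if (pixelsDiffer(a[k], b[k], threshold)) n++;
  }
  return n;
}
```

With this formula a uniform gray shift of 60 levels counts as different at 0.2 but not at 0.3, while a black/white flip is caught at either setting. In the config, the equivalent setting would be `expect: { toHaveScreenshot: { threshold: 0.3 } }`.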

@ferrata

ferrata commented Dec 10, 2024

Hey folks! Any updates on this feature? 😅

I am facing the same behavior after switching to a recent Playwright version (1.49.0).

Diff
Actual
Expected

FWIW, it was working correctly on the Playwright version 1.41.2 with the PLAYWRIGHT_CHROMIUM_USE_HEADLESS_NEW variable on.

Thanks!

UPDATE:

Sorry, folks, I am an idiot 😀
It all works great with the new chromium channel as described in the 1.49 release note:

import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'], channel: 'chromium' },
    },
  ],
});

So it all works for me now!

kary-ajrj pushed a commit to kary-ajrj/playwright-demo that referenced this issue Jan 15, 2025