This repository was archived by the owner on May 17, 2024. It is now read-only.
Is your feature request related to a problem? Please describe.
We run data-diff on many tables. Sometimes the two tables being compared differ in a large number of rows, and the diff for that table pair can then take a very long time (multiple hours). I would like to abort such a diff at some point, e.g., when a maximum diff time or a maximum number of differing records is exceeded. For such a diff, I do not care which records differ exactly; it is enough to know that the table is far out of sync.
Describe the solution you'd like
Define one of the following thresholds:
a maximum diff time,
OR a maximum number of differing records,
OR a maximum percentage of differing records.
If the threshold is exceeded, the diff is aborted with a WARNING or ERROR message and possibly an exception.
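A minimal sketch of the requested behavior, written as a wrapper around the stream of diff records rather than inside data-diff. It assumes each record is a `(sign, row)` tuple such as `('-', (1, 'a'))`; `limit_diff`, `DiffThresholdExceeded` and their parameters are hypothetical names, not part of data-diff's API. The time limit is only checked when a record arrives, so it cannot fire while the diff is producing nothing.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple

DiffRecord = Tuple[str, tuple]  # assumed shape: ("+" or "-", row values)


class DiffThresholdExceeded(Exception):
    """Hypothetical exception raised when a diff exceeds a configured limit."""


def limit_diff(
    diff: Iterable[DiffRecord],
    max_seconds: Optional[float] = None,
    max_records: Optional[int] = None,
    max_percent: Optional[float] = None,
    total_rows: Optional[int] = None,
) -> Iterator[DiffRecord]:
    """Pass diff records through, raising once any configured threshold is exceeded.

    max_percent is measured against total_rows, which the caller must supply
    (e.g. the row count of the larger table).
    """
    if max_percent is not None and not total_rows:
        raise ValueError("max_percent requires a positive total_rows")
    start = time.monotonic()
    count = 0
    for record in diff:
        count += 1
        elapsed = time.monotonic() - start
        if max_seconds is not None and elapsed > max_seconds:
            raise DiffThresholdExceeded(f"diff ran longer than {max_seconds} s")
        if max_records is not None and count > max_records:
            raise DiffThresholdExceeded(f"more than {max_records} differing records")
        if max_percent is not None and 100.0 * count / total_rows > max_percent:
            raise DiffThresholdExceeded(f"more than {max_percent}% of rows differ")
        yield record


if __name__ == "__main__":
    fake_diff = [("-", (1, "a")), ("+", (1, "b")), ("-", (2, "c"))]
    try:
        for sign, row in limit_diff(fake_diff, max_records=2):
            print(sign, row)
    except DiffThresholdExceeded as exc:
        print("WARNING: diff aborted:", exc)
```

Raising inside such a wrapper only stops the consumer. It does not by itself stop worker threads that are still producing records, which is the problem described under alternatives.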
Describe alternatives you've considered
I run data-diff programmatically and implemented this feature myself in the Python script that calls data-diff. This did not work as I had hoped: data-diff uses a ThreadPool that continued diffing after I broke out of the diff_tables iterator.
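Breaking out of a `for` loop only stops the consumer; tasks already handed to a thread pool keep running unless the producer cancels them. The following is a generic sketch of cooperative cancellation, not data-diff's implementation: `diff_segments`, `compare_segment` and `processed` are hypothetical names. The generator's `finally` block runs when the consumer closes it (here via `contextlib.closing`). It then sets a stop event that workers check before doing work and cancels tasks that have not started yet. `cancel_futures` requires Python 3.9 or later.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from contextlib import closing
from typing import Iterator, List

processed: List[int] = []  # segments that actually did work (for demonstration)


def compare_segment(segment: int, stop: threading.Event) -> List[str]:
    """Hypothetical worker that diffs one key range unless asked to stop."""
    if stop.is_set():
        return []  # consumer has gone away; skip the expensive work
    time.sleep(0.01)  # stand-in for the database queries of a real diff
    processed.append(segment)
    return [f"difference in segment {segment}"]


def diff_segments(n_segments: int, workers: int = 4) -> Iterator[str]:
    """Yield records from a thread pool and cancel outstanding work on early exit."""
    stop = threading.Event()
    pool = ThreadPoolExecutor(max_workers=workers)
    futures = [pool.submit(compare_segment, i, stop) for i in range(n_segments)]
    try:
        for future in as_completed(futures):
            yield from future.result()
    finally:
        # Runs after normal completion and when the consumer closes the generator early.
        stop.set()
        pool.shutdown(wait=True, cancel_futures=True)


if __name__ == "__main__":
    with closing(diff_segments(100)) as diffs:
        for i, record in enumerate(diffs):
            if i == 5:
                break  # leaving the with block calls diffs.close(), running the finally block
    print(f"{len(processed)} of 100 segments were processed")
```

Without the explicit close (or the stop event), the pool would keep working through all remaining segments after the consumer had stopped reading.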
Additional context