This PR introduces data quality measures, based on validator results.
Measures are defined by “exclusions”, which are references to specific types of validator errors. When a measure is calculated, an item is counted towards the measure's total count unless it is “excluded” by one of the validator errors referenced by the measure's “exclusions”.
For example:
In the above measure, an item will not be counted towards the total and percentage if the validator error “MISSING_REQUIRED_FIELD” is present for the target field “name”.
The advantage of this approach is that the complex inheritance rules respected by the validator are implicitly considered, and that more complex validation rules, such as activity list matching, are easily included without any duplicated logic. Tests for complex rules are also easy to write, as the validator already provides a framework for this.
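The counting rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the type names (`ValidatorError`, `Exclusion`, `Measure`), the property names (`errorTypes`, `targetFields`, `field`) and the function `calculateMeasure` are all assumptions made for the example, and each item is represented simply by the list of validator errors raised for it.

```typescript
// Hypothetical shapes for illustration only; the PR's real structures may differ.
interface ValidatorError {
  type: string;  // e.g. "MISSING_REQUIRED_FIELD"
  field: string; // the field the error was raised against, e.g. "name"
}

interface Exclusion {
  errorTypes: string[];   // validator error types that exclude an item
  targetFields: string[]; // fields the error must relate to
}

interface Measure {
  name: string;
  exclusions: Exclusion[];
}

interface MeasureResult {
  counted: number;    // items not excluded by any exclusion
  itemCount: number;  // all items considered
  percentage: number; // counted / itemCount, as a percentage
}

// An item is excluded if any of its validator errors matches any exclusion.
function isExcluded(errors: ValidatorError[], measure: Measure): boolean {
  return measure.exclusions.some((exclusion) =>
    errors.some(
      (error) =>
        exclusion.errorTypes.indexOf(error.type) !== -1 &&
        exclusion.targetFields.indexOf(error.field) !== -1
    )
  );
}

// Count every item towards the measure unless it is excluded.
function calculateMeasure(items: ValidatorError[][], measure: Measure): MeasureResult {
  const counted = items.filter((errors) => !isExcluded(errors, measure)).length;
  const itemCount = items.length;
  const percentage = itemCount === 0 ? 0 : (counted / itemCount) * 100;
  return { counted, itemCount, percentage };
}

// Usage: a measure that excludes items missing the "name" field.
const hasName: Measure = {
  name: "Has a name",
  exclusions: [{ errorTypes: ["MISSING_REQUIRED_FIELD"], targetFields: ["name"] }],
};

const items: ValidatorError[][] = [
  [],                                                  // no errors: counted
  [{ type: "MISSING_REQUIRED_FIELD", field: "name" }], // excluded
  [{ type: "MISSING_REQUIRED_FIELD", field: "url" }],  // different field: counted
];

const result = calculateMeasure(items, hasName);
console.log(result.counted, result.itemCount);
```

Because the measure only inspects errors the validator has already produced, it needs no knowledge of inheritance or other rule logic itself.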
This increases maintainability, flexibility, and consistency of results across tools. The approach is also extensible, and encourages the creation of new data quality rules in the validator as data quality measures become more in-depth: this has the advantage of surfacing errors at a more detailed level within the various OA tools, as well as providing a high-level summary.
Measures are defined within “profiles”, which allow subsets of measures to be defined separately for different use cases (e.g. accessibility).
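As a rough picture of how profiles might group measures, the sketch below keeps a list of named profiles and looks one up by name. The layout, the profile names, the measure names and the helper `measureNamesForProfile` are all hypothetical and chosen only for illustration; the fields used (`name`, `description`, `accessibilitySupport`) are example targets rather than the measures defined in this PR.

```typescript
// Hypothetical profile layout for illustration; not the PR's actual structure.
interface ProfileMeasure {
  name: string;
  exclusions: { errorTypes: string[]; targetFields: string[] }[];
}

interface Profile {
  name: string;
  measures: ProfileMeasure[];
}

const profiles: Profile[] = [
  {
    name: "common",
    measures: [
      {
        name: "Has a name",
        exclusions: [{ errorTypes: ["MISSING_REQUIRED_FIELD"], targetFields: ["name"] }],
      },
      {
        name: "Has a description",
        exclusions: [{ errorTypes: ["MISSING_REQUIRED_FIELD"], targetFields: ["description"] }],
      },
    ],
  },
  {
    name: "accessibility",
    measures: [
      {
        name: "Has accessibility information",
        exclusions: [{ errorTypes: ["MISSING_REQUIRED_FIELD"], targetFields: ["accessibilitySupport"] }],
      },
    ],
  },
];

// Return the names of the measures in the named profile, or an empty list if it is unknown.
function measureNamesForProfile(profileName: string): string[] {
  const matches = profiles.filter((profile) => profile.name === profileName);
  return matches.length === 0 ? [] : matches[0].measures.map((measure) => measure.name);
}

console.log(measureNamesForProfile("accessibility"));
```

Keeping each profile as a self-contained list means a tool can calculate only the measures relevant to its use case.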
Measures are defined within this repository, so that they can be used within both the Validator GUI and the Test Suite, and be maintained alongside the validation rules on which they depend.
(Note that this PR is in draft and requires some refactoring and tidying up before merging.)
Screenshot of unstyled results below:
[Image: Screenshot 2023-04-05 at 09 52 21]
Open questions:
url? (Less relevant for the current measures, which are mostly based on required fields)