SchemaError: '^[^\\/~\\^\\: \\[\\]\\\\]+(\\/[^\\/~\\^\\: \\[\\]\\\\]+)*$' is not a 'regex' #550

Open
Borda opened this issue Apr 3, 2025 · 7 comments

@Borda
Contributor

Borda commented Apr 3, 2025

Hello, we recently started to see the following validation error on workflows that are running correctly:

.azure/gpu-integrations.yml
Error: schemafile was not valid

SchemaError: '^[^\\/~\\^\\: \\[\\]\\\\]+(\\/[^\\/~\\^\\: \\[\\]\\\\]+)*$' is not a 'regex'

Failed validating 'format' in metaschema['properties']['definitions']['additionalProperties']['properties']['pattern']:
    {'type': 'string', 'format': 'regex'}

On schema['definitions']['branchFilter']['pattern']:
    '^[^\\/~\\^\\: \\[\\]\\\\]+(\\/[^\\/~\\^\\: \\[\\]\\\\]+)*$'
  in "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/check_jsonschema/checker.py", line 56
  >>> return self._schema_loader.get_validator(

This happens with the latest release, 0.31.3. The file being checked is https://github.com/Lightning-AI/torchmetrics/blob/master/.azure/gpu-integrations.yml

ref: https://github.com/Lightning-AI/torchmetrics/actions/runs/14233511381/job/39888573782?pr=3038

@Borda
Contributor Author

Borda commented Apr 3, 2025

The problem is that the wrong regex variant is being used: the Azure schema cannot be validated with the default variant and needs nonunicode instead.

So with the CLI, use ... --regex-variant="nonunicode"
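
For context, the pattern in the error message is printed as a Python repr, so every backslash appears doubled. A minimal sketch, assuming only Python's standard `re` module, shows that the underlying pattern compiles there and behaves as a branch-name filter. A likely reason the default variant rejects it is the `\:` escape, which ECMAScript unicode-mode regexes do not accept as an identity escape; that explanation is an inference and is not stated in this thread. The helper name `is_valid_branch_filter` is hypothetical.

```python
import re

# The pattern from the error message, with the repr-level doubling of
# backslashes removed (raw string, so each backslash is literal).
BRANCH_FILTER = r"^[^\/~\^\: \[\]\\]+(\/[^\/~\^\: \[\]\\]+)*$"

# Python's `re` accepts escaped punctuation such as `\:` inside a character class,
# so this compiles without error.
compiled = re.compile(BRANCH_FILTER)


def is_valid_branch_filter(name: str) -> bool:
    """Return True if `name` matches the branchFilter pattern under Python's `re`."""
    return compiled.match(name) is not None


if __name__ == "__main__":
    for candidate in ["main", "release/1.0", "bad name", "a:b", "/leading"]:
        print(candidate, is_valid_branch_filter(candidate))
```

The pattern itself is well-formed; whether it is accepted depends on which regex dialect the `"format": "regex"` check is run under.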

@sirosen
Member

sirosen commented Apr 7, 2025

Sorry I wasn't able to reply earlier; I see that you've been working through this in the past few days.
It looks to me like you've already updated your shared workflow which has the check to use --regex-variant ( https://github.com/Lightning-AI/utilities/blob/647eeea90f6e982aa5163b6715d5fa40b5a3dc98/.github/workflows/check-schema.yml#L93 ). I'm not sure if there's more to do on this issue?

I think this raises some questions for me about how I document changes, rather than what is or is not supported. This was known to me as the check-jsonschema maintainer, but not well communicated in the changelog.

In #511 I implemented the new regex mode and customized the Azure hook defined in the hook catalog to use --regex-variant=nonunicode. The changelog entry for v0.31.0 focuses on the new behavior but doesn't mention the Azure hook change.

Changes like the one in v0.31.0 are the reason that check-jsonschema is still 0-versioned -- I foresee a few more major changes before I can call it 1.0 (I have some goals in mind for that, but it's been hard to find time to work on them).

@Borda
Contributor Author

Borda commented Apr 7, 2025

> I implemented the new regex mode and customized the Azure hook defined in the hook catalog to use --regex-variant=nonunicode. The changelog entry for v0.31.0 focuses on the new behavior but doesn't mention the Azure hook change.

I think the challenge is that the default configuration does not work. A user would expect any builtin schema to work with default parsing, but for Azure it does not, and the error does not help. So maybe wrap the error: if the default variant is used with the Azure schema, tell the user that they need to use nonunicode.

@sirosen
Member

sirosen commented Apr 7, 2025

I understand what you're saying, but I have a slightly different take. What you're requesting sounds to me like a reshaping of the CLI -- one which I'm fundamentally on board with, but want to pursue via a different path.

Right now, no option implies another -- if you pass --builtin-schema ... then you also need to declare the data transform if you want to use that feature -- they're tied together in the hook config but not in the pure CLI usage mode.
If you want to use one of the builtin schemas in a way which exactly matches the relevant hook, you need to be looking at and probably copying down the hook config. That's suboptimal, but it's the current truth of the situation.

Suppose one of the schemas and files targeted requires an additional accommodation. To get slightly silly about it, imagine a usage like:

check-jsonschema --builtin-schema gitlab-ci --special-gitlab-option-for-username-format-validation

I agree that we want a simple CLI usage which matches the relevant hook.
But should --builtin-schema gitlab-ci or --builtin-schema azure-pipelines imply other option values? I don't think so. That option is already part of a pretty overloaded part of the current CLI model, in which you select a mode of interaction and a schema at the same time. Option implication logic is easy to write but really hard to explain and document consistently and well. Even when well-documented, implicit relationships between components are harder for users to understand.

I need to get back to working on check-jsonschema subcommands, and then this would be some usage akin to (exact command name TBD)...

check-jsonschema hook azure-pipelines <path/to/pipeline.yaml>

That gives some pretty clear semantics and doesn't force us to document that, for example "--regex-variant defaults to unicode except when it doesn't..." Instead, each hook is at liberty to document what usage it wraps. It internalizes into the CLI the options which are currently being externalized in .pre-commit-hooks.yaml.
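
To make the idea concrete, here is a rough sketch of a `hook` subcommand that expands a hook name into the options it wraps. Everything in it is hypothetical: the subcommand is described above as TBD, the `HOOK_PRESETS` table and `expand` helper are invented for illustration, and the Azure option values are an assumption modeled on the hook change described earlier rather than a copy of the real hook config.

```python
import argparse

# Hypothetical presets: the option list a "hook" subcommand might expand to.
# These values are assumptions for illustration, not the real hook definition.
HOOK_PRESETS = {
    "azure-pipelines": [
        "--builtin-schema", "vendor.azure-pipelines",
        "--data-transform", "azure-pipelines",
        "--regex-variant", "nonunicode",
    ],
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single hypothetical `hook` subcommand."""
    parser = argparse.ArgumentParser(prog="check-jsonschema")
    subparsers = parser.add_subparsers(dest="command", required=True)
    hook = subparsers.add_parser("hook", help="run a named hook preset")
    hook.add_argument("hook_name", choices=sorted(HOOK_PRESETS))
    hook.add_argument("files", nargs="+")
    return parser


def expand(argv: list[str]) -> list[str]:
    """Translate `hook <name> <files...>` into the explicit option list it wraps."""
    args = build_parser().parse_args(argv)
    return [*HOOK_PRESETS[args.hook_name], *args.files]


if __name__ == "__main__":
    print(expand(["hook", "azure-pipelines", "path/to/pipeline.yaml"]))
```

The point of this shape is that the preset lives in one documented place per hook, so no general-purpose option has to change its default depending on another option.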

We can do this with --builtin-schema, but I'm inclined not to put effort into that since I consider it a dead-end in terms of the tool's longer term evolution.

@Borda
Contributor Author

Borda commented Apr 7, 2025

I think wrapping this error with a suggestion to use another --regex-variant would be just fine

@sirosen
Member

sirosen commented Apr 11, 2025

I don't think wrapping or replacing the error is simple to do in a way which isn't breaking/wrong for some other use-case.

This is a format validation failure ("format": "regex"), which means it's like any other JSON Schema validation failure. It could happen under an anyOf or similar, so it might not ultimately get reported back to the user. Wrapping the error would need some pretty clear criteria for when it's appropriate vs when it is not, and I don't believe it's possible to construct that correctly.

I agree that check-jsonschema could print a hint to the user when the best_match error is a regex validation failure, to note that --regex-variant exists. That could get people to the correct solution faster. I'm not sure how to wire it up offhand, so it will take some time.
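
One possible shape for such a hint, as a sketch only: it assumes the error object exposes `validator` and `validator_value` attributes, as jsonschema's error classes do, and it does not reflect how check-jsonschema is actually wired. The names `regex_variant_hint` and `HINT` are hypothetical, and a `SimpleNamespace` stands in for a real error so the example runs without third-party packages.

```python
from types import SimpleNamespace
from typing import Any, Optional

# Hypothetical hint text; the real wording would be up to the maintainers.
HINT = (
    "hint: a 'regex' format check failed; "
    "a different --regex-variant may be needed"
)


def regex_variant_hint(best_error: Any) -> Optional[str]:
    """Return a hint if the given error is a failed "format": "regex" check."""
    is_format = getattr(best_error, "validator", None) == "format"
    is_regex = getattr(best_error, "validator_value", None) == "regex"
    return HINT if (is_format and is_regex) else None


if __name__ == "__main__":
    # Stand-in for the best-match error; only the two attributes are needed.
    fake_error = SimpleNamespace(validator="format", validator_value="regex")
    print(regex_variant_hint(fake_error))
```

Because the hint is printed alongside the original error rather than replacing it, it avoids the problem of deciding when wrapping is appropriate.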

@Borda
Contributor Author

Borda commented Apr 14, 2025

> I agree that check-jsonschema could print a hint to the user when the best_match error is a regex validation failure, to note that --regex-variant exists. That could get people to the correct solution faster. I'm not sure how to wire it up offhand, so it will take some time.

I think I can help / add it :)
