-
-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update jsonschema to 0.28.1, support drafts 2019-09 and 2020-12 #730
base: master
Are you sure you want to change the base?
Conversation
self.cache | ||
.get_schema(url) | ||
.get_schema(&Url::parse(uri.as_str())?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since jsonschema now uses fluent_uri:Uri instead of url::Url, we convert the Uri back to Url to avoid refactoring other major areas of Taplo to use fluent_uri.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I forked taplo
to implement it myself, but didn't manage to finish all the required changes there.
I've added a small note, but otherwise, everything looks great!
P.S. It is actually super useful to see how the new jsonschema
versions are used in the real world. Specifically seeing what is done to handle the absence of async retrieval gives me a feeling that this feature should be higher on my todo list than now.
} else { | ||
None | ||
} | ||
let errors = validator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize now that in 0.28
all the external schemas retrieving happens during the validator compilation, not during validation, hence it this logic should be moved to create_validator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, can you elaborate? How has the behavior changed from what this comment used to describe? Should I wait until async retrieval is implemented in jsonschema as you mentioned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! jsonschema
used to retrieve external documents on demand, during the validate
call (or is_valid
or apply
) in order to perform the validation for $ref
and related reference keywords. To make it work, the $ref
implementation had some caching in place which implied a certain performance penalty (rwlock + inserts).
Some time ago I realized that it could be done just once when we create a validator (e.g. via jsonschema::validator_for
) and all that caching won't be needed during validation as we'll have everything already retrieved, so I moved all that retrieval logic to the building stage.
I think it is not necessary to wait for async retrieval in jsonschema
, as the loop approach would still work - it just needs to be applied inside create_validator
(which requires making a few extra functions async here). If there are any external schemas missing, then the call to ValidationOptions::build
will fail with the same validation errors as implemented in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e. the crucial change to this comment is that it is not about reaching some validation path, but rather discovering an external referencing during the building process.
Because all the resolving became eager and now happens much earlier in the process.
Here is also a plan I have for async retrieval:
0.29
will haveValidationOptions::with_registry
that will expect a registry with all the needed schemas- This
jsonschema::Registry
will have an async constructor that will accept resources and traverse them all until all the external schemas are retrieved and all the inner indices are built.
So, the retrieval will happen in batches where each batch will load previously unseen external resources that are reachable right now. Then the output of this batch will be traversed in the same way and if there are some new resources, there will be another batch, and so on.
Practically speaking:
let registry = jsonschema::non_blocking::Registry::new("http://127.0.0.1/dependent.json", json!({"$ref": "http://127.0.0.1/another.json"})).await?;
let validator = jsonschema::options()
.with_registry(registry)
.build({"$ref": "http://127.0.0.1/dependent.json"})?;
And maybe there could be just an async builder
let validator = jsonschema::non_blocking::options()
.build({"$ref": "http://127.0.0.1/dependent.json"})
.await?;
// Or
let validator = jsonschema::non_blocking::validator_for({"$ref": "http://127.0.0.1/dependent.json"})
.await?;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some time ago I realized that it could be done just once when we create a validator (e.g. via
jsonschema::validator_for
) and all that caching won't be needed during validation as we'll have everything already retrieved, so I moved all that retrieval logic to the building stage.
If I map over the errors from the initial validator build, will all of the retrieval errors surface at once or do I still need to continually loop and rebuild the validator until each and every remote reference is resolved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still need to loop because external schemas can have references to other external schemas too and we don't know it until we load them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With jsonschema==0.29
& the resolve-async
feature enabled, the following should be enough on the validation build step:
jsonschema::async_options()
.with_retriever(CacheSchemaRetriever {
cache: self.cache().clone(),
})
.with_format("semver", formats::semver)
.with_format("semver-requirement", formats::semver_req)
.build(schema)
.await
.map_err(|err| anyhow!("invalid schema: {err}"))
- retriever impl requires a few extra changes - using
async fn
and caching the retrieved value.
Generally, the built-in retriever could work, but as I see here the cache is shared for all validators. I think in the future it could be done by having a shared Registry
to avoid re-processing already processed external resources, but there is some extra work to be done on the jsonschema
side. Right now I think the new jsonschema
version should already simplify the logic quite a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment about the overall validation flow, the current approach won't work
Closes #497, jsonschema now supports both of drafts (#497 (comment)).