Commit fee0296

Clarify various things about canonical URIs
Fixes issue json-schema-org#937, clarifying a number of other things along the way. While it touches a fair number of lines, I'm fairly sure that it doesn't change anything about conformance.

After spending more time reading various writings on the concept of the "canonical" URI for a resource, and reviewing our language, I came to the following conclusions:

* canonical URIs only make sense at the whole-resource scope
* A URI with a fragment is neither canonical nor non-canonical
* It makes more sense to talk about fragments w.r.t. canonical URIs
* Our language was sufficiently confusing that going this way seems fine.

As part of this, I fixed an outright incorrect statement that identifier keywords set canonical URIs. Since there is only one canonical URI and a single schema object could contain three ($id, $anchor, $dynamicAnchor) or more identifier keywords, this statement is clearly a bug. These keywords assign URIs, but only $id assigns a canonical one.

I revamped a lot of wording in descriptions and examples to hopefully be more precise.

I separated the discussion of the empty fragment in $id from the main paragraph of its functionality, and clarified that this is talking about a media-type-specific semantic equivalence, and is not asserting that RFC 3986 normalization applies to fragments (this has been a point of confusion).
1 parent 6008145 commit fee0296
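The empty-fragment point in the commit message can be illustrated with a small sketch. It assumes Python's standard urllib.parse module; the helper name canonical_resource_uri is hypothetical and is not taken from the specification or from any implementation. The sketch shows that a generic comparison sees the two forms as different strings, so the media-type-specific equivalence has to be applied explicitly by schema-aware code.

```python
from urllib.parse import urldefrag, urljoin


def canonical_resource_uri(id_value: str, base: str = "") -> str:
    """Return the absolute-URI form of an "$id" value (hypothetical helper).

    An empty fragment is tolerated and dropped; a non-empty fragment is
    rejected, since "$id" MUST NOT contain one.
    """
    # Resolve against the base only when one is supplied.
    resolved = urljoin(base, id_value) if base else id_value
    uri, fragment = urldefrag(resolved)
    if fragment:
        raise ValueError(f'"$id" must not contain a non-empty fragment: {id_value!r}')
    return uri


with_empty_fragment = "https://example.com/root.json#"
without_fragment = "https://example.com/root.json"

# Plain string comparison knows nothing about the schema media type.
print(with_empty_fragment == without_fragment)

# Applying the equivalence explicitly yields the same canonical URI for both.
print(canonical_resource_uri(with_empty_fragment) == canonical_resource_uri(without_fragment))
```

Rejecting the non-empty fragment with an exception, rather than silently stripping it, is a design choice of this sketch; an implementation could equally report it as a schema error through its own mechanism.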

1 file changed

+77
-64
lines changed

jsonschema-core.xml

+77-64
@@ -315,8 +315,8 @@
  of five categories:
  <list style="hanging">
  <t hangText="identifiers:">
- control schema identification through setting the schema's
- canonical URI and/or changing how the base URI is determined
+ control schema identification through setting a URI
+ for the schema and/or changing how the base URI is determined
  </t>
  <t hangText="assertions:">
  produce a boolean result when applied to an instance
@@ -419,7 +419,9 @@
  <t>
  A JSON Schema resource is a schema which is
  <xref target="RFC6596">canonically</xref> identified by an
- <xref target="RFC3986">absolute URI</xref>.
+ <xref target="RFC3986">absolute URI</xref>. Schema resources MAY
+ also be identified by URIs including fragments. Any such URIs
+ are considered to be non-canonical.
  </t>
  <t>
  The root schema is the schema that comprises the entire JSON document
@@ -723,9 +725,9 @@
  be able to support those keywords or vocabularies that contain them.
  </t>
  </section>
- <section title="Identifiers" anchor="identifiers">
+ <section title="Identifiers">
  <t>
- Identifiers set the canonical URI of a schema, or affect how such URIs are
+ Identifiers define URIs for a schema, or affect how such URIs are
  resolved in <xref target="references">references</xref>, or both.
  The Core vocabulary defined in this document defines several
  identifying keywords, most notably "$id".
@@ -1333,26 +1335,31 @@
  <t>
  If present, the value for this keyword MUST be a string, and MUST represent a
  valid <xref target="RFC3986">URI-reference</xref>. This URI-reference
- SHOULD be normalized, and MUST resolve to an
- <xref target="RFC3986">absolute-URI</xref> (without a fragment). Therefore,
- "$id" MUST NOT contain a non-empty fragment, and SHOULD NOT contain an
- empty fragment.
+ SHOULD be normalized, and MUST be semantically equivalent to an
+ <xref target="RFC3986">absolute-URI</xref> (without a fragment).
  </t>
  <t>
- Since an empty fragment in the context of the application/schema+json media
- type refers to the same resource as the base URI without a fragment,
- an implementation MAY normalize a URI ending with an empty fragment by removing
- the fragment. However, schema authors SHOULD NOT rely on this behavior
- across implementations.
+ The application/schema+json media type defines that an absolute-URI
+ identifying a resource and the same URI with an empty fragment
+ appended (which identifies the resource's root schema object) are
+ semantically equivalent. Since this semantic equivalence is not part
+ of the <xref target="RFC3986">RFC 3986 normalization process</xref>,
+ implementors and schema authors cannot rely on generic URI libraries
+ understanding the equivalence.
+ </t>
+ <t>
+ Therefore, "$id" MUST NOT contain a non-empty fragment, and SHOULD NOT
+ contain an empty fragment. The absolute-URI form MUST be considered
+ the canonical URI, regardless of the presence or absence of an empty fragment.
  <cref>
- This is primarily allowed because older meta-schemas have an empty
- fragment in their $id (or previously, id). A future draft may outright
- forbid even empty fragments in "$id".
+ An empty fragment is currently allowed because older meta-schemas have
+ an empty fragment in their $id (or previously, id).
+ A future draft may outright forbid even empty fragments in "$id".
  </cref>
  </t>
  <t>
- This URI also serves as the base URI for relative URI-references in keywords
- within the schema resource, in accordance with
+ The absolute-URI also serves as the base URI for relative URI-references
+ in keywords within the schema resource, in accordance with
  <xref target="RFC3986">RFC 3986 section 5.1.1</xref> regarding base URIs
  embedded in content.
  </t>
@@ -1616,7 +1623,7 @@
  media type.
  </t>
  <t>
- Unless the "$id" keyword described in the next section is present in the
+ Unless the "$id" keyword described in an earlier section is present in the
  root schema, this base URI SHOULD be considered the canonical URI of the
  schema document's root schema resource.
  </t>
@@ -1743,7 +1750,7 @@
  Since JSON Pointer URI fragments are constructed based on the structure
  of the schema document, an embedded schema resource and its subschemas
  can be identified by JSON Pointer fragments relative to either its own
- canonical URI, or relative to the containing resource's URI.
+ canonical URI, or relative to a containing resource's URI.
  </t>
  <t>
  Conceptually, a set of linked schema resources should behave
@@ -1775,13 +1782,18 @@
  }
  ]]>
  </artwork>
- <postamble>
- The URI "https://example.com/foo#/items/additionalProperties"
- points to the schema of the "additionalProperties" keyword in
- the embedded resource. The canonical URI of that schema, however,
- is "https://example.com/bar#/additionalProperties".
- </postamble>
  </figure>
+ <t>
+ The URI "https://example.com/foo#/items" points to the "items" schema,
+ which is an embedded resource. The canonical URI of that schema
+ resource, however, is "https://example.com/bar".
+ </t>
+ <t>
+ For the "additionalProperties" schema within that embedded resource,
+ the URI "https://example.com/foo#/items/additionalProperties" points
+ to the correct object, but that object's URI relative to its resource's
+ canonical URI is "https://example.com/bar#/additionalProperties".
+ </t>
  <figure>
  <preamble>
  Now consider the following two schema resources linked by reference
@@ -1803,24 +1815,25 @@
  ]]>
  </artwork>
  <postamble>
- Here we see that the canonical URI for that "additionalProperties"
- subschema is still valid, while the non-canonical URI with the fragment
- beginning with "#/items/$ref" now resolves to nothing.
+ Here we see that the URI for the "additionalProperties" schema object
+ that is relative to its resource's canonical URI is still valid,
+ while the URI relative to the "items" schema object's URI no longer
+ resolves to anything.
  </postamble>
  </figure>
  <t>
  Note also that "https://example.com/foo#/items" is valid in both
  arrangements, but resolves to a different value. This URI ends up
- functioning similarly to a retrieval URI for a resource. While valid,
- examining the resolved value and either using the "$id" (if the value
- is a subschema), or resolving the reference and using the "$id" of the
- reference target, is preferable.
+ functioning similarly to a retrieval URI for a resource. While this URI
+ is valid, it is more robust to use the "$id" of the embedded or referenced
+ resource unless it is specifically desired to identify the object containing
+ the "$ref" in the second (non-embedded) arrangement.
  </t>
  <t>
- An implementation MAY choose not to support addressing schemas
- by non-canonical URIs. As such, it is RECOMMENDED that schema authors only
- use canonical URIs, as using non-canonical URIs may reduce
- schema interoperability.
+ An implementation MAY choose not to support addressing schema resource
+ contents by URIs using a base other than the resource's canonical URI,
+ plus a JSON Pointer fragment relative to that base. Therefore, schema
+ authors SHOULD NOT rely on such URIs, as using them may reduce interoperability.
  <cref>
  This is to avoid requiring implementations to keep track of a whole
  stack of possible base URIs and JSON Pointer fragments for each,
@@ -1832,9 +1845,9 @@
  </cref>
  </t>
  <t>
- Further examples of such non-canonical URIs, as well as the appropriate
- canonical URIs to use instead, are provided in appendix
- <xref target="idExamples" format="counter"></xref>.
+ Further examples of such non-canonical URI construction, as well as
+ the appropriate canonical URI-based fragments to use instead,
+ are provided in appendix <xref target="idExamples" format="counter"></xref>.
  </t>
  </section>
  </section>
@@ -2695,8 +2708,8 @@
  <section title="Keyword Absolute Location">
  <t>
  The absolute, dereferenced location of the validating keyword. The value MUST
- be expressed as a full URI using the canonical URI of the relevant
- schema object, and it MUST NOT include by-reference applicators
+ be expressed as a full URI using the canonical URI of the relevant schema resource
+ with a JSON Pointer fragment, and it MUST NOT include by-reference applicators
  such as "$ref" or "$dynamicRef" as non-terminal path components.
  It MAY end in such keywords if the error or annotation is for that
  keyword, such as an unresolvable reference.
@@ -3332,76 +3345,76 @@ https://example.com/schemas/common#/$defs/count/minimum
  <list style="hanging">
  <t hangText="# (document root)">
  <list style="hanging">
- <t hangText="canonical absolute-URI (and also base URI)">
+ <t hangText="canonical (and base) URI">
  https://example.com/root.json
  </t>
- <t hangText="canonical URI with pointer fragment">
+ <t hangText="canonical resource URI plus pointer fragment">
  https://example.com/root.json#
  </t>
  </list>
  </t>
  <t hangText="#/$defs/A">
  <list>
  <t hangText="base URI">https://example.com/root.json</t>
- <t hangText="canonical URI with plain fragment">
+ <t hangText="canonical resource URI plus plain fragment">
  https://example.com/root.json#foo
  </t>
- <t hangText="canonical URI with pointer fragment">
+ <t hangText="canonical resource URI plus pointer fragment">
  https://example.com/root.json#/$defs/A
  </t>
  </list>
  </t>
  <t hangText="#/$defs/B">
  <list style="hanging">
- <t hangText="base URI">https://example.com/other.json</t>
- <t hangText="canonical URI with pointer fragment">
+ <t hangText="canonical (and base) URI">https://example.com/other.json</t>
+ <t hangText="canonical resource URI plus pointer fragment">
  https://example.com/other.json#
  </t>
- <t hangText="non-canonical URI with fragment relative to root.json">
+ <t hangText="base URI of enclosing (root.json) resource plus fragment">
  https://example.com/root.json#/$defs/B
  </t>
  </list>
  </t>
  <t hangText="#/$defs/B/$defs/X">
  <list style="hanging">
  <t hangText="base URI">https://example.com/other.json</t>
- <t hangText="canonical URI with plain fragment">
+ <t hangText="canonical resource URI plus plain fragment">
  https://example.com/other.json#bar
  </t>
- <t hangText="canonical URI with pointer fragment">
+ <t hangText="canonical resource URI plus pointer fragment">
  https://example.com/other.json#/$defs/X
  </t>
- <t hangText="non-canonical URI with fragment relative to root.json">
+ <t hangText="base URI of enclosing (root.json) resource plus fragment">
  https://example.com/root.json#/$defs/B/$defs/X
  </t>
  </list>
  </t>
  <t hangText="#/$defs/B/$defs/Y">
  <list style="hanging">
- <t hangText="base URI">https://example.com/t/inner.json</t>
- <t hangText="canonical URI with plain fragment">
+ <t hangText="canonical (and base) URI">https://example.com/t/inner.json</t>
+ <t hangText="canonical URI plus plain fragment">
  https://example.com/t/inner.json#bar
  </t>
- <t hangText="canonical URI with pointer fragment">
+ <t hangText="canonical URI plus pointer fragment">
  https://example.com/t/inner.json#
  </t>
- <t hangText="non-canonical URI with fragment relative to other.json">
+ <t hangText="base URI of enclosing (other.json) resource plus fragment">
  https://example.com/other.json#/$defs/Y
  </t>
- <t hangText="non-canonical URI with fragment relative to root.json">
+ <t hangText="base URI of enclosing (root.json) resource plus fragment">
  https://example.com/root.json#/$defs/B/$defs/Y
  </t>
  </list>
  </t>
  <t hangText="#/$defs/C">
  <list style="hanging">
- <t hangText="base URI">
+ <t hangText="canonical (and base) URI">
  urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f
  </t>
- <t hangText="canonical URI with pointer fragment">
+ <t hangText="canonical URI plus pointer fragment">
  urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f#
  </t>
- <t hangText="non-canonical URI with fragment relative to root.json">
+ <t hangText="base URI of enclosing (root.json) resource plus fragment">
  https://example.com/root.json#/$defs/C
  </t>
  </list>
@@ -3433,16 +3446,16 @@ https://example.com/schemas/common#/$defs/count/minimum
  <t>
  This transformation can be safely and reversibly done as long as
  all static references (e.g. "$ref") use URI-references that resolve
- to canonical URIs, and all schema resources have an absolute-URI
- as the "$id" in their root schema.
+ to URIs using the canonical resource URI as the base, and all schema
+ resources have an absolute-URI as the "$id" in their root schema.
  </t>
  <t>
  With these conditions met, each external resource can be copied
  under "$defs", without breaking any references among the resources'
  schema objects, and without changing any aspect of validation or
  annotation results. The names of the schemas under "$defs" do
  not affect behavior, assuming they are each unique, as they
- do not appear in canonical URIs for the embedded resources.
+ do not appear in the canonical URIs for the embedded resources.
  </t>
  </section>
  <section title="Reference removal is not always safe">
