Commit 57f8191

Some general improvements / discussion (#243)
* Validate extensions also for Collections
* Use best_match to provide a more useful error message
* Add schema-map to allow validating against local copies of schemas
* Fix tests
* Add basic test for schema map and collection extensions
  This also adds a schema property to make sure the validation message references the overridden schema and not the original one
* Add test for schema map replacement of sub-schemas
* Update changelog and readme
* Keep all schemas in message when doing recursive validation
* Fix recursive mode for collections not reporting errors correctly
* Only show invalid items in recursive validation unless verbose mode is on
* Bubble up regular Exceptions during recursive validation
* Add test for recursive validation of multiple child collections
* Fall back to core validation if no extensions are found
* Remove unused fetch_remote_schema function
* Add -s as alternative to --schema-map in CLI
1 parent 265ebd6 commit 57f8191

15 files changed: +924 -147 lines changed

CHANGELOG.md

+5
@@ -8,6 +8,11 @@ The format is (loosely) based on [Keep a Changelog](http://keepachangelog.com/)
 
 ### Added
 
+- If a validation error occurs in recursive mode only show the invalid items unless verbose mode is on. [#243](https://github.com/stac-utils/stac-validator/pull/243)
+- Added ability to validate extensions of Collections [#243](https://github.com/stac-utils/stac-validator/pull/243)
+- Improve error reporting through use of [best_match](https://python-jsonschema.readthedocs.io/en/stable/errors/#best-match-and-relevance) [#243](https://github.com/stac-utils/stac-validator/pull/243)
+- Add `schema-map` option similar to [stac-node-validator SchemaMap](https://github.com/stac-utils/stac-node-validator?tab=readme-ov-file#usage) to allow validation against local copies of schemas. [#243](https://github.com/stac-utils/stac-validator/pull/243)
+
 ## [v3.5.0] - 2025-01-10
 
 ### Added
README.md

+76 -28
@@ -91,34 +91,38 @@ stac-validator --help
 Usage: stac-validator [OPTIONS] STAC_FILE
 
 Options:
-  --core                   Validate core stac object only without extensions.
-  --extensions             Validate extensions only.
-  --links                  Additionally validate links. Only works with
-                           default mode.
-  --assets                 Additionally validate assets. Only works with
-                           default mode.
-  -c, --custom TEXT        Validate against a custom schema (local filepath or
-                           remote schema).
-  -r, --recursive          Recursively validate all related stac objects.
-  -m, --max-depth INTEGER  Maximum depth to traverse when recursing. Omit this
-                           argument to get full recursion. Ignored if
-                           `recursive == False`.
-  --collections            Validate /collections response.
-  --item-collection        Validate item collection response. Can be combined
-                           with --pages. Defaults to one page.
-  --no-assets-urls         Disables the opening of href links when validating
-                           assets (enabled by default).
-  --header KEY VALUE       HTTP header to include in the requests. Can be used
-                           multiple times.
-  -p, --pages INTEGER      Maximum number of pages to validate via --item-
-                           collection. Defaults to one page.
-  -v, --verbose            Enables verbose output for recursive mode.
-  --no_output              Do not print output to console.
-  --log_file TEXT          Save full recursive output to log file (local
-                           filepath).
-  --version                Show the version and exit.
-  --help                   Show this message and exit.
-```
+  --core                          Validate core stac object only without
+                                  extensions.
+  --extensions                    Validate extensions only.
+  --links                         Additionally validate links. Only works with
+                                  default mode.
+  --assets                        Additionally validate assets. Only works
+                                  with default mode.
+  -c, --custom TEXT               Validate against a custom schema (local
+                                  filepath or remote schema).
+  -s, --schema-map <TEXT TEXT>...
+                                  Schema path to be replaced by (local) schema
+                                  path during validation. Can be used multiple
+                                  times.
+  -r, --recursive                 Recursively validate all related stac
+                                  objects.
+  -m, --max-depth INTEGER         Maximum depth to traverse when recursing.
+                                  Omit this argument to get full recursion.
+                                  Ignored if `recursive == False`.
+  --collections                   Validate /collections response.
+  --item-collection               Validate item collection response. Can be
+                                  combined with --pages. Defaults to one page.
+  --no-assets-urls                Disables the opening of href links when
+                                  validating assets (enabled by default).
+  --header <TEXT TEXT>...         HTTP header to include in the requests. Can
+                                  be used multiple times.
+  -p, --pages INTEGER             Maximum number of pages to validate via
+                                  --item-collection. Defaults to one page.
+  -v, --verbose                   Enables verbose output for recursive mode.
+  --no_output                     Do not print output to console.
+  --log_file TEXT                 Save full recursive output to log file
+                                  (local filepath).
+  --help                          Show this message and exit.```
 
 ---
 
@@ -340,3 +344,47 @@ stac-validator https://earth-search.aws.element84.com/v0/collections/sentinel-s2
 ```bash
 stac-validator https://stac-catalog.eu/collections/sentinel-s2-l2a/items --header x-api-key $MY_API_KEY --header foo bar
 ```
+
+**--schema-map**
+Schema map allows stac-validator to replace a schema referenced in a STAC JSON document with a schema from another URL or a local schema file.
+This is especially useful when developing a schema and testing validation against your local copy of the schema.
+
+```bash
+stac-validator https://raw.githubusercontent.com/radiantearth/stac-spec/v1.0.0/examples/extended-item.json --extensions --schema-map https://stac-extensions.github.io/projection/v1.0.0/schema.json "tests/test_data/schema/v1.0.0/projection.json"
+[
+    {
+        "version": "1.0.0",
+        "path": "https://raw.githubusercontent.com/radiantearth/stac-spec/v1.0.0/examples/extended-item.json",
+        "schema": [
+            "https://stac-extensions.github.io/eo/v1.0.0/schema.json",
+            "tests/test_data/schema/v1.0.0/projection.json",
+            "https://stac-extensions.github.io/scientific/v1.0.0/schema.json",
+            "https://stac-extensions.github.io/view/v1.0.0/schema.json",
+            "https://stac-extensions.github.io/remote-data/v1.0.0/schema.json"
+        ],
+        "valid_stac": true,
+        "asset_type": "ITEM",
+        "validation_method": "extensions"
+    }
+]
+```
+
+This option is also capable of replacing URLs to subschemas:
+
+```bash
+stac-validator tests/test_data/v100/extended-item-local.json --custom tests/test_data/schema/v1.0.0/item_with_unreachable_url.json --schema-map https://geojson-wrong-url.org/schema/Feature.json https://geojson.org/schema/Feature.json --schema-map https://geojson-wrong-url.org/schema/Geometry.json https://geojson.org/schema/Geometry.json
+[
+    {
+        "version": "1.0.0",
+        "path": "tests/test_data/v100/extended-item-local.json",
+        "schema": [
+            "tests/test_data/schema/v1.0.0/item_with_unreachable_url.json"
+        ],
+        "valid_stac": true,
+        "asset_type": "ITEM",
+        "validation_method": "custom"
+    }
+]
+```
+
+
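
As a rough illustration of the replacement described in the README addition above, the sketch below applies a schema map as an exact-match lookup, which mirrors the `if schema_path in schema_map` check that `fetch_schema_with_override` introduces in this commit. The helper name `apply_schema_map` and the sample `extensions` list are hypothetical and not part of stac-validator.

```python
from typing import Dict, List, Optional


def apply_schema_map(
    schema_path: str, schema_map: Optional[Dict[str, str]] = None
) -> str:
    """Return the mapped replacement for schema_path, or schema_path unchanged."""
    if schema_map and schema_path in schema_map:
        return schema_map[schema_path]
    return schema_path


# Mapping equivalent to the single --schema-map pair in the README example.
schema_map = {
    "https://stac-extensions.github.io/projection/v1.0.0/schema.json":
        "tests/test_data/schema/v1.0.0/projection.json",
}

# Hypothetical subset of the extension URLs declared by a STAC item.
extensions: List[str] = [
    "https://stac-extensions.github.io/eo/v1.0.0/schema.json",
    "https://stac-extensions.github.io/projection/v1.0.0/schema.json",
]

resolved = [apply_schema_map(url, schema_map) for url in extensions]
print(resolved)
```

Only URLs that match a key exactly are replaced; everything else passes through untouched, which is why the other extension schemas in the README output keep their remote URLs.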

stac_validator/stac_validator.py

+15 -1
@@ -1,6 +1,6 @@
 import json
 import sys
-from typing import Any, Dict, List
+from typing import Any, Dict, List, Optional, Tuple
 
 import click  # type: ignore
 
@@ -87,6 +87,13 @@ def collections_summary(message: List[Dict[str, Any]]) -> None:
     default="",
     help="Validate against a custom schema (local filepath or remote schema).",
 )
+@click.option(
+    "--schema-map",
+    "-s",
+    type=(str, str),
+    multiple=True,
+    help="Schema path to be replaced by (local) schema path during validation. Can be used multiple times.",
+)
 @click.option(
     "--recursive",
     "-r",
@@ -149,6 +156,7 @@ def main(
     links: bool,
     assets: bool,
     custom: str,
+    schema_map: List[Tuple],
     verbose: bool,
     no_output: bool,
     log_file: str,
@@ -170,6 +178,7 @@ def main(
         links (bool): Whether to additionally validate links. Only works with default mode.
         assets (bool): Whether to additionally validate assets. Only works with default mode.
         custom (str): Path to a custom schema file to validate against.
+        schema_map (list(tuple)): List of tuples, each having two elements. The first element is the schema path to be replaced by the path in the second element.
         verbose (bool): Whether to enable verbose output for recursive mode.
         no_output (bool): Whether to print output to console.
         log_file (str): Path to a log file to save full recursive output.
@@ -182,6 +191,10 @@ def main(
         or 1 if it is invalid.
     """
     valid = True
+    if schema_map == ():
+        schema_map_dict: Optional[Dict[str, str]] = None
+    else:
+        schema_map_dict = dict(schema_map)
     stac = StacValidate(
         stac_file=stac_file,
         collections=collections,
@@ -196,6 +209,7 @@ def main(
         headers=dict(header),
         extensions=extensions,
         custom=custom,
+        schema_map=schema_map_dict,
         verbose=verbose,
         log=log_file,
     )
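
Because `--schema-map` is declared with `type=(str, str)` and `multiple=True`, click passes `main` a tuple of two-element tuples, which the hunk above turns into an optional dict. A minimal sketch of that conversion follows; the function name `pairs_to_schema_map` is hypothetical, and it uses a truthiness check where the commit compares against `()`.

```python
from typing import Dict, Optional, Sequence, Tuple


def pairs_to_schema_map(
    pairs: Sequence[Tuple[str, str]],
) -> Optional[Dict[str, str]]:
    """Convert (original, replacement) pairs into a dict, or None if no pairs were given."""
    if not pairs:
        return None
    return dict(pairs)


# Shape of the value produced by two --schema-map occurrences on the command line.
cli_value = (
    ("https://geojson-wrong-url.org/schema/Feature.json",
     "https://geojson.org/schema/Feature.json"),
    ("https://geojson-wrong-url.org/schema/Geometry.json",
     "https://geojson.org/schema/Geometry.json"),
)

schema_map = pairs_to_schema_map(cli_value)
print(schema_map)
```

One consequence of building a plain dict is that if the same original path is passed twice, the last replacement wins.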

stac_validator/utilities.py

+35 -50
@@ -5,12 +5,10 @@
 from urllib.parse import urlparse
 from urllib.request import Request, urlopen
 
-import jsonschema
 import requests  # type: ignore
 from jsonschema import Draft202012Validator
 from referencing import Registry, Resource
 from referencing.jsonschema import DRAFT202012
-from referencing.retrieval import to_cached_resource
 from referencing.typing import URI
 
 NEW_VERSIONS = [
@@ -192,88 +190,75 @@ def link_request(
         initial_message["format_invalid"].append(link["href"])
 
 
-def fetch_remote_schema(uri: str) -> dict:
+def cached_retrieve(uri: URI, schema_map: Optional[Dict] = None) -> Resource[Dict]:
     """
-    Fetch a remote schema from a URI.
+    Retrieve and cache a remote schema.
 
     Args:
-        uri (str): The URI of the schema to fetch.
+        uri (str): The URI of the schema.
+        schema_map (dict): Override schema location to validate against local versions of a schema
 
     Returns:
-        dict: The fetched schema content as a dictionary.
+        Resource[Dict]: The parsed JSON schema wrapped in a referencing Resource.
 
     Raises:
         requests.RequestException: If the request to fetch the schema fails.
+        Exception: For any other unexpected errors.
     """
-    response = requests.get(uri)
-    response.raise_for_status()
-    return response.json()
+    return Resource.from_contents(
+        fetch_schema_with_override(uri, schema_map=schema_map)
+    )
 
 
-@to_cached_resource()  # type: ignore
-def cached_retrieve(uri: URI) -> str:
+def fetch_schema_with_override(
+    schema_path: str, schema_map: Optional[Dict] = None
+) -> Dict:
     """
     Retrieve and cache a remote schema.
 
     Args:
-        uri (str): The URI of the schema.
+        schema_path (str): Path or URI of the schema.
+        schema_map (dict): Override schema location to validate against local versions of a schema
 
     Returns:
-        str: The raw JSON string of the schema.
-
-    Raises:
-        requests.RequestException: If the request to fetch the schema fails.
-        Exception: For any other unexpected errors.
+        dict: The parsed JSON dict of the schema.
     """
-    try:
-        response = requests.get(uri, timeout=10)  # Set a timeout for robustness
-        response.raise_for_status()  # Raise an error for HTTP response codes >= 400
-        return response.text
-    except requests.exceptions.RequestException as e:
-        raise requests.RequestException(
-            f"Failed to fetch schema from {uri}: {str(e)}"
-        ) from e
-    except Exception as e:
-        raise Exception(
-            f"Unexpected error while retrieving schema from {uri}: {str(e)}"
-        ) from e
-
-
-def validate_with_ref_resolver(schema_path: str, content: dict) -> None:
+
+    if schema_map:
+        if schema_path in schema_map:
+            schema_path = schema_map[schema_path]
+
+    # Load the schema
+    return fetch_and_parse_schema(schema_path)
+
+
+def validate_with_ref_resolver(
+    schema_path: str, content: Dict, schema_map: Optional[Dict] = None
+) -> None:
     """
     Validate a JSON document against a JSON Schema with dynamic reference resolution.
 
     Args:
         schema_path (str): Path or URI of the JSON Schema.
         content (dict): JSON content to validate.
+        schema_map (dict): Override schema location to validate against local versions of a schema
 
     Raises:
         jsonschema.exceptions.ValidationError: If validation fails.
         requests.RequestException: If fetching a remote schema fails.
         FileNotFoundError: If a local schema file is not found.
         Exception: If any other error occurs during validation.
     """
-    # Load the schema
-    if schema_path.startswith("http"):
-        schema = fetch_remote_schema(schema_path)
-    else:
-        try:
-            with open(schema_path, "r") as f:
-                schema = json.load(f)
-        except FileNotFoundError as e:
-            raise FileNotFoundError(f"Schema file not found: {schema_path}") from e
-
+    schema = fetch_schema_with_override(schema_path, schema_map=schema_map)
     # Set up the resource and registry for schema resolution
+    cached_retrieve_with_schema_map = functools.partial(
+        cached_retrieve, schema_map=schema_map
+    )
     resource: Resource = Resource(contents=schema, specification=DRAFT202012)  # type: ignore
-    registry: Registry = Registry(retrieve=cached_retrieve).with_resource(  # type: ignore
+    registry: Registry = Registry(retrieve=cached_retrieve_with_schema_map).with_resource(  # type: ignore
         uri=schema_path, resource=resource
     )  # type: ignore
 
     # Validate the content against the schema
-    try:
-        validator = Draft202012Validator(schema, registry=registry)
-        validator.validate(content)
-    except jsonschema.exceptions.ValidationError as e:
-        raise jsonschema.exceptions.ValidationError(f"{e.message}") from e
-    except Exception as e:
-        raise Exception(f"Unexpected error during validation: {str(e)}") from e
+    validator = Draft202012Validator(schema, registry=registry)
+    validator.validate(content)
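
The hunk above binds `schema_map` to the retrieval callback with `functools.partial`, so that the registry, which calls the callback with a single URI argument, still consults the map for every referenced sub-schema. The sketch below shows that pattern using only the standard library. `FAKE_STORE`, `retrieve`, `resolve_refs` and the `example.com` URLs are hypothetical stand-ins for network access and for the `referencing` registry, not the library's actual behavior.

```python
import functools
from typing import Callable, Dict, Iterable, List, Optional

# In-memory stand-in for schemas that would normally be fetched or read from disk.
FAKE_STORE: Dict[str, dict] = {
    "https://example.com/broken/Feature.json": {"title": "unreachable"},
    "https://example.com/good/Feature.json": {"title": "Feature"},
}


def retrieve(uri: str, schema_map: Optional[Dict[str, str]] = None) -> dict:
    """Look up a schema by URI, substituting a mapped URI first if one exists."""
    if schema_map and uri in schema_map:
        uri = schema_map[uri]
    return FAKE_STORE[uri]


def resolve_refs(uris: Iterable[str], retrieve_one: Callable[[str], dict]) -> List[dict]:
    """Stand-in for a registry: calls a one-argument callback per referenced URI."""
    return [retrieve_one(uri) for uri in uris]


schema_map = {
    "https://example.com/broken/Feature.json": "https://example.com/good/Feature.json",
}

# Bind schema_map once; the result accepts just a URI, as the caller expects.
retrieve_with_map = functools.partial(retrieve, schema_map=schema_map)

schemas = resolve_refs(["https://example.com/broken/Feature.json"], retrieve_with_map)
titles = [schema["title"] for schema in schemas]
print(titles)
```

Binding the map up front keeps the callback signature unchanged, so the calling side needs no knowledge of schema maps at all.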
