Commit d617cba

Merge pull request stac-utils#50 from azavea/rde/feature/identify-version
Add functionality for identifying STAC JSON information
2 parents a34e161 + 89d8d06 commit d617cba

77 files changed

+228771
-114
lines changed

docs/api.rst

+45
@@ -229,6 +229,51 @@ SingleFileSTAC
    :members:
    :undoc-members:
 
+Serialization
+-------------
+
+PySTAC includes a ``pystac.serialization`` package with serialization utilities that
+are used internally, but may also be useful to external tools.
+
+merge_common_properties
+~~~~~~~~~~~~~~~~~~~~~~~
+
+.. automodule:: pystac.serialization
+   :members: merge_common_properties
+
+identify_stac_object
+~~~~~~~~~~~~~~~~~~~~
+
+.. automodule:: pystac.serialization
+   :members: identify_stac_object
+
+identify_stac_object_type
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. automodule:: pystac.serialization
+   :members: identify_stac_object_type
+
+
+STACJSONDescription
+~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: pystac.serialization.STACJSONDescription
+   :members:
+   :undoc-members:
+
+STACVersionRange
+~~~~~~~~~~~~~~~~
+
+.. autoclass:: pystac.serialization.STACVersionRange
+   :members:
+   :undoc-members:
+
+STACObjectType
+~~~~~~~~~~~~~~
+
+.. autoclass:: pystac.serialization.STACObjectType
+   :members:
+   :undoc-members:
 
 PySTAC Internal Classes
 -----------------------

docs/concepts.rst

+44
@@ -311,3 +311,47 @@ Resolution Caching
 The root :class:`~pystac.Catalog` instance of a STAC (the Catalog which is linked to by every associated object's ``root`` link) contains a cache of resolved objects. This cache points to in-memory instances of :class:`~pystac.STACObject` s that have already been resolved through PySTAC crawling links associated with that root catalog. The cache works off of the stac object's ID, which is why **it is necessary for every STAC object in the catalog to have a unique identifier, which is unique across the entire STAC**.
 
 When a link is being resolved from a STACObject that has its root set, that root is passed into the :func:`Link.resolve_stac_object <pystac.Link.resolve_stac_object>` call. That root's :class:`~pystac.resolved_object_cache.ResolvedObjectCache` will be used to ensure that if the link is pointing to an object that has already been resolved, then that link will point to the same, single instance in the cache. This ensures working with STAC objects in memory doesn't create a situation where multiple copies of the same STAC objects are created from different links, manipulated, and written over each other.
+
+Working with STAC JSON
+======================
+
+The ``pystac.serialization`` package provides functions for working directly with STAC
+JSON objects, without using PySTAC object types. PySTAC uses these internally, but they may also be useful to users working directly with JSON (e.g. for validation).
+
+
+Identifying STAC objects from JSON
+----------------------------------
+
+Users can identify STAC information, including the object type, version and extensions,
+from JSON. The main function for this is :func:`~pystac.serialization.identify_stac_object`,
+which returns an object that contains the object type, the range of versions this object is
+valid for (according to PySTAC's best guess), the common extensions implemented by this object,
+and any custom extensions (represented by URIs to JSON Schemas).
+
+.. code-block:: python
+
+   from pystac.serialization import identify_stac_object
+
+   json_dict = ...
+
+   info = identify_stac_object(json_dict, merge_collection_properties=True)
+
+   # The object type
+   info.object_type
+
+   # The version range
+   info.version_range
+
+   # The common extensions
+   info.common_extensions
+
+   # The custom extensions
+   info.custom_extensions
+
+Merging common properties
+-------------------------
+
+The :func:`~pystac.serialization.merge_common_properties` function takes a JSON dict that represents
+an item and, if the item is associated with a collection, merges in the collection's properties.
+You can pass in a dict of previously read collections, keyed by the HREF of the collection link and/or the collection ID, to avoid reading the same
+collection more than once.
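
The merge rule documented in this hunk can be illustrated with plain dicts. This is a minimal sketch, not PySTAC's implementation: `fill_missing_properties` is a hypothetical helper and the property values are made-up sample data. It shows only the precedence rule, where values already on the item are kept and the collection fills in keys the item lacks.

```python
def fill_missing_properties(item_dict, collection_dict):
    """Copy collection properties onto the item without overwriting.

    Hypothetical helper for illustration; returns True if anything was copied.
    """
    merged = False
    for key, value in collection_dict.get('properties', {}).items():
        if key not in item_dict['properties']:
            item_dict['properties'][key] = value
            merged = True
    return merged


# Made-up sample data: the item sets its own 'eo:gsd' but lacks 'platform'.
item = {'id': 'item-1', 'collection': 'col-1', 'properties': {'eo:gsd': 10}}
collection = {'id': 'col-1', 'properties': {'eo:gsd': 30, 'platform': 'sat-a'}}

changed = fill_missing_properties(item, collection)
```

After the call, the item keeps its own `eo:gsd` and gains `platform` from the collection.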

pystac/__init__.py

+19-21
@@ -27,6 +27,8 @@ class STACError(Exception):
 from pystac.eo import *
 from pystac.label import *
 
+from pystac.serialization import (identify_stac_object, STACObjectType)
+
 
 def _stac_object_from_dict(d, href=None, root=None):
     """Determines how to deserialize a dictionary into a STAC object.
@@ -42,28 +44,24 @@ def _stac_object_from_dict(d, href=None, root=None):
     Note: This is used internally in STAC_IO to deserialize STAC Objects.
     It is in the top level __init__ in order to avoid circular dependencies.
     """
-    extensions = d.get('stac_extensions', [])
-    if 'type' in d:
-        if d['type'] == 'FeatureCollection':
-            # Dealing with an Item Collection
-            if 'collections' in d:
-                return SingleFileSTAC.from_dict(d, href=href, root=root)
-            else:
-                return ItemCollection.from_dict(d, href=href, root=root)
-        else:
-            # Dealing with an Item
-            if 'eo' in extensions or \
-               any([k for k in d['properties'].keys() if k.startswith('eo:')]):
-                return EOItem.from_dict(d, href=href, root=root)
-            elif 'label' in extensions or \
-               any([k for k in d['properties'].keys() if k.startswith('label:')]):
-                return LabelItem.from_dict(d, href=href, root=root)
-            else:
-                return Item.from_dict(d, href=href, root=root)
-    elif 'extent' in d:
-        return Collection.from_dict(d, href=href, root=root)
-    else:
+    info = identify_stac_object(d)
+
+    # TODO: Transform older versions to newest version (pystac.serialization.migrate)
+
+    if info.object_type == STACObjectType.CATALOG:
         return Catalog.from_dict(d, href=href, root=root)
+    if info.object_type == STACObjectType.COLLECTION:
+        return Collection.from_dict(d, href=href, root=root)
+    if info.object_type == STACObjectType.ITEMCOLLECTION:
+        if 'single-file-stac' in info.common_extensions:
+            return SingleFileSTAC.from_dict(d, href=href, root=root)
+        return ItemCollection.from_dict(d, href=href, root=root)
+    if info.object_type == STACObjectType.ITEM:
+        if 'eo' in info.common_extensions:
+            return EOItem.from_dict(d, href=href, root=root)
+        if 'label' in info.common_extensions:
+            return LabelItem.from_dict(d, href=href, root=root)
+        return Item.from_dict(d, href=href, root=root)
 
 
 STAC_IO.stac_object_from_dict = _stac_object_from_dict
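
The change to `_stac_object_from_dict` replaces nested key checks with a two-step flow: classify the JSON first, then dispatch on the result. A minimal, self-contained sketch of that shape follows. Everything in it is a stand-in: `ObjectType`, `classify`, and `target_class_name` are hypothetical names, and the classification rules are taken from the removed branch of this diff (without its `eo:` and `label:` property-prefix checks), not from `identify_stac_object`, whose source is not shown here.

```python
from enum import Enum


class ObjectType(Enum):
    # Hypothetical stand-in for pystac.serialization.STACObjectType.
    CATALOG = 'CATALOG'
    COLLECTION = 'COLLECTION'
    ITEM = 'ITEM'
    ITEMCOLLECTION = 'ITEMCOLLECTION'


def classify(d):
    """Classify a STAC JSON dict using the heuristics of the removed code."""
    if 'type' in d:
        if d['type'] == 'FeatureCollection':
            return ObjectType.ITEMCOLLECTION
        return ObjectType.ITEM
    if 'extent' in d:
        return ObjectType.COLLECTION
    return ObjectType.CATALOG


def target_class_name(d):
    """Return the name of the class the dict would be deserialized into."""
    object_type = classify(d)
    if object_type == ObjectType.ITEMCOLLECTION:
        return 'SingleFileSTAC' if 'collections' in d else 'ItemCollection'
    if object_type == ObjectType.ITEM:
        extensions = d.get('stac_extensions', [])
        if 'eo' in extensions:
            return 'EOItem'
        if 'label' in extensions:
            return 'LabelItem'
        return 'Item'
    if object_type == ObjectType.COLLECTION:
        return 'Collection'
    return 'Catalog'


# Made-up sample dicts, one per branch of interest.
names = [
    target_class_name({'id': 'root', 'links': []}),
    target_class_name({'id': 'col', 'extent': {}, 'links': []}),
    target_class_name({'type': 'Feature', 'stac_extensions': ['eo'], 'properties': {}}),
    target_class_name({'type': 'FeatureCollection', 'features': []}),
]
```

Separating classification from dispatch keeps each `if` in the dispatcher flat, which is the same readability gain the diff gets from comparing `info.object_type` against enum members.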

pystac/serialization/__init__.py

+8
@@ -0,0 +1,8 @@
+# flake8: noqa
+
+from pystac.serialization.identify import (STACObjectType, STACJSONDescription,
+                                           STACVersionRange,
+                                           identify_stac_object,
+                                           identify_stac_object_type)
+
+from pystac.serialization.common_properties import merge_common_properties

pystac/serialization/common_properties.py

+59
@@ -0,0 +1,59 @@
+from pystac.utils import make_absolute_href
+from pystac.stac_io import STAC_IO
+
+
+def merge_common_properties(item_dict, collection_cache=None, json_href=None):
+    """Merges Collection properties into an Item.
+
+    Args:
+        item_dict: JSON dict of the Item into which properties should be
+            merged.
+        collection_cache: Optional cache of Collection JSON that has previously
+            been read. Keyed by either the Collection ID or an HREF.
+        json_href: The HREF of the file that this JSON comes from. Used
+            to resolve relative paths.
+
+    Returns:
+        bool: True if Collection properties have been merged, otherwise False.
+    """
+    properties_merged = False
+    collection = None
+    collection_href = None
+
+    # Try the cache if we have a collection ID.
+    if 'collection' in item_dict:
+        collection_id = item_dict['collection']
+        if collection_cache is not None:
+            collection = collection_cache.get(collection_id)
+
+    # Next, try the collection link.
+    if collection is None:
+        links = item_dict['links']
+        collection_link = next((l for l in links if l['rel'] == 'collection'),
+                               None)
+        if collection_link is not None:
+            collection_href = collection_link['href']
+            if json_href is not None:
+                collection_href = make_absolute_href(collection_href,
+                                                     json_href)
+            if collection_cache is not None:
+                collection = collection_cache.get(collection_href)
+
+            if collection is None:
+                collection = STAC_IO.read_json(collection_href)
+
+    if collection is not None:
+        if 'properties' in collection:
+            for k in collection['properties']:
+                if k not in item_dict['properties']:
+                    properties_merged = True
+                    item_dict['properties'][k] = collection['properties'][k]
+
+        if collection_cache is not None and collection[
+                'id'] not in collection_cache:
+            collection_id = collection['id']
+            collection_cache[collection_id] = collection
+            if collection_href is not None:
+                collection_cache[collection_href] = collection
+
+    return properties_merged
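
To see why the cache is keyed by both ID and HREF, the lookup order in `merge_common_properties` can be sketched without any I/O. This is an illustrative reimplementation under assumptions, not the committed function: `resolve_collection` and `fake_read` are hypothetical, the reader is injected in place of `STAC_IO.read_json`, relative HREF resolution is omitted, and the URL is a made-up example.

```python
def resolve_collection(item_dict, cache, read_json):
    """Find an item's collection: cache by ID, then cache by HREF, then read."""
    collection = cache.get(item_dict.get('collection'))
    href = None
    if collection is None:
        links = item_dict.get('links', [])
        link = next((ln for ln in links if ln['rel'] == 'collection'), None)
        if link is not None:
            href = link['href']
            collection = cache.get(href)
            if collection is None:
                collection = read_json(href)
    if collection is not None:
        # Store under both keys so later lookups by either one hit the cache.
        cache.setdefault(collection['id'], collection)
        if href is not None:
            cache[href] = collection
    return collection


reads = []


def fake_read(href):
    # Hypothetical reader that records every call instead of touching disk.
    reads.append(href)
    return {'id': 'col-1', 'properties': {'platform': 'sat-a'}}


cache = {}
link = {'rel': 'collection', 'href': 'https://example.com/col-1.json'}
first = resolve_collection({'links': [link]}, cache, fake_read)
second = resolve_collection({'links': [link]}, cache, fake_read)
third = resolve_collection({'collection': 'col-1', 'links': []}, cache, fake_read)
```

Only the first call reaches the reader. The second is served by the HREF key and the third by the ID key, so items that reference the same collection in different ways share a single read.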
