Skip to content

Prototype model customization #134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

moradology
Copy link
Collaborator

@moradology moradology commented May 24, 2021

This draft PR is not meant for merger but to provide an example of what would be necessary under the current SqlAlchemy backend to support model customization. This draft is relevant for further investigating a solution to #58 and stac-utils/stac-fastapi-sqlalchemy#26.

At issue is the fact that some properties in STAC Items and Collections are meant to live at the top level of said Items and Collections - full-fledged STAC extensions but also the grey area of custom models which are technically not fully implemented as STAC extensions. Providing top level support of this kind means subclassing the Pydantic models used by stac-fastapi to provide type information for fastapi's automatic documentation generation. Unfortunately, this concession doesn't neatly fit for a few reasons. These limitations are discussed below. First, some example output from the collections endpoint as instantiated by Docker Compose:

{
  "id":"joplin",
  "description":"This imagery was acquired by the NOAA Remote Sensing Division to support NOAA national security and emergency response requirements. In addition, it will be used for ongoing research efforts for testing and developing standards for airborne digital imagery. Individual images have been combined into a larger mosaic and tiled for distribution. The approximate ground sample distance (GSD) for each pixel is 35 cm (1.14 feet).",
  "stac_version":"1.0.0-beta.2",
  "links":[
    {
      "href":"http://localhost:8081/collections/joplin",
      "rel":"self",
      "type":"application/json",
      "title":null,
      "label":null
    },
    {
      "href":"http://localhost:8081/",
      "rel":"parent",
      "type":"application/json",
      "title":null,
      "label":null
    },
    {
      "href":"http://localhost:8081/collections/joplin/items",
      "rel":"item",
      "type":"application/geo+json",
      "title":null,
      "label":null
    },
    {
      "href":"http://localhost:8081/",
      "rel":"root",
      "type":"application/json",
      "title":null,
      "label":null
    }
  ],
  "stac_extensions":[
    
  ],
  "title":null,
  "license":"public-domain",
  "extent":{
    "spatial":{
      "bbox":[
        [
          -94.6911621,
          37.0332547,
          -94.402771,
          37.1077651
        ]
      ]
    },
    "temporal":{
      "interval":[
        [
          "2000-02-01T00:00:00Z",
          "2000-02-12T00:00:00Z"
        ]
      ]
    }
  },
  "keywords":null,
  "providers":null,
  "summaries":null,
  "sci_doi":"default_doi",
  "sci_citation":"default_citation",
  "sci_publications":"publication1, publication2, publication3"
}

Pressing Issues (both of these have been resolved as discussed in the comments below)

Type information determines response fields (outdated)

This issue is avoided with exclude_none as pointed out by @kylebarron. exclude_unset, which I had previously tried appears not to be the answer.

In the above JSON, a host of null values are apparent which are not normally provided and which are not desirable. These are seen here because of a change introduced in router.py to ensure that fields unique to subclasses of Collection and Item are returned in responses. Without first turning pydantic instances into dictionaries, only fields provided on Item and Collection are found in the response. Perhaps a pretty_dict method could be required up the inheritance chain to avoid this ugliness.

Incorrect field names (outdated)

This issue can be avoided with aliases, though the solution might introduce some difficulties in complex cases.

The STAC scientific extension uses ":" instead of "_" to separate "sci" from "doi", "citation", and "publications". As pydantic models are simply python classes, field names can't include ":". To accommodate python, field names containing ":" will need to be systematically rewritten. Dictionaries can avoid this limitation but they do so at the cost of losing valuable type information. A solution to this problem which preserves use of pydantic is not apparent beyond users specifying transformations of field names inside the previously mentioned, speculative pretty_dict method.

@moradology
Copy link
Collaborator Author

I forgot to mention in the writeup that I've also included a JSON serialization optimization (conditional use of OrJSON instead of the standard library JSON module) which might be desirable if the automatic serialization of Pydantic model instances is found wanting (for reasons like the two mentioned above)

@geospatial-jeff
Copy link
Collaborator

A solution to this problem which preserves use of pydantic is not apparent beyond users specifying transformations of field names inside the previously mentioned, speculative pretty_dict method.

Pydantic supports field aliasing which makes it possible to represent the extension namespace by aliasing the doi field to look like sci:doi but its not entirely pretty and has a pretty complex precedence for how these aliases are applied (which gets more complex when you use inheritance).

@moradology
Copy link
Collaborator Author

Thanks @geospatial-jeff, that solves the second of the two issues at least. I'm giving the pydantic docs another run through to make sure I've not missed something which might resolve the nullary fields being output as well.

Updated JSON:

{
  "id":"joplin",
  "description":"This imagery was acquired by the NOAA Remote Sensing Division to support NOAA national security and emergency response requirements. In addition, it will be used for ongoing research efforts for testing and developing standards for airborne digital imagery. Individual images have been combined into a larger mosaic and tiled for distribution. The approximate ground sample distance (GSD) for each pixel is 35 cm (1.14 feet).",
  "stac_version":"1.0.0-beta.2",
  "links":[
    {
      "href":"http://localhost:8081/collections/joplin",
      "rel":"self",
      "type":"application/json",
      "title":null,
      "label:assets":null
    },
    {
      "href":"http://localhost:8081/",
      "rel":"parent",
      "type":"application/json",
      "title":null,
      "label:assets":null
    },
    {
      "href":"http://localhost:8081/collections/joplin/items",
      "rel":"item",
      "type":"application/geo+json",
      "title":null,
      "label:assets":null
    },
    {
      "href":"http://localhost:8081/",
      "rel":"root",
      "type":"application/json",
      "title":null,
      "label:assets":null
    }
  ],
  "stac_extensions":[
    
  ],
  "title":null,
  "license":"public-domain",
  "extent":{
    "spatial":{
      "bbox":[
        [
          -94.6911621,
          37.0332547,
          -94.402771,
          37.1077651
        ]
      ]
    },
    "temporal":{
      "interval":[
        [
          "2000-02-01T00:00:00Z",
          "2000-02-12T00:00:00Z"
        ]
      ]
    }
  },
  "keywords":null,
  "providers":null,
  "summaries":null,
  "sci:doi":"default_doi",
  "sci:citation":"default_citation",
  "sci:publications":"publication1, publication2, publication3"
}

@kylebarron
Copy link
Contributor

kylebarron commented May 25, 2021

might resolve the nullary fields being output as well

You might want exclude_none=True on model.dict https://pydantic-docs.helpmanual.io/usage/exporting_models/#modeldict

I also noticed this on the pgstac PR: #126 (comment)

@moradology
Copy link
Collaborator Author

I had tried exclude_unset to get rid of all the null fields and it worked for some but not others. It appears as though what I wanted was exclude_none. Thanks, @kylebarron, this was the missing piece

{
  "id":"joplin",
  "description":"This imagery was acquired by the NOAA Remote Sensing Division to support NOAA national security and emergency response requirements. In addition, it will be used for ongoing research efforts for testing and developing standards for airborne digital imagery. Individual images have been combined into a larger mosaic and tiled for distribution. The approximate ground sample distance (GSD) for each pixel is 35 cm (1.14 feet).",
  "stac_version":"1.0.0-beta.2",
  "links":[
    {
      "href":"http://localhost:8081/collections/joplin",
      "rel":"self",
      "type":"application/json"
    },
    {
      "href":"http://localhost:8081/",
      "rel":"parent",
      "type":"application/json"
    },
    {
      "href":"http://localhost:8081/collections/joplin/items",
      "rel":"item",
      "type":"application/geo+json"
    },
    {
      "href":"http://localhost:8081/",
      "rel":"root",
      "type":"application/json"
    }
  ],
  "stac_extensions":[
    
  ],
  "license":"public-domain",
  "extent":{
    "spatial":{
      "bbox":[
        [
          -94.6911621,
          37.0332547,
          -94.402771,
          37.1077651
        ]
      ]
    },
    "temporal":{
      "interval":[
        [
          "2000-02-01T00:00:00Z",
          "2000-02-12T00:00:00Z"
        ]
      ]
    }
  },
  "sci:doi":"default_doi",
  "sci:citation":"default_citation",
  "sci:publications":"publication1, publication2, publication3"
}

@kylebarron
Copy link
Contributor

I had tried exclude_unset to get rid of all the null fields and it worked for some but not others

I believe @geospatial-jeff or @vincentsarago ran into issues with exclude_unset as well, but I don't remember the details. Maybe exclude_unset doesn't apply to fields with defaults?

@moradology
Copy link
Collaborator Author

moradology commented May 26, 2021

It's looking like the negatives I ran into aren't actually such a great concern so long as people providing custom models provide appropriate aliasing information and dict gets all the flags it needs. There's also not that much of a footprint for providing said functionality in core.

The remaining difficulty, then, is managing the backend implementation's models so that any custom fields which need to be stored/read are available for the from_orm call. This seems like it might be a non-issue w/ the work being done for pgstac

@moradology
Copy link
Collaborator Author

Closing this in favor of #147

@moradology moradology closed this Jul 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants