Update datetime filter #396

Open
wants to merge 8 commits into
base: main
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Changed

- Improved datetime query handling to only check start and end datetime values when datetime is None [#396](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/396)
- Optimize data_loader.py script [#395](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/395)
- Refactored test configuration to use shared app config pattern [#399](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/399)

186 changes: 81 additions & 105 deletions stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py
@@ -245,121 +245,97 @@ def apply_collections_filter(search: Search, collection_ids: List[str]):
@staticmethod
def apply_datetime_filter(
search: Search, interval: Optional[Union[DateTimeType, str]]
):
) -> Search:
"""Apply a filter to search on datetime, start_datetime, and end_datetime fields.

Args:
search (Search): The search object to filter.
interval: Optional[Union[DateTimeType, str]]
search: The search object to filter.
interval: Optional datetime interval to filter by. Can be:
- A single datetime string (e.g., "2023-01-01T12:00:00")
- A datetime range string (e.g., "2023-01-01/2023-12-31")
- A datetime object
- A tuple of (start_datetime, end_datetime)

Returns:
Search: The filtered search object.
The filtered search object.
"""
if not interval:
return search

should = []
datetime_search = return_date(interval)
try:
datetime_search = return_date(interval)
except (ValueError, TypeError) as e:
# Handle invalid interval formats if return_date fails
logger.error(f"Invalid interval format: {interval}, error: {e}")
return search

# If the request is a single datetime return
# items with datetimes equal to the requested datetime OR
# the requested datetime is between their start and end datetimes
if "eq" in datetime_search:
should.extend(
[
Q(
"bool",
filter=[
Q(
"term",
properties__datetime=datetime_search["eq"],
),
],
),
Q(
"bool",
filter=[
Q(
"range",
properties__start_datetime={
"lte": datetime_search["eq"],
},
),
Q(
"range",
properties__end_datetime={
"gte": datetime_search["eq"],
},
),
],
),
]
)

# If the request is a date range return
# items with datetimes within the requested date range OR
# their startdatetime within the requested date range OR
# their enddatetime within the requested date range OR
# the requested daterange within their start and end datetimes
# For exact matches, include:
# 1. Items with matching exact datetime
# 2. Items with datetime:null where the time falls within their range
should = [
Q(
"bool",
filter=[
Q("exists", field="properties.datetime"),
Q("term", **{"properties__datetime": datetime_search["eq"]}),
],
),
Q(
"bool",
must_not=[Q("exists", field="properties.datetime")],
Collaborator

The current best practices recommend populating datetime even if you have a date range.

"The specification does allow one to set the datetime field to null, but it is strongly recommended to populate the single datetime field, as that is what many clients will search on. If it is at all possible to pick a nominal or representative datetime then that should be used."

So we should probably loosen the search (remove this line?) or update the recommended practice.

Collaborator Author

I don't know. I think the best practices are recommended but may not always be relevant for all types of data.

Collaborator Author

@rhysrevans3 Can you look at this issue #396? I am not 100% sure on what the right approach should be.

Collaborator Author

I think that for some items, defined by start and end datetimes, it may not make sense to set a datetime value just for the sake of doing so. The stac spec itself allows null datetime values.

Collaborator

I think you're right: both cases need to be handled, a null datetime and a set datetime when start and end dates are used. What's the expected behaviour when the datetime is set? I would expect it to search on all date fields. If that's the case then I think the must_not=[Q("exists", field="properties.datetime")] can be removed. I think the current query will ignore start and end dates if the datetime is set. Is that expected?

Collaborator Author

@jonhealy1 jonhealy1 Jun 10, 2025

When the datetime is set it only searches the datetime and not the start and end datetimes. That's why we need the must not. If we want to search both datetime and start and end datetimes, which is the current functionality, we could set an env var to allow this?
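
The env-var idea in the comment above could look roughly like the following. This is only a sketch under stated assumptions: `exact_datetime_clauses`, `flag_from_env`, and the variable name `DATETIME_ALWAYS_CHECK_RANGE` are hypothetical and do not appear in this PR, and the clauses are written as plain query DSL dicts rather than the `Q` objects used in the diff.

```python
import os
from typing import Any, Dict, List


def exact_datetime_clauses(moment: str, always_check_range: bool) -> List[Dict[str, Any]]:
    """Build `should` clauses for a single-datetime search as plain dicts (hypothetical helper)."""
    # Clause 1: the item's datetime equals the requested moment.
    datetime_match: Dict[str, Any] = {
        "bool": {
            "filter": [
                {"exists": {"field": "properties.datetime"}},
                {"term": {"properties.datetime": moment}},
            ]
        }
    }
    # Clause 2: the requested moment falls inside [start_datetime, end_datetime].
    range_match: Dict[str, Any] = {
        "bool": {
            "filter": [
                {"exists": {"field": "properties.start_datetime"}},
                {"exists": {"field": "properties.end_datetime"}},
                {"range": {"properties.start_datetime": {"lte": moment}}},
                {"range": {"properties.end_datetime": {"gte": moment}}},
            ]
        }
    }
    if not always_check_range:
        # Behaviour proposed in this PR: consult the range only when datetime is null.
        range_match["bool"]["must_not"] = [{"exists": {"field": "properties.datetime"}}]
    return [datetime_match, range_match]


def flag_from_env(name: str = "DATETIME_ALWAYS_CHECK_RANGE") -> bool:
    """Read a boolean toggle from the environment (the variable name is an assumption)."""
    return os.getenv(name, "false").strip().lower() in {"1", "true", "yes"}


strict = exact_datetime_clauses("2020-01-01T12:00:00Z", always_check_range=False)
loose = exact_datetime_clauses("2020-01-01T12:00:00Z", always_check_range=True)
```

With the flag off, the second clause keeps the `must_not` guard; with it on, the guard is dropped and the start and end datetimes are checked even when `datetime` is populated.
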

filter=[
Q("exists", field="properties.start_datetime"),
Q("exists", field="properties.end_datetime"),
Q(
"range",
properties__start_datetime={"lte": datetime_search["eq"]},
),
Q(
"range",
properties__end_datetime={"gte": datetime_search["eq"]},
),
],
),
]
else:
should.extend(
[
Q(
"bool",
filter=[
Q(
"range",
properties__datetime={
"gte": datetime_search["gte"],
"lte": datetime_search["lte"],
},
),
],
),
Q(
"bool",
filter=[
Q(
"range",
properties__start_datetime={
"gte": datetime_search["gte"],
"lte": datetime_search["lte"],
},
),
],
),
Q(
"bool",
filter=[
Q(
"range",
properties__end_datetime={
"gte": datetime_search["gte"],
"lte": datetime_search["lte"],
},
),
],
),
Q(
"bool",
filter=[
Q(
"range",
properties__start_datetime={
"lte": datetime_search["gte"]
},
),
Q(
"range",
properties__end_datetime={
"gte": datetime_search["lte"]
},
),
],
),
]
)

search = search.query(Q("bool", filter=[Q("bool", should=should)]))

return search
# For date ranges, include:
# 1. Items with datetime in the range
# 2. Items with datetime:null that overlap the search range
should = [
Q(
"bool",
filter=[
Q("exists", field="properties.datetime"),
Q(
"range",
properties__datetime={
"gte": datetime_search["gte"],
"lte": datetime_search["lte"],
},
),
],
),
Q(
"bool",
must_not=[Q("exists", field="properties.datetime")],
Collaborator

@rhysrevans3 rhysrevans3 Jun 9, 2025

Same as below.

filter=[
Q("exists", field="properties.start_datetime"),
Q("exists", field="properties.end_datetime"),
Q(
"range",
properties__start_datetime={"lte": datetime_search["lte"]},
),
Q(
"range",
properties__end_datetime={"gte": datetime_search["gte"]},
),
],
Comment on lines +323 to +334
Collaborator

Is this enough to give all possible combinations of datetime overlap? This looks like it will only return items whose date range entirely encapsulates the searched for date range.

Collaborator Author

This test case shows overlap.

    # Test 5: Range matching null-datetime-item but not range-item's datetime
    feature_ids = await _search_and_get_ids(
        app_client,
        params={
            "datetime": "2020-01-01T12:00:00Z/2020-01-02T12:00:00Z",
            "collections": [collection_id],
        },
    )
    assert feature_ids == {
        "null-datetime-item",  # Overlaps: search range [12:00-01-01 to 12:00-02-01] overlaps item range [00:00-01-01 to 00:00-02-01]
    }, "Range search excluding range-item datetime failed"

Collaborator

If I'm reading the test correctly, the null-datetime-item item has a date range of 2020-01-01T00:00:00Z to 2020-01-02T00:00:00Z and the search is 2020-01-01T12:00:00Z/2020-01-02T12:00:00Z, so the searched date range is entirely within the item's date range. If the searched-for range's end date was extended by a day, to 2020-01-01T12:00:00Z/2020-01-03T12:00:00Z, I suspect the item wouldn't be returned. Is that the desired behaviour? I thought that if there was any overlap then the item should be returned.

Collaborator Author

I might be mixed up, but the item has a 24hr. date range and so does the query so they do overlap?

Collaborator Author

The end date of the search is outside the item's end date.

),
]

return search.query(Q("bool", should=should, minimum_should_match=1))
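
On the overlap question raised in review: the two range clauses in the null-datetime branch (`start_datetime <= search end` and `end_datetime >= search start`) together form the standard interval-overlap test, which is also true for partial overlaps. The plain-Python sketch below mirrors that condition using the dates from the review thread. It does not run against Elasticsearch, and `utc` and `ranges_overlap` are hypothetical helper names, not part of this PR.

```python
from datetime import datetime, timezone


def utc(year: int, month: int, day: int, hour: int = 0) -> datetime:
    """Shorthand for a timezone-aware UTC datetime."""
    return datetime(year, month, day, hour, tzinfo=timezone.utc)


def ranges_overlap(item_start, item_end, search_start, search_end) -> bool:
    """Mirror the two range clauses: start <= search end AND end >= search start."""
    return item_start <= search_end and item_end >= search_start


# Item range from the test: 2020-01-01T00:00Z to 2020-01-02T00:00Z.
item = (utc(2020, 1, 1), utc(2020, 1, 2))

# Search from the test case: starts inside the item range, ends after it.
partial = ranges_overlap(*item, utc(2020, 1, 1, 12), utc(2020, 1, 2, 12))

# Search end extended by a day, as suggested in the review.
extended = ranges_overlap(*item, utc(2020, 1, 1, 12), utc(2020, 1, 3, 12))

# Search range lying entirely after the item range.
disjoint = ranges_overlap(*item, utc(2020, 1, 3), utc(2020, 1, 4))
```

Under this predicate both `partial` and `extended` are true and `disjoint` is false, so an item is matched whenever the two ranges share at least one instant.
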

@staticmethod
def apply_bbox_filter(search: Search, bbox: List):