Conversation
|
Hi @jonhealy1, Will you have time soon to do a code review? |
|
@Gomez324 I will make time this weekend. Can you fix the conflicts? Thanks |
| "gte": None, | ||
| "lte": datetime_search.get("lte") if not USE_DATETIME else None, | ||
| }, | ||
| } |
There was a problem hiding this comment.
This added code complicates the core database logic by tightly coupling it to a specific indexing strategy. Please move this calculation into the IndexSelector (the actual consumer) to keep the core method focused solely on query construction.
| "opensearch-py[async]~=2.8.0", | ||
| "uvicorn~=0.23.0", | ||
| "starlette>=0.35.0,<0.36.0", | ||
| "redis==6.4.0", |
There was a problem hiding this comment.
Redis should not be installed in the core package as most Users probably won't use Redis. It can be installed with pip install stac-fastapi-elasticsearch[redis] or with dev
|
|
||
| if not datetime_search: | ||
| return search, result_metadata | ||
|
|
There was a problem hiding this comment.
See other comment on Elasticsearch version of this code.
| raise HTTPException( | ||
| status_code=status.HTTP_400_BAD_REQUEST, | ||
| detail="Product datetime is required for indexing", | ||
| detail="Product 'start_datetime', 'datetime' and 'end_datetime' is required for indexing", |
There was a problem hiding this comment.
This validation logic violates the STAC specification in two ways:
-
It creates a mandatory requirement for start_datetime and end_datetime, which are optional fields in the spec.
-
It rejects items where datetime is null (but start/end are present), which is explicitly allowed for interval data.
Please refactor this to handle standard STAC items (single datetime) and interval items (null datetime) correctly.
There was a problem hiding this comment.
@jonhealy1 I agree with you. However, if indexes are to be created based on start_datetime, then that field must always be required.
There was a problem hiding this comment.
What if we tie this validation to the existing USE_DATETIME setting?
If USE_DATETIME=true (Default): We allow items that only have a datetime field. In these cases, we can derive the index partition name from the datetime field instead of raising a 400 error.
If USE_DATETIME=false: Then strict enforcement of start_datetime is appropriate.
This ensures we support standard STAC items (point-in-time) without forcing users to reconfigure or reformat their data.
There was a problem hiding this comment.
Good idea. I'll need some more time to implement it, but it is doable.
If USE_DATETIME is true, then datetime is required, and the aliases will work as they do now using only datetime, so the migration tool will not be needed? And if it is false, then start_datetime and end_datetime are required, while datetime becomes optional?
There was a problem hiding this comment.
@Gomez324 Sounds good! Yes, I think migration scripts would not be needed.
| datetime_alias = index_dict.get("datetime") | ||
|
|
||
| if not start_datetime_alias: | ||
| continue |
There was a problem hiding this comment.
This line effectively makes all existing production indexes invisible to the API. Current indexes do not have start_datetime aliases.
-
Where is the migration plan to backfill aliases on historical data?
-
Without a migration, this change breaks backwards compatibility and will return 0 results for existing datasets.
| "elasticsearch[async]~=8.19.1", | ||
| "uvicorn~=0.23.0", | ||
| "starlette>=0.35.0,<0.36.0", | ||
| "redis==6.4.0", |
There was a problem hiding this comment.
Same here - let's not install redis here. It's an optional feature.
|
@Gomez324 In the description for this PR, you state that Index B (12th-17th) lies outside the requested range (10th-16th) and would be skipped. This description implies incorrect behavior. STAC API searches rely on Intersection, not Containment. Since Index B overlaps with the search window, it must be queried; otherwise, valid items from the 12th to the 16th would be hidden from the user. Looking at the code in check_criteria, it appears you are correctly implementing intersection logic (which contradicts your description). Please update the PR description to avoid confusion, as the current example implies the feature is broken. |
|
Hey @jonhealy1 I've fixed the code according to the suggestions, it's ready for a CR. |
stac_fastapi/sfeos_helpers/stac_fastapi/sfeos_helpers/search_engine/selection/cache_manager.py
Show resolved
Hide resolved
stac_fastapi/sfeos_helpers/stac_fastapi/sfeos_helpers/search_engine/selection/cache_manager.py
Show resolved
Hide resolved
stac_fastapi/sfeos_helpers/stac_fastapi/sfeos_helpers/search_engine/inserters.py
Outdated
Show resolved
Hide resolved
stac_fastapi/sfeos_helpers/stac_fastapi/sfeos_helpers/search_engine/managers.py
Outdated
Show resolved
Hide resolved
stac_fastapi/sfeos_helpers/stac_fastapi/sfeos_helpers/search_engine/inserters.py
Outdated
Show resolved
Hide resolved
stac_fastapi/sfeos_helpers/stac_fastapi/sfeos_helpers/search_engine/managers.py
Show resolved
Hide resolved
jonhealy1
left a comment
There was a problem hiding this comment.
Thanks for the updates. Since stac-fastapi-core is a public library, changing the method signatures in core.py is technically a Breaking Change that would force a Major Version bump (v3.0.0), which we want to avoid right now.
Please revert the changes to stac_fastapi/core/core.py.
To keep your optimization, simply handle the pass-through in the Elasticsearch plugin:
In database_logic.py, return (search, datetime_string) to satisfy the existing tuple signature.
In client.py (execute_search), accept the string via the existing datetime_search argument.
This keeps the optimization self-contained in the Elasticsearch package and saves us from versioning headaches in Core.
|
@jonhealy1 I'm not sure if this is what you meant, but I made the changes. Could you take a look |
jonhealy1
left a comment
There was a problem hiding this comment.
Just revert these two changes in core
|
Thanks, and thank you for the solid CR as well. I will let you know once our QA tests it. |
a6508a5 to
0d50357
Compare
9ebaccd to
d3f5cce
Compare
|
@jonhealy1 QA has already tested everything on our end and I got an approve from him |
| # all_aliases = set() | ||
| # for index_info in indices.values(): | ||
| # all_aliases.update(index_info.get("aliases", {}).keys()) | ||
| # assert all(alias in all_aliases for alias in expected_aliases) |
There was a problem hiding this comment.
@Gomez324 Should we delete these commented out tests?
There was a problem hiding this comment.
@jonhealy1 Thanks for catching that, those tests should be uncommented. I’ve already fixed it
There was a problem hiding this comment.
There is one more issue related to a race condition, but I’ll describe it in a separate issue together with a proposed solution (I didn’t want to include it in this MR because it would already be quite large, and this race condition was also present in the previous version).
### v6.10.0 - 2026-01-22 #### Added - Added Helm chart for ES or OS in-cluster deployment [#455](#455) - Added configurable hidden item filtering via HIDE_ITEM_PATH environment variable. [#566](#566) #### Changed - Added `PUT /catalogs/{catalog_id}` endpoint to update existing catalogs. Allows modification of catalog metadata (title, description, etc.) while preserving internal fields like parent_ids and catalog relationships. [#573](#573) - Added catalog poly-hierarchy support with hierarchical catalog endpoints (`GET /catalogs/{catalog_id}/catalogs` and `POST /catalogs/{catalog_id}/catalogs`), enabling unlimited nesting levels and allowing catalogs to belong to multiple parent catalogs simultaneously. Includes cursor-based pagination and performance optimizations. [#573](#573) - Added end_datetime alias for datetime-based indexes with use_datetime=false, so that start_datetime/end_datetime queries select a smaller range of indexes (limiting the end) [#537](#537) **PR Checklist:** - [x] Code is formatted and linted (run `pre-commit run --all-files`) - [x] Tests pass (run `make test`) - [x] Documentation has been updated to reflect changes, if applicable - [x] Changes are added to the changelog
…ed indexing strategy. (#595) A fix to my previous MR(#537) related to indices based on datetime / start_datetime / end_datetime. During a rebase, one line was changed. Additionally, I’m adding a fallback and logging for index selection. PR Checklist: - [ ] Code is formatted and linted (run `pre-commit run --all-files`) - [ ] Tests pass (run `make test`) - [ ] Documentation has been updated to reflect changes, if applicable - [ ] Changes are added to the changelog
Related Issue(s):
Description:
Until now, only the datetime field had aliases. This change adds aliases for start_datetime and end_datetime when
USE_DATETIME=false, which enables optimized filtering when searching by these fields. It improves performance because Elasticsearch/OpenSearch can now route queries to the appropriate indices instead of scanning a larger number of them.When
USE_DATETIME=true, the system works as before with datetime-based aliases only.Example with
use_datetime=false:Index A with aliases:
{
"start_datetime": "items_start_datetime_new-collection_2020-02-08",
"end_datetime": "items_end_datetime_new-collection_2020-02-16"
}
Index B with aliases:
{
"start_datetime": "items_start_datetime_new-collection_2020-02-12",
"end_datetime": "items_end_datetime_new-collection_2020-02-17"
}
Index C with aliases:
{
"start_datetime": "items_start_datetime_new-collection_2020-02-18",
"end_datetime": "items_end_datetime_new-collection_2020-02-20"
}
When a user searches in the range start_datetime/end_datetime = 2020-02-10 / 2020-02-16, Index A and Index B will be queried because both indices overlap with the requested range. Index C will be excluded because it does not intersect with that time window.
Previously, all indices could have been selected, but the new aliases allow the query engine to efficiently identify which indices overlap with the target range and avoid scanning unrelated ones, such as Index C.
To enable this feature, set
USE_DATETIME=falsein your configuration. If you want to keep the previous behavior with datetime aliases, setUSE_DATETIME=true.PR Checklist:
pre-commit run --all-files)make test)