-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exactextract zonal stats implementation #236
Exactextract zonal stats implementation #236
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tm-jc-nacpil , requesting the ff updates
- Update the
settings.ini
line starting withrequirements =
to add theexactextract
package as a dependency - Add a test in the
tests/test_raster_zonal_stats.py
file to add a sample code on how to call the new method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Add a markdown cell to separate the new section (which documents an alternative way to collect zonal stats on a raster). You can mention the advantages/cons of using the alternative method as well as mention and link to the underlying library (exactextract).
- Also add the link to
exactextract method
documentation in the docstring so that users that see the docstring when they hover over the method signature can view more details on the possible arguments to the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @butchtm! Latest commits resolve the ff.
- Separate section markdown for exactextract
- Link to exactextract in doc string
- Adding exactextract to settings
Tests are still pending 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @butchtm @tm-danna-ang pushed some tests into the PR
- Testing basic usage
- Opening from file
- Multiband opening
- Multiband + crs mismatch
- extra args (output, include_cols, include_geom)
Some things I haven't tested yet
- Passing a weights raster
- Passing output="gdal" (this currently raises a ValueError in our function as it's tricky to support this pattern)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @tm-jc-nacpil! will take a look at this on Mon 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tm-jc-nacpil lgtm! just a comment, you can skip adding a print
statement (see L171
) since pytest
only shows the warnings summary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to mention an additional change for settings.ini
(and geowrangler/__init__.py
): Update the version with
nbdev_bump_version
or manually increment the version in those files (e.g., to 0.3.1
)
and afterwards I'm good with merging!
geowrangler/raster_zonal_stats.py
Outdated
|
||
# If output, include_cols, or include_geom, return the raw exactextract results instead | ||
# These values conflict with the intended postprocessing steps (renaming/filtering, joining to input gdf) | ||
RETURN_RAW_RESULTS = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tm-danna-ang @butchtm would like to get your thoughts on this initial implementation of handling the extra args output
(can be pandas, geojson, or gdal), include_cols
(list of columns to return), and include_geom
(bool). 🤔
For now, when these are passed with non-default values, I resort to returning the raw exact extract output without postprocessing (i.e. filtering to specific columns, renaming based on the agg spec), because the postprocessing doesn't work as intended if these are passed. In this case, the function throws a warning
- For example, if the user passes "geojson" none of the pandas wrangling works
I thought this would be best so that we can accommodate people using custom args while still maintaining the usual usage pattern of using (geo)pandas. Alternatively, we can also ignore these passed args and hardcode the intended defaults if we want it to be pandas only. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting that in the original raster zonal stats, it looks like we elected to ignore/override some extra args *i.e. extra_args.pop("<arg to be ignored>")
see here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I think we can proceed ignoring the output
arg (thought I think include_cols
and include_geom
are nice to have but not necessary for this first release so up to you!) in line with the geowrangler approach and pass a warning that says the output
arg is ignored and the user should save the file as geojson
via gdal
on their own.
Saving files would also preempt adding name and/or directory for saving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards making include_geom and include_cols as part of the func definition na instead of in the extra_args
-- we can do this on the pandas side nalang din
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tm-danna-ang implemented the new params in latest commit!
Re: NODATA value For this, I was thinking we can just document that it isn't supported yet, then we can revisit once exactextract in pypi is updated again? 🤔 |
As Butch suggested earlier, maybe we can use the existing commit as the dependency instead for now until the pypi version gets updated. Can you try adding the line below to the dependencies in
|
@tm-danna-ang I added the ff line as recommended here, which uv tries to install but it fails as the wheel hasn't been built. 😞
pasting stacktrace here
|
oof 😢 then as a temporary workaround maybe replace |
Notes for post merge tasks:
|
Intro
This PR adds an initial implementation of exactextract zonal stats
create_exactextract_zonal_stats()
functiondata/ph_s5p_AER_AI_340_380.tiff
which is a multiband raster for demos. (Sentinel 5P absorbing aerosol index over PH)Implementation Notes
The current exactextract function is able to do multiband calculations using a similar format to
vector_zonal_stats
. Would like to note some things hereband
parameter.output
column specifies the output column namesCurrent to-dos
"mean(default_value=<NODATA>)"
doesn't work, so I'll have to dig into how it's specified again. 😅 Planning to reach out to the developers again regarding this.