Exactextract zonal stats implementation #236

Merged

Conversation

Collaborator

@tm-jc-nacpil tm-jc-nacpil commented Jul 1, 2024

Intro

This PR adds an initial implementation of exactextract zonal stats:

  1. Adds create_exactextract_zonal_stats() function
  2. Adds data/ph_s5p_AER_AI_340_380.tiff which is a multiband raster for demos. (Sentinel 5P absorbing aerosol index over PH)

Implementation Notes

The current exactextract function can do multiband calculations using a format similar to vector_zonal_stats. A few things to note:

  1. The data to be aggregated should be specified via the band parameter.
     • If the passed band number is outside the valid range based on the band count, I elected to skip it for now and print a warning. This differs from vector zonal stats, which applies default data columns (index), but I think this is safer because it makes the user aware that the passed band is wrong.
  2. The output parameter specifies the output column names.
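
The band check described in these notes could look roughly like the sketch below. This is not the code from the PR: the helper name `select_valid_aggregations`, the dict keys, and the 1-based band numbering are all assumptions made for illustration.

```python
import warnings


def select_valid_aggregations(aggregations, band_count):
    """Keep only aggregation specs whose band exists in the raster.

    Hypothetical helper: `aggregations` is assumed to be a list of dicts
    with a 1-based "band" key, loosely mirroring the vector_zonal_stats format.
    """
    valid = []
    for agg in aggregations:
        band = agg.get("band")
        if isinstance(band, int) and 1 <= band <= band_count:
            valid.append(agg)
        else:
            # Skip instead of falling back to a default, so the caller notices.
            warnings.warn(
                f"Band {band!r} is outside the valid range 1..{band_count}; skipping.",
                UserWarning,
            )
    return valid


# Example: the second spec points at a band that a 2-band raster does not have.
aggs = [
    {"band": 1, "func": ["mean"], "output": ["aer_ai_mean"]},
    {"band": 5, "func": ["mean"], "output": ["bad_band_mean"]},
]
kept = select_valid_aggregations(aggs, band_count=2)
```

Using `warnings.warn` rather than `print` keeps the message visible in the pytest warnings summary.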

Current to-dos

  • Figure out NODATA value -- it turns out that "mean(default_value=<NODATA>)" doesn't work, so I'll have to dig into how it's specified again. 😅 Planning to reach out to the developers again regarding this.
  • Implement parsing of extra_args into the exactextract function
  • Write tests

Collaborator

@butchtm butchtm left a comment

Hi @tm-jc-nacpil , requesting the ff updates

  • Update the settings.ini line starting with requirements = to add the exactextract package as a dependency
  • Add a test in tests/test_raster_zonal_stats.py that also serves as sample code showing how to call the new method.

Collaborator

  • Add a markdown cell to separate the new section (which documents an alternative way to collect zonal stats on a raster). You can mention the pros and cons of the alternative method, and mention and link to the underlying library (exactextract).
  • Also add the link to exactextract method documentation in the docstring so that users that see the docstring when they hover over the method signature can view more details on the possible arguments to the function.

Collaborator Author

Hi @butchtm! Latest commits resolve the ff.

  1. Separate section markdown for exactextract
  2. Link to exactextract in doc string
  3. Adding exactextract to settings

Tests are still pending 🙏

Collaborator Author

hi @butchtm @tm-danna-ang pushed some tests into the PR

  1. Testing basic usage
  2. Opening from file
  3. Multiband opening
  4. Multiband + crs mismatch
  5. extra args (output, include_cols, include_geom)

Some things I haven't tested yet

  1. Passing a weights raster
  2. Passing output="gdal" (this currently raises a ValueError in our function as it's tricky to support this pattern)

Collaborator

thanks @tm-jc-nacpil! will take a look at this on Mon 😄

Collaborator

@tm-jc-nacpil lgtm! just a comment, you can skip adding a print statement (see L171) since pytest only shows the warnings summary

Collaborator

@tm-danna-ang tm-danna-ang Jul 10, 2024

Forgot to mention an additional change for settings.ini (and geowrangler/__init__.py): Update the version with

nbdev_bump_version

or manually increment the version in those files (e.g., to 0.3.1)

and afterwards I'm good with merging!


# If output, include_cols, or include_geom, return the raw exactextract results instead
# These values conflict with the intended postprocessing steps (renaming/filtering, joining to input gdf)
RETURN_RAW_RESULTS = False
Collaborator Author

@tm-danna-ang @butchtm would like to get your thoughts on this initial implementation of handling the extra args output (can be pandas, geojson, or gdal), include_cols (list of columns to return), and include_geom (bool). 🤔

For now, when these are passed with non-default values, I return the raw exactextract output without postprocessing (i.e. filtering to specific columns, renaming based on the agg spec), because the postprocessing doesn't work as intended when they are passed. In this case, the function emits a warning.

  • For example, if the user passes "geojson" none of the pandas wrangling works

I thought this would be best so that we can accommodate people using custom args while still maintaining the usual usage pattern of using (geo)pandas. Alternatively, we can also ignore these passed args and hardcode the intended defaults if we want it to be pandas only. wdyt?

Collaborator Author

@tm-jc-nacpil tm-jc-nacpil Jul 2, 2024

Noting that in the original raster zonal stats, it looks like we elected to ignore/override some extra args (i.e. extra_args.pop("<arg to be ignored>")); see here

Collaborator

For now I think we can proceed with ignoring the output arg (though I think include_cols and include_geom are nice to have but not necessary for this first release, so up to you!), in line with the geowrangler approach, and emit a warning saying that the output arg is ignored and that users should save the file as geojson via gdal on their own.

Saving files would also preempt adding name and/or directory for saving.
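
A minimal sketch of the "ignore the output arg and warn" approach suggested here, following the `extra_args.pop(...)` pattern mentioned earlier. The helper name `sanitize_extra_args`, the `IGNORED_EXTRA_ARGS` tuple, and the warning text are hypothetical, not taken from the merged code.

```python
import warnings

# Hypothetical: extra args that the wrapper overrides instead of forwarding.
IGNORED_EXTRA_ARGS = ("output",)


def sanitize_extra_args(extra_args):
    """Return a copy of extra_args without the keys the wrapper overrides."""
    cleaned = dict(extra_args or {})
    for key in IGNORED_EXTRA_ARGS:
        if key in cleaned:
            value = cleaned.pop(key)
            # Assumption: the wrapper always postprocesses into a (Geo)DataFrame.
            warnings.warn(
                f"Ignoring extra arg {key}={value!r}; results are returned as a "
                "(Geo)DataFrame. Save to another format separately if needed.",
                UserWarning,
            )
    return cleaned
```

Copying the dict before popping avoids mutating the caller's `extra_args`, which matters if the same dict is reused across calls.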

Collaborator Author

I'm leaning towards making include_geom and include_cols part of the function definition instead of extra_args; we can just handle these on the pandas side as well

Collaborator Author

@tm-danna-ang implemented the new params in latest commit!

@tm-jc-nacpil
Collaborator Author

Re: NODATA value
Just realized that the implementation for nodata / default value handling is not yet in the latest PyPI release 😭 See: isciences/exactextract#91 (dated May), but the latest release was in March

For this, I was thinking we can just document that it isn't supported yet, then we can revisit once exactextract in pypi is updated again? 🤔

@tm-danna-ang
Collaborator

tm-danna-ang commented Jul 4, 2024

> Re: NODATA value Just realized that the implementation for nodata / default value handling is not yet in the latest PyPI release 😭 See: isciences/exactextract#91 (dated May), but the latest release was in March
>
> For this, I was thinking we can just document that it isn't supported yet, then we can revisit once exactextract in pypi is updated again? 🤔

As Butch suggested earlier, maybe we can use the existing commit as the dependency instead for now until the pypi version gets updated. Can you try adding the line below to the dependencies in settings.ini? Assuming that is the right commit. Using # instead of @ may or may not work as well.

git+ssh://git@github.com/isciences/exactextract.git@364acf3ce579c3b51b78e70642460816b4da9ddc

@tm-jc-nacpil
Collaborator Author

tm-jc-nacpil commented Jul 4, 2024

> As Butch suggested earlier, maybe we can use the existing commit as the dependency instead for now until the pypi version gets updated. Can you try adding the line below to the dependencies in settings.ini? Assuming that is the right commit. Using # instead of @ may or may not work as well.
>
> git+ssh://git@github.com/isciences/exactextract.git@364acf3ce579c3b51b78e70642460816b4da9ddc

@tm-danna-ang I added the ff line as recommended here, which uv tries to install but it fails as the wheel hasn't been built. 😞

"exactextract @ git+ssh://git@github.com/isciences/exactextract.git@364acf3ce579c3b51b78e70642460816b4da9ddc"

pasting stacktrace here

error: Failed to prepare distributions
  Caused by: Failed to fetch wheel: exactextract @ git+ssh://git@github.com/isciences/exactextract.git@364acf3ce579c3b51b78e70642460816b4da9ddc
  Caused by: Failed to build: `exactextract @ git+ssh://git@github.com/isciences/exactextract.git@364acf3ce579c3b51b78e70642460816b4da9ddc`
  Caused by: Build backend failed to build wheel through `build_wheel()` with exit status: 1
--- stdout:
*** scikit-build-core 0.9.8 using CMake 3.16.3 (wheel)
*** Configuring CMake...
loading initial cache file /tmp/tmp5fz0yh5u/build/CMakeInit.txt
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/x86_64-linux-gnu-gcc
-- Check for working C compiler: /usr/bin/x86_64-linux-gnu-gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/x86_64-linux-gnu-g++
-- Check for working CXX compiler: /usr/bin/x86_64-linux-gnu-g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring incomplete, errors occurred!
See also "/tmp/tmp5fz0yh5u/build/CMakeFiles/CMakeOutput.log".
--- stderr:
CMake Error at CMakeLists.txt:19 (find_package):
  By not providing "FindGEOS.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "GEOS", but
  CMake did not find one.

  Could not find a package configuration file provided by "GEOS" (requested
  version 3.5) with any of the following names:

    GEOSConfig.cmake
    geos-config.cmake

  Add the installation prefix of "GEOS" to CMAKE_PREFIX_PATH or set
  "GEOS_DIR" to a directory containing one of the above files.  If "GEOS"
  provides a separate development package or SDK, be sure it has been
  installed.



*** CMake configuration failed
---

@tm-danna-ang
Collaborator

> @tm-danna-ang I added the ff line as recommended here, which uv tries to install but it fails as the wheel hasn't been built. 😞

oof 😢 then as a temporary workaround, maybe replace the nodata values in the input (with nodata passed via extra_args) with np.nan prior to the exact_extract() call, then revisit once the PyPI version is updated
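
A rough, stdlib-only illustration of that replacement step, assuming the band is a small in-memory 2D list. The helper name `nodata_to_nan` is hypothetical. On a float NumPy array the same step would be `np.where(arr == nodata, np.nan, arr)`. Whether the installed exactextract release then skips NaN cells would still need to be verified.

```python
import math


def nodata_to_nan(band, nodata):
    """Return a copy of a 2D band (list of rows) with nodata cells set to NaN.

    Hypothetical helper for illustration; values are converted to float
    because NaN cannot be stored in an integer band.
    """
    if nodata is None:
        # Nothing to mask; still return a float copy for a consistent dtype.
        return [[float(v) for v in row] for row in band]
    return [
        [math.nan if v == nodata else float(v) for v in row]
        for row in band
    ]


# Example: -9999 marks missing cells in this toy 2x2 band.
masked = nodata_to_nan([[1, -9999], [3, 4]], nodata=-9999)
```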

@tm-jc-nacpil tm-jc-nacpil marked this pull request as ready for review July 10, 2024 06:52
@butchtm butchtm merged commit 93c8e12 into thinkingmachines:master Jul 12, 2024
1 check passed
@butchtm
Collaborator

butchtm commented Jul 12, 2024

Notes for post merge tasks:

  • Add examples for create_exactextract_zonal_stats in raster_zonal_stats tutorials.
  • Implement NO_DATA handling once exactextract package incorporates it in their released version

@butchtm butchtm changed the title from "[WIP] Exactextract zonal stats implementation" to "Exactextract zonal stats implementation" on Aug 13, 2024