Writing edits #38
Conversation
book/_config.yml (Outdated)

```diff
@@ -30,7 +30,7 @@ bibtex_bibfiles:
 repository:
   url: https://github.com/e-marshall/cloud-open-source-geospatial-datacube-workflows
   #url: https://e-marshall.github.io/cloud-open-source-geospatial-datacube-workflows/introduction.html # Online location of your book
-  branch: main # Which branch of the repository should be used when creating links (optional)
+  branch: writing_edits # Which branch of the repository should be used when creating links (optional)
```
A couple of nit-picky suggestions to deal with Jupyter Book warnings coming from `pixi run build`:

These are easy; just need to be sure to go from `#` to `##`, etc.

```
/tmp/cloud-open-source-geospatial-datacube-workflows/book/background/context_motivation.md:5: WARNING: Non-consecutive header level increase; H1 to H3 [myst.header]
WARNING: skipping unknown output mime type: application/vnd.holoviews_load.v0+json [mystnb.unknown_mime_type]
```

You can add a bit to this `_config.yml` to suppress these:

```yaml
sphinx:
  config:
    # application/vnd.holoviews_load.v0+json, application/vnd.holoviews_exec.v0+json
    suppress_warnings: ["mystnb.unknown_mime_type"]
```
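The header-level warning, by contrast, is fixed in the page itself by demoting the offending heading one step at a time. A hypothetical before/after (the actual headings in `context_motivation.md` may differ):

```markdown
# Context and motivation

<!-- was: ### Background — an H1 → H3 jump triggers the myst.header warning -->
## Background
```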
```
/tmp/cloud-open-source-geospatial-datacube-workflows/book/intro/open_source_setting.md:17: WARNING: 'image': Unknown option keys: ['right-align'] (allowed: ['align', 'alt', 'class', 'height', 'loading', 'name', 'scale', 'target', 'width']) [myst.directive_option]
```

not sure!
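If right alignment was the intent, the allowed `align` option listed in the warning should work; a guess at the directive (the image path and width are placeholders):

```markdown
:::{image} imgs/example.png
:align: right
:width: 300px
:::
```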
thanks @scottyhq ! i didn't get these running `jupyter-book build`, i'll switch over to using `pixi run build` and troubleshoot
In theory it should be the same (https://github.com/e-marshall/cloud-open-source-geospatial-datacube-workflows/blob/dc5defdbc070d1f7410260d60172ce1fb8d793c9/pixi.toml#L48); perhaps there is a difference in versions.

FYI I'm running the notebooks on https://hub.cryointhecloud.com . The one additional step required to get the notebooks to find the right environment was to run (the name can be whatever you want):

```shell
pixi shell
python -m ipykernel install --user --name=datacube
```
oo ok, thanks, that's really helpful. do you think it would be worth adding a note about this / running on JupyterHubs to the software page? i'm happy to do that later this evening if so
Yes, I think it would be great to have as an example of running code next to the data in AWS us-west-2. Of course not everyone has access to CryoCloud, and who knows how long each JupyterHub lasts, but there are at least a few that one could get access to:

- https://docs.openveda.cloud/user-guide/scientific-computing/
- https://opensarlab-docs.asf.alaska.edu
> python -m ipykernel install --user --name=datacube

I realized this doesn't quite work because it doesn't "activate" the environment (e.g. environment variables like `PROJ_DATA` don't get set in the same way they do with `conda activate` or `pixi shell`)! I'll write up a quick guide in a separate PR: #41
book/background/imgs/cube.png
Outdated
This is great! You could also consider linking to it on the front page of the tutorial website (I think it's nice to have a quick visual for visitors to gauge what the website covers).
I like the idea of adding an image to the front page!
book/intro/3_open_source_setting.md
Outdated
Is this section necessary / beyond scope? It felt like some justification for open source was needed, but maybe not. I'd be ok either way between leaving this in and polishing it a bit or taking it out.
To keep it concise, I think you could remove the first paragraph. For refs, this is one that I personally found inspirational when it came out: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2015EA000136
that sounds great to me, thanks for the pointer to the paper! I remember reading it a while back and liking it a lot. I think it got lost in the rewrite, so i'll definitely add it back in and use it in this section
I'm trying this with `timeseries_type = "subset"`. Seems like the data directory subfolders need to be created in the `create_vrt_object()` function (e.g. `pathlib.Path('../data/raster_data/subset_timeseries/txt_files').mkdir(parents=True, exist_ok=True)`)
Should 'subset' be the default instead of 'full'?
shoot, i'll take a look at the subdirs issue.

For the default type, i had it set to 'full' because the markdown text in the notebooks is from the perspective of working with the full time series, but do you think we should change it to 'subset' since that's the most accessible version? it isn't an issue in nb 1, but it would require some changes to nb 3
this should be addressed in b44b91f
> For the default type, i had it set to 'full' because the markdown text in the notebooks is from the perspective of working with the full time series,

OK, makes sense. Perhaps just add a note that download speeds from Zenodo can be highly variable (xref https://discourse.pangeo.io/t/downloading-speed-from-zenodo/3529). For me, on CryoCloud, running the cell to download the full zip is taking 1hr+ !
Some quick datapoints:

Home wifi in Germany (probably actually faster than hubs, b/c Zenodo/CERN is based in Switzerland, https://about.zenodo.org/infrastructure):

```
asf_rtcs_full.zip   0% [   ]  310.27M  6.33MB/s  eta 2h 8m
```

CryoCloud (1st download finished in 1 hr; testing again a bit later is super slow):

```
asf_rtcs_full.zip   0% [   ]  13.03M   677KB/s   eta 22h 9m
```
thanks, i will add a warning. it takes ~1 hr for me to download the big time series
…rHub (#41) * jupyterhub instructions * wording change --------- Co-authored-by: e-marshall <[email protected]>
fixing files that were renamed
Just a note that I needed at least 17GB of RAM to successfully run this notebook.

I haven't dug into why, but for some reason opening the zarr via dask is doubling the memory (3.3 -> 6.6GB) in the initial loading step... but since it's only a ~3GB dataset and you immediately call compute(), an alternative is to just remove dask entirely from this notebook (`chunks=None` on load):

```python
# Read raster
single_glacier_raster = xr.open_zarr(
    "../data/raster_data/single_glacier_itslive.zarr",
    decode_coords="all",
    chunks=None,
)
```
working on reducing memory usage here. i think it's a few of the plots causing the issue. considering saving the plots to file and inserting them as figures so that the objects can be deleted from memory... will add in a separate pr
I really like this section :)
I haven't been able to execute this notebook w/ a 30GB RAM CryoCloud instance. Seems to be crashing here:

```python
vector_data_cube = dc_resamp.xvec.zonal_stats(
    rgi_subset.geometry,
    x_coords="x",
    y_coords="y",
).drop_vars("index")
```
hm, i'm not sure what's going on. on my computer it uses ~1.5 GB. will do some digging...
ok, i think i know what may have happened. creating the vector cube takes a ton of memory, so i ran it once and wrote it to disk, hoping to use the version that's read from disk (it's ~4mb) rather than the one made in the nb. but i accidentally added it to the gitignore file. i think i need to figure out a way for it to be included in the repo but the generated copies not tracked? will work on this, but i think i'll do it in its own pr, so will merge this and then address that
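One way to do that is a negation pattern in `.gitignore`, so generated data stays ignored while the small pre-computed cube is committed (the paths below are hypothetical; the zarr-store entries assume it's stored as a directory):

```gitignore
# ignore generated data products in data/...
data/*
# ...but keep the small pre-computed vector data cube
!data/vector_data_cube.zarr
!data/vector_data_cube.zarr/**
```

Note that git cannot re-include a path whose parent directory is itself excluded, which is why the pattern ignores `data/*` (the contents) rather than `data/` (the directory).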
In my comment above I was thinking of a different operation; the one failing for you happens before reading the vector data cube back from disk, my bad. I'm not sure why the discrepancy, but it runs for me on less than 2gb ram. I'll try to figure it out tomorrow.
i believe the difference in behavior noted here is related to an env created w/ conda vs. an env created w/ pixi. when i run this in a conda env, the whole notebook takes up less than 2gb ram. in a pixi env, the resample step in the first section of the notebook, `dc_resamp = dc.resample(mid_date="3ME").mean()`, uses > 20GB. there is an xarray version difference between the two (the conda env defaults to xr 2024.10 and pixi to 2025.1.2), also mentioned here.

does it make sense that this large a change in behavior could be due to the xr version?
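To pin down exactly what differs between the two environments, a small stdlib-only helper could be run in each and the outputs diffed. This is just a sketch; the package list is a guess at the ones most likely to affect resample/groupby memory use:

```python
from importlib import metadata

def report_versions(packages=("xarray", "dask", "zarr", "flox")):
    """Return installed versions of packages that may affect
    resample()/groupby memory behavior; run in both envs and compare."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = "not installed"
    return versions
```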
Updates to background and conclusion sections