Writing edits #38

Merged: 21 commits, Mar 28, 2025
Conversation

e-marshall (Owner)

Updates to background and conclusion sections

e-marshall mentioned this pull request on Mar 25, 2025
book/_config.yml (outdated)

    @@ -30,7 +30,7 @@ bibtex_bibfiles:
     repository:
       url: https://github.com/e-marshall/cloud-open-source-geospatial-datacube-workflows
       #url: https://e-marshall.github.io/cloud-open-source-geospatial-datacube-workflows/introduction.html # Online location of your book
    -  branch: main # Which branch of the repository should be used when creating links (optional)
    +  branch: writing_edits # Which branch of the repository should be used when creating links (optional)

scottyhq (Collaborator)

A couple of nit-picky suggestions to deal with Jupyter Book warnings coming from pixi run build:

1. These are easy; just need to be sure to go from # to ##, etc.:

   /tmp/cloud-open-source-geospatial-datacube-workflows/book/background/context_motivation.md:5: WARNING: Non-consecutive header level increase; H1 to H3 [myst.header]

2. WARNING: skipping unknown output mime type: application/vnd.holoviews_load.v0+json [mystnb.unknown_mime_type]

   You can add a bit to this _config.yml to suppress these:

       sphinx:
         config:
           # application/vnd.holoviews_load.v0+json, application/vnd.holoviews_exec.v0+json
           suppress_warnings: ["mystnb.unknown_mime_type"]

3. /tmp/cloud-open-source-geospatial-datacube-workflows/book/intro/open_source_setting.md:17: WARNING: 'image': Unknown option keys: ['right-align'] (allowed: ['align', 'alt', 'class', 'height', 'loading', 'name', 'scale', 'target', 'width']) [myst.directive_option]

   Not sure about this one!
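
For that last warning, a hedged guess (the image path below is hypothetical): the MyST image directive has no right-align option, but align is in the allowed list, so the fix may just be:

    ```{image} ../logos/example_logo.png
    :align: right
    :width: 200px
    ```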

e-marshall (Owner, Author)

Thanks @scottyhq! I didn't get these when running jupyter-book build; I'll switch over to using pixi run build and troubleshoot.

scottyhq (Collaborator)

In theory it should be the same (https://github.com/e-marshall/cloud-open-source-geospatial-datacube-workflows/blob/dc5defdbc070d1f7410260d60172ce1fb8d793c9/pixi.toml#L48); perhaps there is a difference in versions.

FYI, I'm running the notebooks on https://hub.cryointhecloud.com. The one additional step required to get the notebooks to find the right environment was to run (the name can be whatever you want):

    # enter the pixi environment, then register it as a Jupyter kernel
    pixi shell
    python -m ipykernel install --user --name=datacube

e-marshall (Owner, Author)

Oh OK, thanks, that's really helpful. Do you think it would be worth adding a note about this / running on JupyterHubs to the software page? I'm happy to do that later this evening if so.

scottyhq (Collaborator)

Yes, I think it would be great to have as an example of running code next to the data in AWS us-west-2. Of course not everyone has access to CryoCloud, and who knows how long each JupyterHub lasts, but there are at least a few that one could get access to:
https://docs.openveda.cloud/user-guide/scientific-computing/
https://opensarlab-docs.asf.alaska.edu

scottyhq (Collaborator), Mar 26, 2025

> python -m ipykernel install --user --name=datacube

I realized this doesn't quite work because it doesn't "activate" the environment (e.g. environment variables like PROJ_DATA don't get set in the same way they do with conda activate or pixi shell)! I'll write up a quick guide in a separate PR: #41
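
For reference, a hedged sketch of one possible workaround (not necessarily the approach in #41, and all paths here are hypothetical): Jupyter kernel specs accept an env mapping, so the needed variables can be baked into the installed kernel's kernel.json (typically under ~/.local/share/jupyter/kernels/datacube/):

    {
      "argv": ["/srv/repo/.pixi/envs/default/bin/python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
      "display_name": "datacube",
      "language": "python",
      "env": {
        "PROJ_DATA": "/srv/repo/.pixi/envs/default/share/proj"
      }
    }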

scottyhq (Collaborator)

This is great! You could also consider linking to it on the front page of the tutorial website (I think it's nice to have a quick visual for visitors to gauge what the website covers).

e-marshall (Owner, Author)

I like the idea of adding an image to the front page!

e-marshall (Owner, Author)

Is this section necessary / beyond scope? It felt like some justification for open source was needed, but maybe not. I'd be OK either way between leaving this in and polishing it a bit, or taking it out.

scottyhq (Collaborator)

To keep it concise, I think you could remove the first paragraph. For refs, this is one that I personally found inspirational when it came out: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2015EA000136

e-marshall (Owner, Author)

That sounds great to me; thanks for the pointer to the paper! I remember reading it a while back and liking it a lot. I think it got lost in the rewrite, so I'll definitely add it back in and use it in this section.

scottyhq (Collaborator)

I'm trying this with timeseries_type = "subset". Seems like the data directory subfolders need to be created in the create_vrt_object() function, e.g. pathlib.Path('../data/raster_data/subset_timeseries/txt_files').mkdir(parents=True, exist_ok=True).
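
A minimal sketch of that fix, assuming create_vrt_object() writes its text files under a per-timeseries directory (everything here other than the mkdir call itself is a hypothetical reconstruction, not the tutorial's actual function):

    import pathlib

    def create_vrt_object(timeseries_type: str = "subset") -> pathlib.Path:
        # Hypothetical layout: ../data/raster_data/<type>_timeseries/txt_files
        txt_dir = pathlib.Path(f"../data/raster_data/{timeseries_type}_timeseries/txt_files")
        # Create any missing parent folders; no-op if they already exist
        txt_dir.mkdir(parents=True, exist_ok=True)
        return txt_dir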

scottyhq (Collaborator)

Should 'subset' be the default instead of 'full'?

e-marshall (Owner, Author)

Shoot, I'll take a look at the subdirs issue.
For the default type, I had it set to 'full' because the markdown text in the notebooks is from the perspective of working with the full time series, but do you think we should change it to 'subset' since that's the most accessible version? It isn't an issue in nb 1, but it would require some changes to nb 3.

e-marshall (Owner, Author)

This should be addressed in b44b91f.

scottyhq (Collaborator)

> For the default type, i had it set to 'full' because the markdown text in the notebooks is from the perspective of working with the full time series

OK, makes sense. Perhaps just add a note that download speeds from Zenodo can be highly variable (xref https://discourse.pangeo.io/t/downloading-speed-from-zenodo/3529). For me on CryoCloud, running the cell to download the full zip is taking 1 hr+!

scottyhq (Collaborator)

Some quick datapoints:

Home wifi in Germany (probably actually faster than hubs, b/c Zenodo/CERN is based in Switzerland: https://about.zenodo.org/infrastructure):

    asf_rtcs_full.zip   0% [          ]  310.27M  6.33MB/s  eta 2h 8m

CryoCloud (1st download finished in 1 hr; testing again a bit later is super slow):

    asf_rtcs_full.zip   0% [          ]   13.03M  677KB/s   eta 22h 9m

e-marshall (Owner, Author)

Thanks, I will add a warning. It takes ~1 hr for me to download the big time series.
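
A rough sketch of what that warning could look like as a MyST admonition in the notebook (the wording is only a suggestion):

    ```{warning}
    Download speeds from Zenodo can be highly variable. Downloading the full
    time series (asf_rtcs_full.zip) can take an hour or more.
    ```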

scottyhq (Collaborator), Mar 27, 2025

Just a note that I needed at least 17 GB of RAM to successfully run this notebook.

[screenshot: memory usage]

I haven't dug into why, but for some reason opening the zarr via dask is doubling the memory (3.3 -> 6.6 GB) in the initial loading step... but since it's only ~3 GB in memory and you immediately call compute(), an alternative is to just remove dask entirely from this notebook (chunks=None on load):

# Read raster eagerly as numpy-backed arrays (chunks=None disables dask)
single_glacier_raster = xr.open_zarr("../data/raster_data/single_glacier_itslive.zarr", 
                                     decode_coords="all",
                                     chunks=None,
                                     )

e-marshall (Owner, Author)

Working on reducing memory usage here. I think it's a few of the plots causing the issue. Considering saving the plots to file and inserting them as figures so that the objects can be deleted from memory... will add in a separate PR.
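
A rough sketch of that pattern, assuming the single_glacier_raster dataset loaded earlier in the notebook and a hypothetical variable name "v" (both are assumptions, not the tutorial's actual names):

    import matplotlib.pyplot as plt

    # Render the plot once, write it to disk, then free the figure
    fig, ax = plt.subplots(figsize=(8, 4))
    single_glacier_raster["v"].mean(dim="mid_date").plot(ax=ax)
    fig.savefig("../figures/mean_velocity.png", dpi=150, bbox_inches="tight")
    plt.close(fig)  # release the figure's memory

    # In a markdown cell, embed the saved file instead of keeping a live plot object:
    # ![Mean velocity](../figures/mean_velocity.png)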

scottyhq (Collaborator)

I really like this section :)

scottyhq (Collaborator), Mar 27, 2025

I haven't been able to execute this notebook with a 30 GB RAM CryoCloud instance. Seems to be crashing here:

vector_data_cube = dc_resamp.xvec.zonal_stats(
    rgi_subset.geometry,
    x_coords="x",
    y_coords="y",
).drop_vars("index")

e-marshall (Owner, Author), Mar 28, 2025

Hm, I'm not sure what's going on. On my computer it is ~1.5 GB. Will do some digging...

e-marshall (Owner, Author), Mar 28, 2025

OK, I think I know what may have happened. Creating the vector cube takes a ton of memory, so I ran it once and then wrote it to disk, hoping to just use the version that's read from disk (it's ~4 MB) rather than the one made in the notebook, but I accidentally added it to the .gitignore file. I think I need to figure out a way for it to be committed to the repo even though the rest of the data directory is ignored? Will work on this, but I think I'll do it in its own PR, so I'll merge this and then address that.
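
One hedged way to do that (the paths below are hypothetical): git can re-include a specific path beneath an ignored pattern with a ! negation in .gitignore, as long as the path's parent directory itself isn't ignored; alternatively, the file can be force-added once:

    # .gitignore: ignore the data files but keep the small vector cube
    data/vector_data/*
    !data/vector_data/vector_data_cube.zarr

    # or, force-add just the one path past the ignore rules:
    git add -f data/vector_data/vector_data_cube.zarr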

e-marshall (Owner, Author)

In my comment above I was thinking of a different operation; the one failing for you happens before reading the vector data cube from disk, my bad. I'm not sure why the discrepancy, but it runs for me on less than 2 GB RAM. I'll try to figure it out tomorrow.

e-marshall (Owner, Author)

I believe the difference in behavior noted here is related to an env created w/ conda vs. an env created w/ pixi. When I run this in a conda env, the whole notebook takes up less than 2 GB RAM. In a pixi env, the resample step in the first section of the notebook, dc_resamp = dc.resample(mid_date="3ME").mean(), uses > 20 GB. There is an xarray version difference between the two (the conda env defaults to xr 2024.10 and pixi to 2025.1.2), also mentioned here.

Does it make sense that this large a change in behavior could be due to the xr version?
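
One quick way to test that hypothesis (a sketch; the version string is just the one mentioned above): pin xarray in pixi.toml to match the conda env, re-solve, and re-run the resample step:

    [dependencies]
    xarray = "2024.10.*"

then run pixi install to update the environment before retrying the notebook.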

e-marshall merged commit 3d889e7 into main on Mar 28, 2025.