
Commit 3d889e7

e-marshall and scottyhq authored

Writing edits (#38)

* bunch of edits
* add captions
* add file numbers back in
* fix file names and a few links
* add why os section in intro
* Add instructions for executing tutorial notebooks on CryoCloud JupyterHub (#41)
  * jupyterhub instructions
  * wording change
  Co-authored-by: e-marshall <[email protected]>
* fix file names and a few links (fixing files that were renamed)
* spelling and formatting fixes
* remove files from tracking
* add mkdirs line in s1 nb1 and some formatting changes
* few typo fixes and other things + os section in intro
* updates to datacube revisit and others
* edits from jessica
* clean nbs
* nit
* update gitignore to remove vector data cube
* undo gitignore change, will do in sep pr
* switch build branch back to main
* add note about download time

Co-authored-by: Scott Henderson <[email protected]>
1 parent 86cd6eb commit 3d889e7

51 files changed (+976, −4679 lines)

.codespellignore (+10)

@@ -0,0 +1,10 @@
+"Xarray",
+"geospatial",
+"CEOS",
+"backscatter",
+"Dask",
+"Zarr",
+"geoscience",
+"STAC", "stackstac", "PySTAC", "Zenodo", "USGS","SERVIR","NSIDC","shapefile","pixi","itslive",
+"jovyan","kernelspec","regridding","Pangeo","Xvec"
+"basemaps","matplotlib","fontsize","skipna","linestyle","GDAL"

.gitignore (+10, −1)

@@ -2,8 +2,17 @@
 _build
 _static

+#scratch
+all_links.txt
+node_modules/
+package-lock.json
+package.json
+utils/get_links.py
+
 #Data
-**/data/raster_data/*
+sentinel1/data/raster_data/*
+itslive/data/raster_data/*
+#itslive/data/raster_data/single_glacier_itslive.zarr/*

 #Extra nbs
 sentinel1/subste_nbs

.pre-commit-config.yaml (+9, −1)

@@ -12,7 +12,7 @@ repos:
     rev: 1.9.1 # Check latest version
     hooks:
       - id: nbqa-flake8
-        args: ["--max-line-length=120"]
+        args: ["--max-line-length=88","--ignore=E402"] #ignore module import not at top of file
         files: "\\.ipynb$"
       - id: nbqa-black
         args: ["--line-length=120"]
@@ -41,3 +41,11 @@ repos:
     hooks:
       - id: markdown-link-check
        args: [-q]
+
+  - repo: https://github.com/codespell-project/codespell
+    rev: v2.4.1
+    hooks:
+      - id: codespell
+        #files: ^.*\.(py|md|ipynb)$
+        args: ["--ignore-words", ".codespellignore", "--skip=*.bib", "--skip=*.lock","--skip=run_itslive_nbs.py"]
+

book/_config.yml (+5, −3)

@@ -53,6 +53,8 @@ parse:
     - substitution
 sphinx:
   config:
+    # application/vnd.holoviews_load.v0+json, application/vnd.holoviews_exec.v0+json
+    suppress_warnings: ["mystnb.unknown_mime_type"]
     bibtex_reference_style: label
     myst_heading_anchors: 3
     myst_enable_extensions:
@@ -68,12 +70,12 @@ sphinx:
       - substitution

     myst_substitutions:
-      part1_title: "Background"
+      part1_title: "Part 2: Background"
       part2_title: "ITS_LIVE ice velocity data tutorial"
       #part2_title: "Using Xarray to examine cloud-based glacier surface velocity data"
       part3_title: "Sentinel-1 RTC imagery tutorial"
       #part3_title: "Sentinel-1 RTC data workflows with xarray"
-      part4_title: "Summary + Conclusion"
+      part4_title: "Part 5: Conclusion"

       #tutorial 1 nb titles
       title_its_nb1: "# 3.1 Accessing cloud-hosted ITS_LIVE data"
@@ -122,7 +124,7 @@ sphinx:
       a_its_nb2: "A. Compare approaches for reading larger than memory data"
       a1_its_nb2: "1) `chunks = 'auto'`"
       a2_its_nb2: "2) `chunks = {}`"
-      a3_its_nb2: "3) An out-of-order time dimensions"
+      a3_its_nb2: "3) An out-of-order time dimension"
       a4_its_nb2: "4) Read the dataset without Dask"
       b_its_nb2: "B. Organize data once it's in memory"
       b1_its_nb2: "1) Arrange dataset in chronological order"

book/_toc.yml (+21, −19)

@@ -4,33 +4,35 @@ root: introduction
 parts:
   - caption: Part 1. Introduction
     chapters:
-      - file: intro/getting_started
-      - file: intro/learning_objectives
-      - file: intro/open_source_setting
+      - file: intro/1_getting_started
+      - file: intro/2_learning_objectives
+      - file: intro/3_open_source_setting
   - caption: Part 2. Background
     chapters:
-      - file: background/context_motivation
-      - file: background/data_cubes
-      - file: background/tutorials_overview
-      - file: background/tutorial_data
-      - file: background/software
-      - file: background/relevant_concepts
+      - file: background/background.md
+      - file: background/1_context_motivation
+      #- file: background/ard_data_tidying
+      - file: background/2_data_cubes
+      - file: background/3_tutorials_overview
+      - file: background/4_tutorial_data
+      - file: background/5_software
+      - file: background/6_relevant_concepts
   - caption: Part 3. ITS_LIVE Tutorial
     chapters:
       - file: itslive/itslive_intro
-      - file: itslive/nbs/accessing_itslive_s3_data
-      - file: itslive/nbs/larger_than_memory_data
-      - file: itslive/nbs/combining_raster_vector_data
-      - file: itslive/nbs/exploratory_data_analysis_single
-      - file: itslive/nbs/exploratory_data_analysis_group
+      - file: itslive/nbs/1_accessing_itslive_s3_data
+      - file: itslive/nbs/2_larger_than_memory_data
+      - file: itslive/nbs/3_combining_raster_vector_data
+      - file: itslive/nbs/4_exploratory_data_analysis_single
+      - file: itslive/nbs/5_exploratory_data_analysis_group
   - caption: Part 4. Sentinel-1 RTC Tutorial
     chapters:
       - file: sentinel1/s1_intro
-      - file: sentinel1/nbs/read_asf_data
-      - file: sentinel1/nbs/wrangle_metadata
-      - file: sentinel1/nbs/asf_exploratory_analysis
-      - file: sentinel1/nbs/read_pc_data
-      - file: sentinel1/nbs/comparing_s1_rtc_datasets
+      - file: sentinel1/nbs/1_read_asf_data
+      - file: sentinel1/nbs/2_wrangle_metadata
+      - file: sentinel1/nbs/3_asf_exploratory_analysis
+      - file: sentinel1/nbs/4_read_pc_data
+      - file: sentinel1/nbs/5_comparing_s1_rtc_datasets
   - caption: Part 5. Conclusion
     chapters:
       - file: conclusion/wrapping_up
@@ -1,28 +1,33 @@
 # 2.1 Context & Motivation

-This book demonstrates scientific workflows using publicly-available, cloud-optimized geospatial datasets and open-source scientific software tools in order to address the need for educational resources related to new technologies and reduce barriers to entry to working with earth observation data. The tutorials in this book focus on the complexities inherent to working with n-dimensional, gridded datasets and use the core stack of software packages built on and around the Xarray data model.
+This book demonstrates scientific workflows using publicly available, cloud-optimized geospatial datasets and open-source scientific software tools in order to address the need for educational resources related to new technologies and reduce barriers to entry to working with earth observation data. The tutorials in this book focus on the complexities inherent to working with n-dimensional, gridded datasets and use the core stack of software packages built on and around the Xarray data model.

-### *Moving away from the 'download model' of scientific data analysis*
+## *Moving away from the 'download model' of scientific data analysis*

-Technological developments in recent decades have engendered fundamental shifts in the nature of scientific data and how it is used for analysis.
+Technological developments in recent decades have engendered fundamental shifts in the nature of scientific data and how it is used for analysis ({cite:t}`abernathey_2021_cloud,gentemann_2021_science,stern_2022_PangeoForge`).

 ```{epigraph}
 "Traditionally, scientific data have been distributed via a “download model,” wherein scientists download individual data files to local computers for analysis. After downloading many files, scientists typically have to do extensive processing and organizing to make them useful for the data analysis; this creates a barrier to reproducibility, since a scientist’s analysis code must account for this unique “local” organization. Furthermore, the sheer size of the datasets (many terabytes to petabytes) can make downloading effectively impossible. Analysis of such data volumes also can benefit from parallel / distributed computing, which is not always readily available on local computers. Finally, this model reinforces inequality between privileged institutions that have the resources to host local copies of the data and those that don’t. This restricts who can participate in science."
-    -- {cite}`abernathey_2021_cloud`
+    -- {cite:t}`abernathey_2021_cloud`
 ```

-### *Increasingly large, cloud-optimized data means new tools and approaches for data management*
+## *Increasingly large, cloud-optimized data means new tools and approaches for data management*

-The increase in publicly available earth observation data has transformed scientific workflows across a range of fields, prompting analysts to gain new skills in order to work with larger volumes of data in new formats and locations, and to use distributed cloud-computational resources in their analysis ({cite:t}`abernathey_2021_cloud,gentemann_2021_science,mathieu_2017_esas,ramachandran_2021_open,Sudmanns_2020_big,wagemann_2021_user`).
+The increase in publicly available earth observation data has transformed scientific workflows across a range of fields, prompting analysts to gain new skills in order to work with larger volumes of data in new formats and locations, and to use distributed cloud-computational resources in their analysis ({cite:t}`abernathey_2021_cloud,Boulton02012018,gentemann_2021_science,mathieu_2017_esas,ramachandran_2021_open,Sudmanns_2020_big,wagemann_2021_user`).

 ```{figure} imgs/fy24-projection-chart.png
 ---
 ---
 Volume of NASA Earth Science Data archives, including growth of existing-mission archives and new missions, projected through 2029. Source: [NASA EarthData - Open Science](https://www.earthdata.nasa.gov/about/open-science).
 ```

-### *Asking questions of complex datasets*
+## *Asking questions of complex datasets*

-Scientific workflows involve asking complex questions of diverse types of data. Earth observation and related datasets often contain two types of information: measurements of a physical observable (e.g. temperature) and metadata that provides auxiliary information that required in order to interpret the physical observable (time and location of measurement, information about the sensor, etc.). With the increasingly complex and large volume of earth observation data that is currently available, storing, managing and organizing these types of data can very quickly become a complex and challenging task, especially for students and early-career analysts {cite}`mathieu_esas_2017,palumbo_2017_building,Sudmanns_2020_big,wagemann_2021_user`.
+Scientific workflows involve asking complex questions of diverse types of data. Earth observation and related datasets often contain two types of information: measurements of a physical observable (e.g. temperature) and metadata that provides auxiliary information that required in order to interpret the physical observable (time and location of measurement, information about the sensor, etc.). With the increasingly complex and large volume of earth observation data that is currently available, storing, managing and organizing this information can very quickly become a complex and challenging task, especially for students and early-career analysts ({cite:t}`mathieu_2017_esas,palumbo_2017_building,Sudmanns_2020_big,wagemann_2021_user,stern_2022_PangeoForge`).

-This book provides detailed examples of scientific workflow steps that ingest complex, multi-dimensional datastets, introduce users to the landscape of popular, actively-maintained open-source software packages for working with geospatial data in Python, and include strategies for working with larger-than memory data stored in publicly available, cloud-hosted repositories. These demonstrations are accompanied by detailed discussion of concepts involved in analyzing earth observation data such as dataset inspection, manipulation, and exploratory analysis and visualization. Overall, we emphasize the importance of understanding the structure of multi-dimensional earth observation datasets within the context of a given data model and demonstrate how such an understanding can enable more efficient and intuitive scientific workflows.
+This book provides detailed examples of scientific workflow steps that ingest complex, multi-dimensional datasets, introduce users to the landscape of popular, actively-maintained open-source software packages for working with geospatial data in Python, and include strategies for working with larger-than memory data stored in publicly available, cloud-hosted repositories. These demonstrations are accompanied by detailed discussion of concepts involved in analyzing earth observation data such as dataset inspection, manipulation, and exploratory analysis and visualization. Overall, we emphasize the importance of understanding the structure of multi-dimensional earth observation datasets within the context of a given data model and demonstrate how such an understanding can enable more efficient and intuitive scientific workflows.
+
+
+
+
+
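The edited text above emphasizes that earth observation datasets pair measurements with the metadata needed to interpret them, and that a data model like Xarray's keeps the two together. The following is a minimal, stdlib-only sketch of that idea; the `LabeledCube` class and all names in it are hypothetical illustrations, not Xarray's actual API.

```python
# Sketch of a labeled n-dimensional data model: measured values are stored
# together with dimension names, coordinate labels, and attributes.
# This is an illustration of the concept only, not Xarray itself.
from dataclasses import dataclass, field


@dataclass
class LabeledCube:
    values: list  # nested lists: values[time][y][x]
    dims: tuple  # e.g. ("time", "y", "x")
    coords: dict  # dimension name -> list of coordinate labels
    attrs: dict = field(default_factory=dict)

    def sel(self, **labels):
        """Select along the leading dimension by coordinate label."""
        dim, label = next(iter(labels.items()))
        assert dim == self.dims[0], "sketch only supports the leading dimension"
        i = self.coords[dim].index(label)
        return LabeledCube(
            self.values[i],
            self.dims[1:],
            {d: self.coords[d] for d in self.dims[1:]},
            self.attrs,
        )


# Hypothetical 2x2x3 "temperature" cube with two acquisition dates.
cube = LabeledCube(
    values=[[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]],
    dims=("time", "y", "x"),
    coords={
        "time": ["2021-01-01", "2021-01-13"],
        "y": [10.0, 20.0],
        "x": [0.0, 1.0, 2.0],
    },
    attrs={"units": "K"},
)

# Label-based selection: the analyst asks for a date, not a positional index.
day1 = cube.sel(time="2021-01-01")
print(day1.values)  # [[1, 2, 3], [4, 5, 6]]
print(day1.dims)    # ('y', 'x')
```

In Xarray proper the same pattern appears as `DataArray.sel`, where label-based indexing replaces the manual bookkeeping that the "download model" workflows described above tend to require.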
