Commit fe39686

committed
added more methods and suggestions
1 parent 16cb113 commit fe39686

2 files changed

+112
-60
lines changed

_toc.yml

+1
@@ -18,6 +18,7 @@ parts:
1818
- file: fundamentals/01_datastructures
1919
- file: fundamentals/01.1_creating_data_structures
2020
- file: fundamentals/01.1_io
21+
- file: fundamentals/01_datatree_imerghh.ipynb
2122
- file: fundamentals/02_labeled_data.md
2223
sections:
2324
- file: fundamentals/02.1_indexing_Basic.ipynb

DataTree/DataTree_Tutorial.ipynb renamed to fundamentals/01_datatree_imerghh.ipynb

+111-60
@@ -15,14 +15,13 @@
1515
},
1616
{
1717
"cell_type": "code",
18-
"execution_count": null,
18+
"execution_count": 1,
1919
"metadata": {},
2020
"outputs": [],
2121
"source": [
2222
"import cartopy.crs as ccrs\n",
2323
"import matplotlib.pyplot as plt\n",
24-
"from xarray import open_datatree\n",
25-
"from metpy.plots import ctables"
24+
"import xarray as xr"
2625
]
2726
},
2827
{
@@ -38,17 +37,19 @@
3837
"metadata": {},
3938
"outputs": [],
4039
"source": [
41-
"gpm_imerghh_7 = open_datatree(\n",
42-
" '~/Downloads/3B-HHR.MS.MRG.3IMERG.20210829-S073000-E075959.0450.V07B.HDF5', engine='h5netcdf'\n",
40+
"imerghh_730 = xr.open_datatree(\n",
41+
" '~/xarray-data/imerghh_730.hdf5', engine='h5netcdf'\n",
4342
")\n",
44-
"gpm_imerghh_7"
43+
"imerghh_730"
4544
]
4645
},
4746
{
4847
"cell_type": "markdown",
4948
"metadata": {},
5049
"source": [
51-
"### List all of the groups with `.groups`"
50+
"### Nodes\n",
51+
"Groups in a netCDF4 or HDF5 file are represented as \"nodes\" in the DataTree model.\n",
52+
"We can list all of the groups with `.groups`."
5253
]
5354
},
5455
{
@@ -57,7 +58,7 @@
5758
"metadata": {},
5859
"outputs": [],
5960
"source": [
60-
"gpm_imerghh_7.groups"
61+
"imerghh_730.groups"
6162
]
6263
},
6364
{
@@ -74,7 +75,7 @@
7475
"metadata": {},
7576
"outputs": [],
7677
"source": [
77-
"gpm_imerghh_7['/Grid']\n",
78+
"imerghh_730['/Grid']\n",
7879
"\n",
7980
"# Returns only the data contained in the \"/Grid\" group"
8081
]
@@ -85,7 +86,7 @@
8586
"metadata": {},
8687
"outputs": [],
8788
"source": [
88-
"gpm_imerghh_7['/Grid/precipitation']"
89+
"imerghh_730['/Grid/precipitation']"
8990
]
9091
},
9192
{
@@ -94,7 +95,9 @@
9495
"metadata": {},
9596
"outputs": [],
9697
"source": [
97-
"gpm_imerghh_7.Grid.precipitation"
98+
"imerghh_730.Grid.precipitation\n",
99+
"\n",
100+
"# Attribute-style (dot) access to the same node"
98101
]
99102
},
100103
{
@@ -110,7 +113,7 @@
110113
"metadata": {},
111114
"outputs": [],
112115
"source": [
113-
"gpm_imerghh_7['/Grid/Intermediate'].parent"
116+
"imerghh_730['/Grid/Intermediate'].parent"
114117
]
115118
},
116119
{
@@ -119,7 +122,7 @@
119122
"metadata": {},
120123
"outputs": [],
121124
"source": [
122-
"gpm_imerghh_7.Grid.children"
125+
"imerghh_730.Grid.children"
123126
]
124127
},
125128
{
@@ -145,7 +148,7 @@
145148
"metadata": {},
146149
"outputs": [],
147150
"source": [
148-
"gpm_imerghh_7.dims\n",
151+
"imerghh_730.dims\n",
149152
"# Note there are no dimensions, coordinates, or data variables defined at the root node"
150153
]
151154
},
@@ -155,7 +158,7 @@
155158
"metadata": {},
156159
"outputs": [],
157160
"source": [
158-
"gpm_imerghh_7.attrs"
161+
"imerghh_730.attrs"
159162
]
160163
},
161164
{
@@ -164,7 +167,7 @@
164167
"metadata": {},
165168
"outputs": [],
166169
"source": [
167-
"gpm_imerghh_7['/Grid'].dims"
170+
"imerghh_730['/Grid'].dims"
168171
]
169172
},
170173
{
@@ -173,7 +176,7 @@
173176
"metadata": {},
174177
"outputs": [],
175178
"source": [
176-
"gpm_imerghh_7['/Grid/Intermediate'].dims"
179+
"imerghh_730['/Grid/Intermediate'].dims"
177180
]
178181
},
179182
{
@@ -182,17 +185,16 @@
182185
"metadata": {},
183186
"outputs": [],
184187
"source": [
185-
"gpm_imerghh_7['/Grid/Intermediate'].data_vars"
188+
"imerghh_730['/Grid/Intermediate'].data_vars"
186189
]
187190
},
188191
{
189192
"cell_type": "markdown",
190193
"metadata": {},
191194
"source": [
192-
"### Plotting precipitation data with DataTree\n",
193-
"Xarray’s plotting capabilities are centered around DataArray objects. To plot DataTree objects we access their relevant DataArrays in this case, `gpm_imerghh_7['/Grid/precipitation']`. \n",
194-
"\n",
195-
"We use the `.where()` method to get a subset of precipitation data over the Gulf of Mexico."
195+
"### Creating a DataTree from a dictionary with `DataTree.from_dict()`\n",
196+
"You can create a DataTree from a dictionary of `xr.Dataset` or `xr.DataTree` objects.\n",
197+
"Each key of the dictionary is the path of a node (group) in the new DataTree object."
196198
]
197199
},
198200
{
@@ -201,21 +203,18 @@
201203
"metadata": {},
202204
"outputs": [],
203205
"source": [
204-
"precipitation_subset = gpm_imerghh_7['/Grid/precipitation'].where(\n",
205-
" (gpm_imerghh_7['/Grid/precipitation'].lat >= 20)\n",
206-
" & (gpm_imerghh_7['/Grid/precipitation'].lat <= 35)\n",
207-
" & (gpm_imerghh_7['/Grid/precipitation'].lon >= -110)\n",
208-
" & (gpm_imerghh_7['/Grid/precipitation'].lon <= -78),\n",
209-
" drop=True,\n",
210-
")"
206+
"imerghh_830 = xr.open_datatree('~/xarray-data/imerghh_830.hdf5', engine='h5netcdf')\n",
207+
"xr.DataTree.from_dict({'time_830': imerghh_830})"
211208
]
212209
},
213210
{
214211
"cell_type": "markdown",
215212
"metadata": {},
216213
"source": [
217-
"### Data masking\n",
218-
"We add a data mask to the precipitation values that are zero."
214+
"### Combining DataTree objects with `DataTree.from_dict()`\n",
215+
"Let's combine our two DataTree objects (`imerghh_730` and `imerghh_830`), one per time stamp, with `DataTree.from_dict()`.\n",
216+
"All of the groups in the original files remain intact, but now there are two additional groups, `/time_730` and `/time_830`.\n",
217+
"The original `/Grid` and `/Grid/Intermediate` groups are now nested under `/time_730` and `/time_830`, which are themselves children of the root node `'/'`."
219218
]
220219
},
221220
{
@@ -224,33 +223,84 @@
224223
"metadata": {},
225224
"outputs": [],
226225
"source": [
227-
"precipitation_subset_mask = precipitation_subset.where(precipitation_subset > 0.0)"
226+
"combined_imerghh_tree = xr.DataTree.from_dict({'time_730': imerghh_730,\n",
227+
" 'time_830': imerghh_830})\n",
228+
"combined_imerghh_tree"
229+
]
230+
},
231+
{
232+
"cell_type": "code",
233+
"execution_count": null,
234+
"metadata": {},
235+
"outputs": [],
236+
"source": [
237+
"combined_imerghh_tree.children"
228238
]
229239
},
230240
{
231241
"cell_type": "markdown",
232242
"metadata": {},
233243
"source": [
234-
"### Add a custom precipitation color map from [metpy](https://unidata.github.io/MetPy/latest/api/generated/metpy.plots.ctables.html)"
244+
"### Combining data with DataTree\n",
245+
"DataTree nodes (like Dataset objects) can contain `DataArray` objects. We can combine DataArrays taken from different nodes with `xr.concat` and `xr.merge`. Let's concatenate the precipitation data from nodes `/time_730` and `/time_830` along the `\"time\"` dimension. Note that both datasets have the same sizes for their `\"time\"`, `\"lat\"` and `\"lon\"` dimensions.\n"
235246
]
236247
},
237248
{
238249
"cell_type": "code",
239-
"execution_count": null,
250+
"execution_count": 17,
251+
"metadata": {},
252+
"outputs": [],
253+
"source": [
254+
"precip_concat = xr.concat([combined_imerghh_tree['time_730/Grid/precipitation'], combined_imerghh_tree['time_830/Grid/precipitation']], dim='time')"
255+
]
256+
},
257+
{
258+
"cell_type": "markdown",
259+
"metadata": {},
260+
"source": [
261+
"### Plotting precipitation data with DataTree\n",
262+
"Xarray’s plotting capabilities are centered around DataArray objects. To plot data held in a DataTree, we work with the relevant DataArrays, in this case our concatenated `DataArray`, `precip_concat`.\n",
263+
"\n",
264+
"We use the `.where()` method to get a subset of precipitation data over the Gulf of Mexico."
265+
]
266+
},
267+
{
268+
"cell_type": "code",
269+
"execution_count": 18,
240270
"metadata": {},
241271
"outputs": [],
242272
"source": [
243-
"clevs = [0, 1, 2.5, 5, 7.5, 10, 15, 20, 30, 40, 50, 70, 100, 150, 200, 250, 300, 400, 500, 600, 750]\n",
244-
"norm, cmap = ctables.registry.get_with_boundaries('precipitation', clevs)\n",
245-
"cmap"
273+
"precip_concat_sub = precip_concat.where(\n",
274+
" (precip_concat.lat >= 20)\n",
275+
" & (precip_concat.lat <= 35)\n",
276+
" & (precip_concat.lon >= -110)\n",
277+
" & (precip_concat.lon <= -78),\n",
278+
" drop=True,)"
246279
]
247280
},
248281
{
249282
"cell_type": "markdown",
250283
"metadata": {},
251284
"source": [
252-
"### Plot the data with `.plot()`\n",
253-
"Note since this data is two-dimensional it calls `xarray.plot.pcolormesh()` by default with just the `.plot()` method."
285+
"### Data masking\n",
286+
"We mask the precipitation values that are zero, using the `.where()` method to keep only values greater than 0.0."
287+
]
288+
},
289+
{
290+
"cell_type": "code",
291+
"execution_count": 19,
292+
"metadata": {},
293+
"outputs": [],
294+
"source": [
295+
"precipitation_subset_mask = precip_concat_sub.where(precip_concat_sub > 0.0)"
296+
]
297+
},
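The concatenate, subset and mask steps above can be checked on a small synthetic array without the IMERG files. The sketch below is illustrative only. The helper `snapshot`, the 4x4 grid and the values are assumptions, chosen so that the result is easy to verify by hand.

```python
import numpy as np
import xarray as xr

lat = np.array([10.0, 20.0, 30.0, 40.0])
lon = np.array([-120.0, -100.0, -90.0, -70.0])


def snapshot(time_label, values):
    """Hypothetical helper: one (time=1, lat, lon) precipitation field."""
    return xr.DataArray(
        values[np.newaxis, ...],
        dims=("time", "lat", "lon"),
        coords={"time": [np.datetime64(time_label, "ns")], "lat": lat, "lon": lon},
        name="precipitation",
    )


first = snapshot("2021-08-29T07:30", np.arange(16.0).reshape(4, 4))
second = snapshot("2021-08-29T08:30", np.zeros((4, 4)))

# Stack the two snapshots along the shared "time" dimension
stacked = xr.concat([first, second], dim="time")

# Keep only the lat/lon box; drop=True trims rows and columns outside it
in_box = (
    (stacked.lat >= 20)
    & (stacked.lat <= 35)
    & (stacked.lon >= -110)
    & (stacked.lon <= -78)
)
subset = stacked.where(in_box, drop=True)

# Replace zeros with NaN so they are left blank when plotted
masked = subset.where(subset > 0.0)

print(subset.sizes)
print(masked.isel(time=0).values)
```

With `drop=True`, coordinate labels for which the condition is false everywhere are removed, so the 4x4 grid shrinks to the 2x2 box. Without it, the array keeps its shape and the values outside the box become NaN.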
298+
{
299+
"cell_type": "markdown",
300+
"metadata": {},
301+
"source": [
302+
"### Plot the data with `.plot()` as a `FacetGrid` object\n",
303+
"We can use `xarray.plot.FacetGrid` objects to make plots with multiple axes. Each axis shows the same relationship conditioned on a different value of some dimension, in our case a different time stamp. Note that since each facet is two-dimensional, the `.plot()` method calls `xarray.plot.pcolormesh()` by default."
254304
]
255305
},
256306
{
@@ -259,35 +309,35 @@
259309
"metadata": {},
260310
"outputs": [],
261311
"source": [
262-
"# Set the figure size, projection, extent and grid lines to the plot\n",
263-
"fig = plt.figure(figsize=(8, 8))\n",
264-
"ax = plt.axes(projection=ccrs.PlateCarree())\n",
265-
"ax.set_extent([-100, -80, 20, 35])\n",
266-
"ax.coastlines()\n",
267-
"gl = ax.gridlines(draw_labels=True, linewidth=1, color='black', linestyle='--')\n",
268-
"gl.right_labels = False\n",
269-
"gl.top_labels = False\n",
270-
"\n",
271-
"# Get the minimum and maximum values in the array\n",
272-
"min = precipitation_subset_mask.min()\n",
273-
"max = precipitation_subset_mask.max()\n",
274-
"\n",
275312
"# Plot the precipitation data\n",
276-
"precipitation_subset_mask[0].plot(\n",
313+
"precip_plot = precipitation_subset_mask.plot(figsize=(12, 6), transform=ccrs.PlateCarree(), subplot_kws={'projection':ccrs.PlateCarree()},\n",
277314
" x=\"lon\",\n",
278315
" y=\"lat\",\n",
279-
" ax=ax,\n",
280-
" cmap=cmap,\n",
281-
" cbar_kwargs={\"orientation\": \"horizontal\", \"pad\": 0.05},\n",
282-
" vmin=min,\n",
283-
" vmax=max,\n",
316+
" col='time', # The dimension (\"time\") we are faceting our plot on\n",
317+
" col_wrap=2, # Maximum number of columns before wrapping to a new row\n",
318+
" cmap='jet',\n",
319+
" cbar_kwargs={\"orientation\": \"horizontal\", \"pad\": 0.15, \"shrink\": 0.6},\n",
320+
" vmin=precipitation_subset_mask.min(),\n",
321+
" vmax=precipitation_subset_mask.max(),\n",
322+
"\n",
284323
")\n",
285324
"\n",
286-
"plt.title('Half-hourly precipitation rate in the Gulf of Mexico on August 29, 2021 at 07:30')"
325+
"\n",
326+
"for ax in precip_plot.axs.flat:\n",
327+
" ax.set_extent([-100, -80, 20, 35])\n",
328+
" ax.coastlines()\n",
329+
" gl = ax.gridlines(linewidth=1, color='black', linestyle='--')\n",
330+
" gl.left_labels = True\n",
331+
" gl.bottom_labels = True\n"
287332
]
288333
}
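For readers without cartopy or the IMERG files, the faceting behaviour can be tried on random data. This is a rough sketch, assuming matplotlib is installed. The array contents, the axis labels and the `Agg` backend choice are illustrative and do not come from the notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
data = xr.DataArray(
    rng.random((2, 3, 4)),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.array(
            ["2021-08-29T07:30", "2021-08-29T08:30"], dtype="datetime64[ns]"
        ),
        "lat": [20.0, 27.5, 35.0],
        "lon": [-100.0, -93.0, -86.0, -80.0],
    },
    name="precipitation",
)

# Passing col= makes .plot() return a FacetGrid with one panel per time step
grid = data.plot(x="lon", y="lat", col="time", col_wrap=2)

# grid.axs is a 2-D array of matplotlib axes that can be styled one by one
for ax in grid.axs.flat:
    ax.set_xlabel("longitude")
    ax.set_ylabel("latitude")

print(type(grid).__name__, grid.axs.shape)
```

The loop over `grid.axs.flat` is the same pattern the notebook uses to add coastlines and gridlines to each panel.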
289334
],
290335
"metadata": {
336+
"kernelspec": {
337+
"display_name": "utf-upgrade-datatree",
338+
"language": "python",
339+
"name": "python3"
340+
},
291341
"language_info": {
292342
"codemirror_mode": {
293343
"name": "ipython",
@@ -297,7 +347,8 @@
297347
"mimetype": "text/x-python",
298348
"name": "python",
299349
"nbconvert_exporter": "python",
300-
"pygments_lexer": "ipython3"
350+
"pygments_lexer": "ipython3",
351+
"version": "3.12.5"
301352
}
302353
},
303354
"nbformat": 4,
