diff --git a/_toc.yml b/_toc.yml index 6c0aeda8..97e03a85 100644 --- a/_toc.yml +++ b/_toc.yml @@ -18,6 +18,7 @@ parts: - file: fundamentals/01_datastructures - file: fundamentals/01.1_creating_data_structures - file: fundamentals/01.1_io + - file: fundamentals/01_datatree_imerghh.ipynb - file: fundamentals/02_labeled_data.md sections: - file: fundamentals/02.1_indexing_Basic.ipynb diff --git a/fundamentals/01_datatree_imerghh.ipynb b/fundamentals/01_datatree_imerghh.ipynb new file mode 100644 index 00000000..c66bbd16 --- /dev/null +++ b/fundamentals/01_datatree_imerghh.ipynb @@ -0,0 +1,356 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "# How to use `xarray.DataTree` with hierarchical data\n", + "\n", + "\n", + "## Overview: \n", + "\n", + "This notebook will demonstrate how to use `xarray.DataTree` with [_GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V07 (GPM_3IMERGHH_07)_](https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGHH_07/summary) and use xarray's plotting capabilities to plot precipitation in the Gulf of Mexico during Hurricane Ida. GPM_3IMERGHH_07 is a L3 gridded product with a group hierarchical structure." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import cartopy.crs as ccrs\n", + "import matplotlib.pyplot as plt\n", + "import xarray as xr" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Opening the dataset with `open_datatree()`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "imerghh_730 = xr.open_datatree('~/xarray-data/imerghh_730.hdf5', engine='h5netcdf')\n", + "imerghh_730" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Nodes\n", + "Groups in a netcdf4 or hdf5 file in the DataTree model are represented as \"nodes\" in the DataTree model.\n", + "We can list all of the groups with `.groups`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "imerghh_730.groups" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Accessing variables in a nested groups\n", + "Nested variables and groups can be accessed with either dict-like syntax or method based syntax." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "imerghh_730['/Grid']\n", + "\n", + "# Returns only the data contained in the \"/Grid\" group" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "imerghh_730['/Grid/precipitation']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "imerghh_730.Grid.precipitation\n", + "\n", + "# Method based syntax" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get the parent and child nodes from a group" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "imerghh_730['/Grid/Intermediate'].parent" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "imerghh_730.Grid.children" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### `Xarray.DataTree` objects and `xarray.Dataset` objects have the same key properties like:\n", + "\n", + "- `dims`: a dictionary mapping of dimension names to lengths, for the variables in a node, and a node’s ancestors.\n", + "\n", + "- `data_vars`: a dict-like container of DataArrays corresponding to variables in a node.\n", + "\n", + "- `coords`: another dict-like container of DataArrays, corresponding to coordinate variables in a node, and a node’s ancestors.\n", + "\n", + "- `attrs`: dict with metadata relevant to data in a node.\n", + "\n", + "With `DataTree` you can get these properties at any of the nodes (groups) they are defined in." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "imerghh_730.dims\n", + "# Note there are no dimensions, coordinates, or data variables defined at the root node" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "imerghh_730.attrs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "imerghh_730['/Grid'].dims" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "imerghh_730['/Grid/Intermediate'].dims" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "imerghh_730['/Grid/Intermediate'].data_vars" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Creating a DataTree from a dictionary with `DataTree.from_dict()`\n", + "You can create a DataTree from a dictionary of `xr.Datasets` objects or `xr.DataTree` objects.\n", + "The key of the dictionary is the node/group of the new DataTree object." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "imerghh_830 = xr.open_datatree('~/xarray-data/imerghh_830.hdf5', engine='h5netcdf')\n", + "xr.DataTree.from_dict({'time_830': imerghh_830})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Using `DataTree.from_dict()` to make a DataTree object\n", + "Lets combine our two DataTree objects (`imerghh_730` and `imerghh_830`) at each time stamp with `DataTree.from_dict()`.\n", + "All of the groups in the original datasets will remain intact but now we have two additional groups `/time_730` and `/time_830`.\n", + "The groups `/Grid` and `/Grid/Intermediate`are nested in ancestor node's `/time_730` and `/time_830` respectively. They are all children of the root node `'/'`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "combined_imerghh_tree = xr.DataTree.from_dict({'time_730': imerghh_730, 'time_830': imerghh_830})\n", + "combined_imerghh_tree" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "combined_imerghh_tree.children" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Combining data with DataTree\n", + "DataTree objects (like Dataset objects) can contain `DataArray` objects. We can `concat` and `merge` DataArrays in an DataTree along a specified dimension. Lets combine the precipitation data from nodes `/time_730` and `/time_830`. Note these datasets have the same size across their `\"time\"`, `\"lat\"` and `\"lon\"` dimensions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "precip_concat = xr.concat(\n", + " [\n", + " combined_imerghh_tree['time_730/Grid/precipitation'],\n", + " combined_imerghh_tree['time_830/Grid/precipitation'],\n", + " ],\n", + " dim='time',\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Plotting precipitation data with DataTree\n", + "Xarray’s plotting capabilities are centered around DataArray objects. To plot DataTree objects we access their relevant DataArrays in this case, our concatenated `DataArray` `precip_concat`. \n", + "\n", + "We use the `.where()` method to get a subset of precipitation data over the Gulf of Mexico." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "precip_concat_sub = precip_concat.where(\n", + " (precip_concat.lat >= 20)\n", + " & (precip_concat.lat <= 35)\n", + " & (precip_concat.lon >= -110)\n", + " & (precip_concat.lon <= -78),\n", + " drop=True,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Data masking\n", + "We add a data mask to the precipitation values that are zero. We will use the `.where()` method to get data values greater than 0.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "precipitation_subset_mask = precip_concat_sub.where(precip_concat_sub > 0.0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Plot the data with `.plot()` as a `FacetGrid` object\n", + "We can use `xarray.plot.FacetGrid` objects to make plots with multiple axes. Each axes shows the same relationship conditioned on different levels of some dimension, in our case different time stamps. Note since this data is two-dimensional it calls `xarray.plot.pcolormesh()` by default with just the `.plot()` method." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Plot the precipitation data\n", + "precip_plot = precipitation_subset_mask.plot(\n", + " figsize=(12, 6),\n", + " transform=ccrs.PlateCarree(),\n", + " subplot_kws={'projection': ccrs.PlateCarree()},\n", + " x=\"lon\",\n", + " y=\"lat\",\n", + " col='time', # The dimension (\"time\") we are faceting our plot on\n", + " col_wrap=2, # Number of subplots\n", + " cmap='jet',\n", + " cbar_kwargs={\"orientation\": \"horizontal\", \"pad\": 0.15, \"shrink\": 0.6},\n", + " vmin=precipitation_subset_mask.min(),\n", + " vmax=precipitation_subset_mask.max(),\n", + ")\n", + "\n", + "\n", + "for ax in precip_plot.axs.flat:\n", + " ax.set_extent([-100, -80, 20, 35])\n", + " ax.coastlines()\n", + " gl = ax.gridlines(linewidth=1, color='black', linestyle='--')\n", + " gl.left_labels = True\n", + " gl.bottom_labels = True" + ] + } + ], + "metadata": { + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}