improving the advanced indexing notebooks #264

Merged
merged 29 commits into from
Jul 3, 2024
Changes from 12 commits
Commits
df745a5
adding the links to all indexing materials
negin513 Jul 11, 2023
155c810
typo fix + remove a redundant example.
negin513 Jul 11, 2023
bb65245
Merge branch 'xarray-contrib:main' into main
negin513 Jun 6, 2024
3300d69
updates to indexing
negin513 Jun 6, 2024
19c97f4
updating advanced indexing
negin513 Jun 6, 2024
f25af4c
advanced indexing
negin513 Jun 6, 2024
9ff571a
update indexing redundancies
negin513 Jun 6, 2024
64d714c
adding excercise
negin513 Jun 6, 2024
8999e25
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 6, 2024
3287085
few fixes for build fail
negin513 Jun 6, 2024
5b68630
updating header
negin513 Jun 6, 2024
e7bd5c3
updating header
negin513 Jun 6, 2024
50385e9
Merge branch 'main' into indexing
scottyhq Jun 25, 2024
fd4e240
align with new exercise syntax
scottyhq Jun 25, 2024
473ac50
adding advanced indexing
negin513 Jul 1, 2024
f670605
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 1, 2024
2a5ffb2
add numpy advanced indexing
negin513 Jul 1, 2024
82d7bf1
update learning objectives
negin513 Jul 2, 2024
5e3ef9b
few minor updates and wording changes
negin513 Jul 2, 2024
7ff7a19
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 2, 2024
24462e4
update indexing docs
negin513 Jul 2, 2024
792fc33
quick merge conflict resolve
negin513 Jul 2, 2024
c22c825
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 2, 2024
11e2aa4
update docs
negin513 Jul 2, 2024
7d70d22
adding np.ix_
negin513 Jul 2, 2024
89459e5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 2, 2024
5f18954
fix merge
negin513 Jul 2, 2024
3081b36
typo fix
negin513 Jul 2, 2024
383eeae
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 2, 2024
Binary file added images/orthogonal_vs_vectorized.png
160 changes: 123 additions & 37 deletions intermediate/indexing/advanced-indexing.ipynb
Expand Up @@ -8,7 +8,7 @@
"\n",
"## Learning Objectives\n",
"\n",
"* Orthogonal vs. Vectorized and Pointwise Indexing"
"* Orthogonal vs. Pointwise Indexing"
]
},
{
Expand All @@ -17,60 +17,100 @@
"source": [
"## Overview\n",
"\n",
"In the previous notebooks, we learned basic forms of indexing with xarray (positional and name based dimensions, integer and label based indexing), Datetime Indexing, and nearest neighbor lookups. In this tutorial, we will learn how Xarray indexing is different from Numpy and how to do vectorized/pointwise indexing using Xarray. \n",
"First, let's import packages needed for this repository: "
"In the previous notebooks, we learned the basic forms of indexing with Xarray (positional and name-based dimensions, integer- and label-based indexing), datetime indexing, and nearest-neighbor lookups. Xarray positional indexing deviates from NumPy's behavior when indexing with multiple arrays, as in `arr[[0, 1], [0, 1]]`.\n",
"\n",
"In this tutorial, we learn about this difference and how to do vectorized/pointwise indexing using Xarray.\n",
"\n",
"First, we need to understand the concepts of orthogonal (i.e. outer) and vectorized (i.e. pointwise) indexing. \n",
"\n",
"* *Orthogonal* or *outer* indexing allows for indexing along each dimension independently, treating the indexers as one-dimensional arrays. The principle of outer or orthogonal indexing is that the result mirrors the effect of independently indexing along each dimension with integer or boolean arrays, treating both the indexed and indexing arrays as one-dimensional. This method of indexing is analogous to vector indexing in programming languages like MATLAB, Fortran, and R, where each indexer component *independently* selects along its corresponding dimension. This is the default behavior in Xarray.\n",
"\n",
"* *Vectorized* indexing is a more general form of indexing that allows for arbitrary combinations of indexing arrays. It follows NumPy's broadcasting rules: the indexers are broadcast against each other, and the shape of the result is determined by the shape of the broadcast indexers. This is the default behavior in NumPy. \n",
"\n",
"\n",
"We can better understand this with an example: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import xarray as xr\n",
"\n",
"\n",
"xr.set_options(display_expand_attrs=False)\n",
"np.set_printoptions(threshold=10, edgeitems=2)"
"# Create a 5x5 array with values from 1 to 25\n",
"np_array = np.arange(1, 26).reshape(5, 5)\n",
"np_array"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, we’ll use air temperature tutorial dataset from the National Center for Environmental Prediction. "
"Now create an Xarray DataArray from this NumPy array: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"metadata": {},
"outputs": [],
"source": [
"ds = xr.tutorial.load_dataset(\"air_temperature\")\n",
"da = ds.air\n",
"ds"
"import xarray as xr\n",
"\n",
"da = xr.DataArray(np_array, dims=[\"x\", \"y\"])\n",
"da"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np_array[[0, 2, 4], [0, 2, 4]]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"da[[0, 2, 4], [0, 2, 4]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Orthogonal Indexing \n",
"The image below summarizes the difference between orthogonal and vectorized indexing for a 2D 5x5 array. \n",
"\n",
"As we learned in the previous tutorial, positional indexing deviates from the behavior exhibited by NumPy when indexing with multiple arrays. However, Xarray pointwise indexing supports the indexing along multiple labeled dimensions using list-like objects similar to NumPy indexing behavior.\n",
"\n",
"If you only provide integers, slices, or unlabeled arrays (array without dimension names, such as `np.ndarray`, `list`, but not `DataArray()`) indexing can be understood as orthogonally (i.e. along independent axes, instead of using NumPy’s broadcasting rules to vectorize indexers). \n",
"\n",
"*Orthogonal* or *outer* indexing considers one-dimensional arrays in the same way as slices when deciding the output shapes. The principle of outer or orthogonal indexing is that the result mirrors the effect of independently indexing along each dimension with integer or boolean arrays, treating both the indexed and indexing arrays as one-dimensional. This method of indexing is analogous to vector indexing in programming languages like MATLAB, Fortran, and R, where each indexer component *independently* selects along its corresponding dimension. \n",
"![Orthogonal vs. Vectorized Indexing](../../images/orthogonal_vs_vectorized.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Pointwise indexing, shown on the left, selects specific elements at given coordinates, resulting in an array of those individual elements. In the example shown, the indices `[0, 2, 4]`, `[0, 2, 4]` select the elements at positions (0, 0), (2, 2), and (4, 4), resulting in the values `[1, 13, 25]`. This is what the NumPy indexing example above returns. \n",
"\n",
"For example : "
"\n",
" In contrast, **orthogonal indexing** uses the same indices to select entire rows and columns, forming a cross-product of the specified indices. The result is a subarray that includes all combinations of the selected rows and columns. The example demonstrates this by selecting rows 0, 2, and 4 and columns 0, 2, and 4, resulting in a subarray containing `[[1, 3, 5], [11, 13, 15], [21, 23, 25]]`. This is what the Xarray indexing example above returns.\n",
" \n",
" The output of orthogonal indexing is a 3x3 array, while the output of vectorized indexing is a 1D array."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Orthogonal Indexing in Xarray\n",
"\n",
"If you only provide integers, slices, or unlabeled arrays (arrays without dimension names, such as `np.ndarray` or `list`, but not `DataArray()`), indexing is orthogonal (i.e. along independent axes, instead of using NumPy’s broadcasting rules to vectorize indexers). In the example above we saw this behavior, but let's see it in action with a real dataset."
]
},
{
Expand All @@ -81,14 +121,41 @@
},
"outputs": [],
"source": [
"da.isel(time=0, lat=[2, 4, 10, 13], lon=[1, 6, 7]).plot(); # -- orthogonal indexing"
"import numpy as np\n",
"import pandas as pd\n",
"import xarray as xr\n",
"\n",
"\n",
"xr.set_options(display_expand_attrs=False)\n",
"np.set_printoptions(threshold=10, edgeitems=2)\n",
"\n",
"ds = xr.tutorial.load_dataset(\"air_temperature\")\n",
"da_air = ds.air\n",
"ds"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"selected_da = da_air.isel(time=0, lat=[2, 4, 10, 13], lon=[1, 6, 7]) # -- orthogonal indexing\n",
"selected_da"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more flexibility, you can supply `DataArray()` objects as indexers. Dimensions on resultant arrays are given by the ordered union of the indexers’ dimensions:\n",
"👆 Notice that the indexing example above produced a 4x3 array (4 latitudes by 3 longitudes). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more flexibility, you can supply `DataArray()` objects as indexers. Dimensions on resultant arrays are given by the ordered union of the indexers’ dimensions.\n",
"\n",
"In the example below, we do orthogonal indexing using `DataArray()` objects. "
]
Expand All @@ -104,7 +171,7 @@
"target_lat = xr.DataArray([31, 41, 42, 42], dims=\"degrees_north\")\n",
"target_lon = xr.DataArray([200, 201, 202, 205], dims=\"degrees_east\")\n",
"\n",
"da.sel(lat=target_lat, lon=target_lon, method=\"nearest\") # -- orthogonal indexing"
"da_air.sel(lat=target_lat, lon=target_lon, method=\"nearest\") # -- orthogonal indexing"
]
},
{
Expand All @@ -123,15 +190,14 @@
"\n",
"But what if we would like to find the information from the nearest grid cell to a collection of specified points (for example, weather stations or tower data)?\n",
"\n",
"## Vectorized or Pointwise Indexing\n",
"## Vectorized or Pointwise Indexing in Xarray\n",
"\n",
"Like NumPy and pandas, Xarray supports indexing many array elements at once in a\n",
"*vectorized* manner. \n",
"Like NumPy and pandas, Xarray supports indexing many array elements at once in a *vectorized* manner. \n",
"\n",
"**Vectorized indexing** or **pointwise indexing** using `DataArray()` objects can be used to extract information from the nearest grid cells of interest, for example, the nearest climate model grid cells to a collection of specified weather station latitudes and longitudes.\n",
"\n",
"```{hint}\n",
"To trigger vectorized indexing behavior, you will need to provide the selection dimensions with a new shared output dimension name. \n",
"To trigger vectorized indexing behavior, give the indexers a shared dimension name that differs from the original dimensions. This dimension name will be used in the output array.\n",
"```\n",
"\n",
"In the example below, the selections of the closest latitude and longitude are renamed to an output dimension named `points`:"
Expand All @@ -147,7 +213,6 @@
"source": [
"# Define target latitude and longitude (where weather stations might be)\n",
"lat_points = xr.DataArray([31, 41, 42, 42], dims=\"points\")\n",
"lon_points = xr.DataArray([200, 201, 202, 205], dims=\"points\")\n",
"lat_points"
]
},
Expand All @@ -159,6 +224,7 @@
},
"outputs": [],
"source": [
"lon_points = xr.DataArray([200, 201, 202, 205], dims=\"points\")\n",
"lon_points"
]
},
Expand All @@ -177,7 +243,7 @@
},
"outputs": [],
"source": [
"da.sel(lat=lat_points, lon=lon_points, method=\"nearest\")"
"da_air.sel(lat=lat_points, lon=lon_points, method=\"nearest\")"
]
},
{
Expand All @@ -195,7 +261,7 @@
},
"outputs": [],
"source": [
"da.sel(lat=lat_points, lon=lon_points, method=\"nearest\").dims"
"da_air.sel(lat=lat_points, lon=lon_points, method=\"nearest\").dims"
]
},
{
Expand All @@ -217,16 +283,36 @@
},
"outputs": [],
"source": [
"da.sel(lat=[20, 30, 40], lon=lon_points, method=\"nearest\")"
"da_air.sel(lat=[20, 30, 40], lon=lon_points, method=\"nearest\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{warning}\n",
"If an indexer is a `DataArray()`, its coordinates should not conflict with the selected subpart of the target array (except for the explicitly indexed dimensions with `.loc`/`.sel`). Otherwise, `IndexError` will be raised!\n",
"```"
"## Exercises\n",
"\n",
"```{exercise}\n",
":label: indexing-advanced-1\n",
"\n",
"In the simple 2D 5x5 Xarray data array above, select the elements at positions (0, 0), (2, 2), and (4, 4): \n",
"```\n",
"\n",
"\n",
"\n",
"````{solution} indexing-advanced-1\n",
":class: dropdown\n",
"```python\n",
"\n",
"indices = np.array([0, 2, 4])\n",
"\n",
"xs_da = xr.DataArray(indices, dims=\"points\")\n",
"ys_da = xr.DataArray(indices, dims=\"points\")\n",
"\n",
"subset_da = da.sel(x=xs_da, y=ys_da)\n",
"subset_da\n",
"```\n",
"````\n"
]
},
{
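For readers who want to check the behavior this notebook describes without opening it, here is a minimal standalone sketch. It rebuilds the same 5x5 array and compares the four selections side by side. The variable names (`idx`, `points`, `np_outer`, and so on) are illustrative and do not come from the notebook, and the use of `.isel` for the pointwise case is an assumption made here because the array has no coordinate labels; the notebook's own solution uses `.sel`.

```python
import numpy as np
import xarray as xr

# Same 5x5 array the notebook builds: values 1..25
np_array = np.arange(1, 26).reshape(5, 5)
da = xr.DataArray(np_array, dims=["x", "y"])
idx = [0, 2, 4]

# NumPy broadcasts the two index lists against each other: pointwise selection
np_pointwise = np_array[idx, idx]

# np.ix_ builds an open mesh, which gives the outer (orthogonal) selection in NumPy
np_outer = np_array[np.ix_(idx, idx)]

# Xarray treats unlabeled lists as independent per-dimension indexers: orthogonal
xr_outer = da[idx, idx]

# Indexers that share a new dimension name ("points") give pointwise selection
points = xr.DataArray(idx, dims="points")
xr_pointwise = da.isel(x=points, y=points)

print(np_pointwise)          # the diagonal elements
print(np_outer)              # the 3x3 cross-product of rows and columns
print(xr_outer.shape)        # matches np_outer
print(xr_pointwise.dims)     # the shared indexer dimension
```

The takeaway, under these assumptions, is that `np.ix_` is how NumPy reproduces Xarray's default, and indexers sharing a dimension name are how Xarray reproduces NumPy's default.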