Skip to content

Commit d935d8e

Browse files
authored
Merge pull request #88 from esciencecenter-digital-skills/81-crop-a-raster-episode
81 crop a raster episode
2 parents 475ae4f + 5233be9 commit d935d8e

10 files changed

+668
-0
lines changed

_episodes/20-crop-raster-data.md

Lines changed: 215 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,215 @@
1+
---
2+
title: "Crop raster data with rioxarray and geopandas"
3+
teaching: 40
4+
exercises: 20
5+
questions:
6+
- "How can I crop my raster data to the area of interest?"
7+
objectives:
8+
- "Crop raster data with a bounding box."
9+
- "Crop raster data with a polygon."
10+
- "Crop raster data within a geometry buffer."
11+
keypoints:
12+
- "Use `clip_box` in `DataArray.rio` to crop raster with a bounding box."
13+
- "Use `clip` in `DataArray.rio` to crop raster with a given polygon."
14+
- "Use `buffer` in `geopandas` to make a buffer polygon of a (multi)point or a polyline. This polygon can be used to crop data."
15+
---
16+
17+
It is quite common that the raster data you have in hand is too large to process, or not all the pixels are relevant to your area of interest (AoI). In both situations, you should consider cropping your raster data before performing data analysis.
18+
19+
In this episode, we will introduce how to crop raster data into the desired area. We will use one Sentinel-2 image over Amsterdam as the example raster data, and introduce how to crop your data to different types of AoIs.
20+
21+
> ## Introduce the Data
22+
>
23+
> Raster data: We will use the Sentinel-2 raster data retrieved from the Data Access episode. If you skipped this episode, you can also directly download data from Figshare.
24+
>
25+
> Vector data: we will use the PDOK vector data from the Vector data introduction episode.
26+
{: .callout}
27+
28+
## Crop raster data with a bounding box
29+
30+
First, we can have a glimpse of the raster data by loading it with `rioxarray`:
31+
32+
~~~
33+
import rioxarray
34+
35+
# Load image and visualize
36+
raster = rioxarray.open_rasterio('data/S2_amsterdam.tif')
37+
raster.plot.imshow(figsize=(8,8))
38+
~~~
39+
{: .language-python}
40+
41+
<img src="../fig/20-crop-raster-original-raster-00.png" title="Overview of the raster" width="512" style="display: block; margin: auto;" />
42+
43+
The raster data is quite big. It even takes tens of seconds to visualize. But do we need the entire raster? Suppose we are interested in the crop fields, we can simply compare its coverage with the raster coverage:
44+
45+
~~~
46+
import geopandas as gpd
47+
from shapely.geometry import box
48+
from matplotlib import pyplot as plt
49+
50+
# Load the polygons of the crop fields
51+
cf_boundary_crop = gpd.read_file("data/crop_fields/cf_boundary_crop.shp")
52+
cf_boundary_crop = cf_boundary_crop.to_crs(raster.rio.crs) # convert to the same CRS
53+
54+
# Plot the bounding box over the raster
55+
bounds = box(*cf_boundary_crop.total_bounds)
56+
bb_cropfields = gpd.GeoDataFrame(index=[0], crs=raster.rio.crs, geometry=[bounds])
57+
fig, ax = plt.subplots()
58+
fig.set_size_inches((8,8))
59+
raster.plot.imshow(ax=ax)
60+
bb_cropfields.plot(ax=ax, alpha=0.6)
61+
~~~
62+
{: .language-python}
63+
64+
<img src="../fig/20-crop-raster-bounding-box-01.png" title="Bounding boxes of AoI over the raster" width="512" style="display: block; margin: auto;" />
65+
66+
Seeing from the bounding boxes, the crop fields (red) only takes a small part of the raster (blue). Therefore before actual processing, we can first crop the raster to our area of interest. The `clip_box` function allows one to crop a raster by the min/max of the x and y coordinates.
67+
68+
~~~
69+
# Crop the raster with the bounding box
70+
raster_clip = raster.rio.clip_box(*cf_boundary_crop.total_bounds)
71+
print(raster_clip.shape)
72+
~~~
73+
{: .language-python}
74+
~~~
75+
(3, 1565, 1565)
76+
~~~
77+
{: .output}
78+
79+
We successfully cropped the raster to a much smaller piece. We can visualize it now:
80+
81+
~~~
82+
raster_clip.plot.imshow(figsize=(8,8))
83+
~~~
84+
85+
{: .language-python}
86+
<img src="../fig/20-crop-raster-crop-by-bb-02.png" title="Crop raster by a bounding box" width="512" style="display: block; margin: auto;" />
87+
88+
89+
## Crop raster data with a polygon
90+
91+
It is common that the AoI is given by a polygon, which can be also used to crop the raster. For the example, we make a simple polygon within the raster clip we just made, and select the raster pixels within the polygon. This can be done with the `clip` function:
92+
93+
~~~
94+
95+
from matplotlib import pyplot as plt
96+
from shapely.geometry import Polygon
97+
xlist= [630000, 629000, 638000, 639000, 634000, 630000]
98+
ylist = [5.804e6, 5.814e6, 5.816e6, 5.806e6, 5.803e6, 5.804e6]
99+
polygon_geom = Polygon(zip(xlist, ylist))
100+
polygon = gpd.GeoDataFrame(index=[0], crs=raster_clip.rio.crs, geometry=[polygon_geom])
101+
102+
# Plot the polygon over raster
103+
fig, ax = plt.subplots()
104+
fig.set_size_inches((8,8))
105+
raster_clip.plot.imshow(ax=ax)
106+
polygon.plot(ax=ax, edgecolor='blue', alpha=0.6)
107+
108+
# Crop and visualize
109+
raster_clip_polygon = raster_clip.rio.clip(polygon['geometry'], polygon.crs)
110+
raster_clip_polygon.plot.imshow(figsize=(8,8))
111+
~~~
112+
{: .language-python}
113+
114+
<img src="../fig/20-crop-raster-crop-by-polygon-03.png" title="Crop raster by a polygon" width="1024" style="display: block; margin: auto;" />
115+
116+
117+
> ## Exercise: Compare two ways of bounding box cropping
118+
> So far, we have learned two ways of cropping a raster: by a bounding box (using `clip_box`) and by a polygon (using `clip`). Technically, a bounding box is also a polygon. So what if we crop the original image directly with the polygon? For example:
119+
> ~~~
120+
> raster_clip_polygon2 = raster.rio.clip(polygon['geometry'], polygon.crs)
121+
> raster_clip_polygon2.plot.imshow()
122+
> ~~~
123+
> {: .language-python}
124+
> And try to compare the two methods:
125+
> - Do you have the same results?
126+
> - Do you have the same execution time?
127+
> - How would you choose the two methods in your daily work?
128+
>
129+
> > ## Solution
130+
> >
131+
> > The two methods give the same results, but cropping directly with a polygon is much slower.
132+
> >
133+
> > Therefore, if the AoI is much smaller than the original raster, it would be more efficient to first crop the raster with a bounding box, then crop with the actual AoI polygon.
134+
> {: .solution}
135+
{: .challenge}
136+
137+
> ## Exercise: Select the raster data within crop fields
138+
> Let's select out all the crop fields from the raster data, using the `.shp` file in `data/crop_fields` and visualize the results.
139+
>
140+
> > ## Solution
141+
> >
142+
> > ~~~
143+
> > # Load the crop fields polygons
144+
> > cf_boundary_crop = gpd.read_file("data/crop_fields/cf_boundary_crop.shp")
145+
> > # Crop
146+
> > raster_clip_fields = raster_clip.rio.clip(cf_boundary_crop['geometry'], cf_boundary_crop.crs)
147+
> > # Visualize
148+
> > raster_clip_fields.plot.imshow(figsize=(8,8))
149+
> > ~~~
150+
> > {: .language-python}
151+
> > <img src="../fig/20-crop-raster-crop-fields-solution-06.png" title="Raster cropped by crop fields" width="512" style="display: block; margin: auto;" />
152+
> {: .solution}
153+
{: .challenge}
154+
155+
156+
## Crop raster data with a geometry buffer
157+
158+
It is not always the case that the AoI comes in the format of a polygon. Sometimes one would like to perform analysis around a (set of) point(s), or polyline(s). For example, in our AoI, there are also some groundwater monitoring wells coming as point vector data. One may also want to perform analysis around these wells. The location of the wells is stored in `data/groundwater_monitoring_well`.
159+
160+
~~~
161+
# Load wells
162+
wells = gpd.read_file("data/groundwater_monitoring_well/groundwater_monitoring_well.shp")
163+
wells = wells.to_crs(raster_clip.rio.crs)
164+
165+
# Plot the wells over raster
166+
fig, ax = plt.subplots()
167+
fig.set_size_inches((8,8))
168+
raster_clip.plot.imshow(ax=ax)
169+
wells.plot(ax=ax, color='red', markersize=2)
170+
~~~
171+
{: .language-python}
172+
173+
<img src="../fig/20-crop-raster-wells-04.png" title="Ground weter level wells" width="512" style="display: block; margin: auto;" />
174+
175+
To select pixels around the geometries, one needs to first define a region including the geometries. This region is called a "buffer" and it is defined in the units of the projection. The size of the buffer depends on the analysis in your research. A buffer is also a polygon, which can be used to crop the raster data. The package `geopandas` has a `buffer` function to make buffer polygons.
176+
177+
~~~
178+
# Create 200m buffer around the wells
179+
wells_buffer = wells.buffer(200)
180+
181+
# Crop
182+
raster_clip_wells = raster_clip.rio.clip(wells_buffer, wells_buffer.crs)
183+
184+
# Visualize buffer on raster
185+
fig, (ax1, ax2) = plt.subplots(1, 2)
186+
fig.set_size_inches((16,8))
187+
raster_clip.plot.imshow(ax=ax1)
188+
wells_buffer.plot(ax=ax1, color='red')
189+
190+
# Visualize cropped buffer
191+
raster_clip_wells.plot.imshow(ax=ax2)
192+
~~~
193+
{: .language-python}
194+
195+
<img src="../fig/20-crop-raster-crop-by-well-buffers-05.png" title="Raster croped by buffer around wells" width="1024" style="display: block; margin: auto;" />
196+
197+
> ## Exercise: Select the raster data around the dike
198+
> The dikes are stored as polylines in `.shp` file in `data/dikes`. Let's select out all the raster data within 100m around the dikes and visualize the results.
199+
>
200+
> > ## Solution
201+
> > ~~~
202+
> > # Load dikes polyline
203+
> > dikes = gpd.read_file("data/dikes/dikes.shp")
204+
> > dikes = dikes.to_crs(raster.rio.crs)
205+
> > # Dike buffer
206+
> > dikes_buffer = dikes.buffer(100)
207+
> > # Crop
208+
> > raster_clip_dikes = raster_clip.rio.clip(dikes_buffer, dikes_buffer.crs)
209+
> > # Visualize
210+
> > raster_clip_dikes.plot.imshow(figsize=(8,8))
211+
> > ~~~
212+
> > {: .language-python}
213+
> > <img src="../fig/20-crop-raster-dike-solution-07.png" title="Raster croped by buffer around dikes" width="512" style="display: block; margin: auto;" />
214+
> {: .solution}
215+
{: .challenge}

0 commit comments

Comments
 (0)