|
| 1 | +--- |
| 2 | +title: "Crop raster data with rioxarray and geopandas" |
| 3 | +teaching: 40 |
| 4 | +exercises: 20 |
| 5 | +questions: |
| 6 | +- "How can I crop my raster data to the area of interest?" |
| 7 | +objectives: |
| 8 | +- "Crop raster data with a bounding box." |
| 9 | +- "Crop raster data with a polygon." |
| 10 | +- "Crop raster data within a geometry buffer." |
| 11 | +keypoints: |
| 12 | +- "Use `clip_box` in `DataArray.rio` to crop raster with a bounding box." |
| 13 | +- "Use `clip` in `DataArray.rio` to crop raster with a given polygon." |
| 14 | +- "Use `buffer` in `geopandas` to make a buffer polygon of a (multi)point or a polyline. This polygon can be used to crop data." |
| 15 | +--- |
| 16 | + |
| 17 | +It is quite common that the raster data you have in hand is too large to process, or not all the pixels are relevant to your area of interest (AoI). In both situations, you should consider cropping your raster data before performing data analysis. |
| 18 | + |
| 19 | +In this episode, we will introduce how to crop raster data into the desired area. We will use one Sentinel-2 image over Amsterdam as the example raster data, and introduce how to crop your data to different types of AoIs. |
| 20 | + |
| 21 | +> ## Introduce the Data |
| 22 | +> |
| 23 | +> Raster data: We will use the Sentinel-2 raster data retrieved from the Data Access episode. If you skipped this episode, you can also directly download data from Figshare. |
| 24 | +> |
| 25 | +> Vector data: we will use the PDOK vector data from the Vector data introduction episode. |
| 26 | +{: .callout} |
| 27 | + |
| 28 | +## Crop raster data with a bounding box |
| 29 | + |
| 30 | +First, we can have a glimpse of the raster data by loading it with `rioxarray`: |
| 31 | + |
| 32 | +~~~ |
| 33 | +import rioxarray |
| 34 | +
|
| 35 | +# Load image and visualize |
| 36 | +raster = rioxarray.open_rasterio('data/S2_amsterdam.tif') |
| 37 | +raster.plot.imshow(figsize=(8,8)) |
| 38 | +~~~ |
| 39 | +{: .language-python} |
| 40 | + |
| 41 | +<img src="../fig/20-crop-raster-original-raster-00.png" title="Overview of the raster" width="512" style="display: block; margin: auto;" /> |
| 42 | + |
| 43 | +The raster data is quite big. It even takes tens of seconds to visualize. But do we need the entire raster? Suppose we are interested in the crop fields, we can simply compare its coverage with the raster coverage: |
| 44 | + |
| 45 | +~~~ |
| 46 | +import geopandas as gpd |
| 47 | +from shapely.geometry import box |
| 48 | +from matplotlib import pyplot as plt |
| 49 | +
|
| 50 | +# Load the polygons of the crop fields |
| 51 | +cf_boundary_crop = gpd.read_file("data/crop_fields/cf_boundary_crop.shp") |
| 52 | +cf_boundary_crop = cf_boundary_crop.to_crs(raster.rio.crs) # convert to the same CRS |
| 53 | +
|
| 54 | +# Plot the bounding box over the raster |
| 55 | +bounds = box(*cf_boundary_crop.total_bounds) |
| 56 | +bb_cropfields = gpd.GeoDataFrame(index=[0], crs=raster.rio.crs, geometry=[bounds]) |
| 57 | +fig, ax = plt.subplots() |
| 58 | +fig.set_size_inches((8,8)) |
| 59 | +raster.plot.imshow(ax=ax) |
| 60 | +bb_cropfields.plot(ax=ax, alpha=0.6) |
| 61 | +~~~ |
| 62 | +{: .language-python} |
| 63 | + |
| 64 | +<img src="../fig/20-crop-raster-bounding-box-01.png" title="Bounding boxes of AoI over the raster" width="512" style="display: block; margin: auto;" /> |
| 65 | + |
| 66 | +Seeing from the bounding boxes, the crop fields (red) only takes a small part of the raster (blue). Therefore before actual processing, we can first crop the raster to our area of interest. The `clip_box` function allows one to crop a raster by the min/max of the x and y coordinates. |
| 67 | + |
| 68 | +~~~ |
| 69 | +# Crop the raster with the bounding box |
| 70 | +raster_clip = raster.rio.clip_box(*cf_boundary_crop.total_bounds) |
| 71 | +print(raster_clip.shape) |
| 72 | +~~~ |
| 73 | +{: .language-python} |
| 74 | +~~~ |
| 75 | +(3, 1565, 1565) |
| 76 | +~~~ |
| 77 | +{: .output} |
| 78 | + |
| 79 | +We successfully cropped the raster to a much smaller piece. We can visualize it now: |
| 80 | + |
| 81 | +~~~ |
| 82 | +raster_clip.plot.imshow(figsize=(8,8)) |
| 83 | +~~~ |
| 84 | + |
| 85 | +{: .language-python} |
| 86 | +<img src="../fig/20-crop-raster-crop-by-bb-02.png" title="Crop raster by a bounding box" width="512" style="display: block; margin: auto;" /> |
| 87 | + |
| 88 | + |
| 89 | +## Crop raster data with a polygon |
| 90 | + |
| 91 | +It is common that the AoI is given by a polygon, which can be also used to crop the raster. For the example, we make a simple polygon within the raster clip we just made, and select the raster pixels within the polygon. This can be done with the `clip` function: |
| 92 | + |
| 93 | +~~~ |
| 94 | +
|
| 95 | +from matplotlib import pyplot as plt |
| 96 | +from shapely.geometry import Polygon |
| 97 | +xlist= [630000, 629000, 638000, 639000, 634000, 630000] |
| 98 | +ylist = [5.804e6, 5.814e6, 5.816e6, 5.806e6, 5.803e6, 5.804e6] |
| 99 | +polygon_geom = Polygon(zip(xlist, ylist)) |
| 100 | +polygon = gpd.GeoDataFrame(index=[0], crs=raster_clip.rio.crs, geometry=[polygon_geom]) |
| 101 | +
|
| 102 | +# Plot the polygon over raster |
| 103 | +fig, ax = plt.subplots() |
| 104 | +fig.set_size_inches((8,8)) |
| 105 | +raster_clip.plot.imshow(ax=ax) |
| 106 | +polygon.plot(ax=ax, edgecolor='blue', alpha=0.6) |
| 107 | +
|
| 108 | +# Crop and visualize |
| 109 | +raster_clip_polygon = raster_clip.rio.clip(polygon['geometry'], polygon.crs) |
| 110 | +raster_clip_polygon.plot.imshow(figsize=(8,8)) |
| 111 | +~~~ |
| 112 | +{: .language-python} |
| 113 | + |
| 114 | +<img src="../fig/20-crop-raster-crop-by-polygon-03.png" title="Crop raster by a polygon" width="1024" style="display: block; margin: auto;" /> |
| 115 | + |
| 116 | + |
| 117 | +> ## Exercise: Compare two ways of bounding box cropping |
| 118 | +> So far, we have learned two ways of cropping a raster: by a bounding box (using `clip_box`) and by a polygon (using `clip`). Technically, a bounding box is also a polygon. So what if we crop the original image directly with the polygon? For example: |
| 119 | +> ~~~ |
| 120 | +> raster_clip_polygon2 = raster.rio.clip(polygon['geometry'], polygon.crs) |
| 121 | +> raster_clip_polygon2.plot.imshow() |
| 122 | +> ~~~ |
| 123 | +> {: .language-python} |
| 124 | +> And try to compare the two methods: |
| 125 | +> - Do you have the same results? |
| 126 | +> - Do you have the same execution time? |
| 127 | +> - How would you choose the two methods in your daily work? |
| 128 | +> |
| 129 | +> > ## Solution |
| 130 | +> > |
| 131 | +> > The two methods give the same results, but cropping directly with a polygon is much slower. |
| 132 | +> > |
| 133 | +> > Therefore, if the AoI is much smaller than the original raster, it would be more efficient to first crop the raster with a bounding box, then crop with the actual AoI polygon. |
| 134 | +> {: .solution} |
| 135 | +{: .challenge} |
| 136 | +
|
| 137 | +> ## Exercise: Select the raster data within crop fields |
| 138 | +> Let's select out all the crop fields from the raster data, using the `.shp` file in `data/crop_fields` and visualize the results. |
| 139 | +> |
| 140 | +> > ## Solution |
| 141 | +> > |
| 142 | +> > ~~~ |
| 143 | +> > # Load the crop fields polygons |
| 144 | +> > cf_boundary_crop = gpd.read_file("data/crop_fields/cf_boundary_crop.shp") |
| 145 | +> > # Crop |
| 146 | +> > raster_clip_fields = raster_clip.rio.clip(cf_boundary_crop['geometry'], cf_boundary_crop.crs) |
| 147 | +> > # Visualize |
| 148 | +> > raster_clip_fields.plot.imshow(figsize=(8,8)) |
| 149 | +> > ~~~ |
| 150 | +> > {: .language-python} |
| 151 | +> > <img src="../fig/20-crop-raster-crop-fields-solution-06.png" title="Raster cropped by crop fields" width="512" style="display: block; margin: auto;" /> |
| 152 | +> {: .solution} |
| 153 | +{: .challenge} |
| 154 | +
|
| 155 | +
|
| 156 | +## Crop raster data with a geometry buffer |
| 157 | +
|
| 158 | +It is not always the case that the AoI comes in the format of a polygon. Sometimes one would like to perform analysis around a (set of) point(s), or polyline(s). For example, in our AoI, there are also some groundwater monitoring wells coming as point vector data. One may also want to perform analysis around these wells. The location of the wells is stored in `data/groundwater_monitoring_well`. |
| 159 | +
|
| 160 | +~~~ |
| 161 | +# Load wells |
| 162 | +wells = gpd.read_file("data/groundwater_monitoring_well/groundwater_monitoring_well.shp") |
| 163 | +wells = wells.to_crs(raster_clip.rio.crs) |
| 164 | + |
| 165 | +# Plot the wells over raster |
| 166 | +fig, ax = plt.subplots() |
| 167 | +fig.set_size_inches((8,8)) |
| 168 | +raster_clip.plot.imshow(ax=ax) |
| 169 | +wells.plot(ax=ax, color='red', markersize=2) |
| 170 | +~~~ |
| 171 | +{: .language-python} |
| 172 | +
|
| 173 | +<img src="../fig/20-crop-raster-wells-04.png" title="Ground weter level wells" width="512" style="display: block; margin: auto;" /> |
| 174 | +
|
| 175 | +To select pixels around the geometries, one needs to first define a region including the geometries. This region is called a "buffer" and it is defined in the units of the projection. The size of the buffer depends on the analysis in your research. A buffer is also a polygon, which can be used to crop the raster data. The package `geopandas` has a `buffer` function to make buffer polygons. |
| 176 | +
|
| 177 | +~~~ |
| 178 | +# Create 200m buffer around the wells |
| 179 | +wells_buffer = wells.buffer(200) |
| 180 | + |
| 181 | +# Crop |
| 182 | +raster_clip_wells = raster_clip.rio.clip(wells_buffer, wells_buffer.crs) |
| 183 | + |
| 184 | +# Visualize buffer on raster |
| 185 | +fig, (ax1, ax2) = plt.subplots(1, 2) |
| 186 | +fig.set_size_inches((16,8)) |
| 187 | +raster_clip.plot.imshow(ax=ax1) |
| 188 | +wells_buffer.plot(ax=ax1, color='red') |
| 189 | + |
| 190 | +# Visualize cropped buffer |
| 191 | +raster_clip_wells.plot.imshow(ax=ax2) |
| 192 | +~~~ |
| 193 | +{: .language-python} |
| 194 | +
|
| 195 | +<img src="../fig/20-crop-raster-crop-by-well-buffers-05.png" title="Raster croped by buffer around wells" width="1024" style="display: block; margin: auto;" /> |
| 196 | +
|
| 197 | +> ## Exercise: Select the raster data around the dike |
| 198 | +> The dikes are stored as polylines in `.shp` file in `data/dikes`. Let's select out all the raster data within 100m around the dikes and visualize the results. |
| 199 | +> |
| 200 | +> > ## Solution |
| 201 | +> > ~~~ |
| 202 | +> > # Load dikes polyline |
| 203 | +> > dikes = gpd.read_file("data/dikes/dikes.shp") |
| 204 | +> > dikes = dikes.to_crs(raster.rio.crs) |
| 205 | +> > # Dike buffer |
| 206 | +> > dikes_buffer = dikes.buffer(100) |
| 207 | +> > # Crop |
| 208 | +> > raster_clip_dikes = raster_clip.rio.clip(dikes_buffer, dikes_buffer.crs) |
| 209 | +> > # Visualize |
| 210 | +> > raster_clip_dikes.plot.imshow(figsize=(8,8)) |
| 211 | +> > ~~~ |
| 212 | +> > {: .language-python} |
| 213 | +> > <img src="../fig/20-crop-raster-dike-solution-07.png" title="Raster croped by buffer around dikes" width="512" style="display: block; margin: auto;" /> |
| 214 | +> {: .solution} |
| 215 | +{: .challenge} |
0 commit comments