Commit 010e8dc

committed
source commit: 699a129
0 parents  commit 010e8dc

File tree

95 files changed

+6494
-0
lines changed

01-intro-raster-data.md

Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
1+
---
2+
title: "Introduction to Raster Data"
3+
teaching: 15
4+
exercises: 5
5+
---
6+
7+
:::questions
8+
- What format should I use to represent my data?
9+
- What are the main data types used for representing geospatial data?
10+
- What are the main attributes of raster data?
11+
:::
12+
13+
:::objectives
14+
- Describe the difference between raster and vector data.
15+
- Describe the strengths and weaknesses of storing data in raster format.
16+
- Distinguish between continuous and categorical raster data and identify types of datasets that would be stored in each format.
17+
:::
18+
19+
## Introduction
20+
21+
This episode introduces the two primary types of geospatial
22+
data: rasters and vectors. After briefly introducing these
23+
data types, this episode focuses on raster data, describing
24+
some major features and types of raster data.
25+
26+
## Data Structures: Raster and Vector
27+
28+
The two primary types of geospatial data are raster
29+
and vector data. Raster data is stored as a grid of values which are rendered on a
30+
map as pixels. Each pixel value represents an area on the Earth's surface. Vector data structures represent specific features on the
31+
Earth's surface, and
32+
assign attributes to those features. Vector data structures
33+
will be discussed in more detail in [the next episode](02-intro-vector-data.md).
34+
35+
This workshop will focus on how to work with both raster and vector
36+
data sets, so it is essential that we understand the
37+
basic structures of these types of data and the types of data
38+
that they can be used to represent.
39+
40+
### About Raster Data
41+
42+
Raster data is any pixelated (or gridded) data where each pixel is associated
43+
with a specific geographic location. The value of a pixel can be
44+
continuous (e.g. elevation) or categorical (e.g. land use). If this sounds
45+
familiar, it is because this data structure is very common: it's how
46+
we represent any digital image. A geospatial raster is only different
47+
from a digital photo in that it is accompanied by spatial information
48+
that connects the data to a particular location. This includes the
49+
raster's extent and cell size, the number of rows and columns, and
50+
its coordinate reference system (or CRS).
51+
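As a rough illustration of the paragraph above, a geospatial raster can be sketched as a grid of values plus the spatial information that ties it to a location. The `Raster` class, its field names, and the sample coordinates are hypothetical and chosen only for this example. They are not the API of any library used later in the workshop.

```python
from dataclasses import dataclass


@dataclass
class Raster:
    """Hypothetical minimal raster: a grid of values plus georeferencing."""
    values: list      # list of rows, top (northernmost) row first
    x_min: float      # western edge of the extent
    y_max: float      # northern edge of the extent
    cell_size: float  # width and height of one square pixel, in CRS units
    crs: str          # coordinate reference system identifier

    @property
    def n_rows(self):
        return len(self.values)

    @property
    def n_cols(self):
        return len(self.values[0])

    def pixel_center(self, row, col):
        """Return the (x, y) map coordinate of the center of a pixel."""
        x = self.x_min + (col + 0.5) * self.cell_size
        y = self.y_max - (row + 0.5) * self.cell_size
        return x, y


# Made-up elevation values (meters) on a 2 x 3 grid with 10 m pixels.
elevation = Raster(
    values=[[305.2, 351.7, 418.9],
            [322.4, 379.9, 380.0]],
    x_min=500000.0,
    y_max=4700000.0,
    cell_size=10.0,
    crs="EPSG:32618",  # assumed CRS, for illustration only
)

print(elevation.n_rows, elevation.n_cols)
print(elevation.pixel_center(0, 0))
```

Without the `x_min`, `y_max`, `cell_size`, and `crs` fields, `values` would be an ordinary image. With them, every pixel can be placed on the Earth's surface.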
52+
![Raster Concept (Source: National Ecological Observatory Network (NEON))](fig/E01/raster_concept.png){alt="raster concept"}
53+
54+
Some examples of continuous rasters include:
55+
56+
1. Precipitation maps.
57+
2. Maps of tree height derived from LiDAR data.
58+
3. Elevation values for a region.
59+
60+
A map of elevation for Harvard Forest derived from the [NEON AOP LiDAR sensor](https://www.neonscience.org/data-collection/airborne-remote-sensing)
61+
is below. Elevation is represented as a continuous numeric variable in this map. The legend
62+
shows the continuous range of values in the data from around 300 to 420 meters.
63+
64+
![Continuous Elevation Map: HARV Field Site](fig/E01/continuous-elevation-HARV-plot-01.png){alt="elevation Harvard forest"}
65+
66+
Some rasters contain categorical data where each pixel represents a discrete
67+
class such as a landcover type (e.g., "forest" or "grassland") rather than a
68+
continuous value such as elevation or temperature. Some examples of classified
69+
maps include:
70+
71+
1. Landcover / land-use maps.
72+
2. Tree height maps classified as short, medium, and tall trees.
73+
3. Elevation maps classified as low, medium, and high elevation.
74+
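To make the link between continuous and categorical rasters concrete, here is a minimal sketch that turns a small continuous elevation grid into a classified one. The class breaks (340 m and 380 m) and the sample values are assumptions made up for this example.

```python
def classify(value, breaks=(340.0, 380.0), labels=("low", "medium", "high")):
    """Return the label of the first class whose upper break exceeds value."""
    for upper, label in zip(breaks, labels):
        if value < upper:
            return label
    # Values at or above the last break fall into the final class.
    return labels[-1]


# Made-up continuous elevation values in meters.
elevation = [[305.2, 351.7, 418.9],
             [322.4, 379.9, 380.0]]

# Apply the same rule to every pixel to get a categorical raster.
classified = [[classify(v) for v in row] for row in elevation]

for row in classified:
    print(row)
```

The classified grid has the same shape as the input, but each pixel now holds one of three discrete classes rather than a measured value.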
75+
![USA landcover classification](fig/E01/USA_landcover_classification.png){alt="USA landcover classification"}
76+
77+
The map above shows the contiguous United States with landcover as categorical
78+
data. Each color is a different landcover category. (Source: Homer, C.G., et
79+
al., 2015, Completion of the 2011 National Land Cover Database for the
80+
conterminous United States-Representing a decade of land cover change
81+
information. Photogrammetric Engineering and Remote Sensing, v. 81, no. 5, p.
82+
345-354)
83+
84+
:::challenge
85+
## Advantages and Disadvantages
86+
87+
With your neighbor, brainstorm potential advantages and
88+
disadvantages of storing data in raster format. Add your
89+
ideas to the Etherpad. The Instructor will discuss and
90+
add any points that weren't brought up in the small group
91+
discussions.
92+
93+
::::solution
94+
## Solution
95+
96+
Raster data has some important advantages:
97+
98+
* representation of continuous surfaces
99+
* potentially very high levels of detail
100+
* data is 'unweighted' across its extent - the geometry doesn't
101+
implicitly highlight features
102+
* cell-by-cell calculations can be very fast and efficient
103+
104+
The downsides of raster data are:
105+
106+
* very large file sizes as cell size gets smaller
107+
* currently popular formats don't embed metadata well (more on this later!)
108+
* can be difficult to represent complex information
109+
::::
110+
:::
111+
112+
### Important Attributes of Raster Data
113+
114+
#### Extent
115+
116+
The spatial extent is the geographic area that the raster data covers.
117+
The spatial extent of an object represents the geographic edge or
118+
location that is the furthest north, south, east and west. In other words, extent
119+
represents the overall geographic coverage of the spatial object.
120+
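One way to think about extent is as the bounding box of all coordinates in a spatial object. The sketch below assumes an object is given as a plain list of (x, y) tuples. The function name and sample coordinates are hypothetical.

```python
def extent(vertices):
    """Return (x_min, y_min, x_max, y_max) for a list of (x, y) tuples."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


# Made-up coordinates for illustration.
sample_points = [(2, 3), (5, 7), (8, 4)]

print(extent(sample_points))
```

The four numbers are the westernmost, southernmost, easternmost, and northernmost coordinates found in the object.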
121+
![Spatial extent image (Image Source: National Ecological Observatory Network (NEON))](fig/E01/spatial_extent.png){alt="spatial extent objects"}
122+
123+
:::challenge
124+
## Extent Challenge
125+
126+
In the image above, the dashed boxes around each set of objects
127+
seem to imply that the three objects have the same extent. Is this
128+
accurate? If not, which object(s) have a different extent?
129+
130+
::::solution
131+
## Solution
132+
133+
The lines and polygon objects have the same extent. The extent for
134+
the points object is smaller in the vertical direction than the
135+
other two because there are no points on the line at y = 8.
136+
::::
137+
:::
138+
139+
#### Resolution
140+
141+
The resolution of a raster represents the area on the ground that each
142+
pixel of the raster covers. The image below illustrates the effect
143+
of changes in resolution.
144+
145+
![Resolution image (Source: National Ecological Observatory Network (NEON))](fig/E01/raster_resolution.png){alt="resolution image"}
146+
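The relationship between extent, resolution, and grid size can be sketched with a little arithmetic. The example below assumes square pixels and an extent that divides evenly by the cell size. The extent values are made up.

```python
def grid_shape(x_min, y_min, x_max, y_max, cell_size):
    """Return (n_rows, n_cols) needed to cover an extent with square pixels."""
    n_cols = round((x_max - x_min) / cell_size)
    n_rows = round((y_max - y_min) / cell_size)
    return n_rows, n_cols


# A made-up extent of 1200 m by 900 m, gridded at three resolutions.
shapes = {}
for cell_size in (30, 10, 1):
    n_rows, n_cols = grid_shape(0, 0, 1200, 900, cell_size)
    shapes[cell_size] = (n_rows, n_cols)
    print(f"{cell_size:>2} m pixels: {n_rows} rows x {n_cols} cols "
          f"= {n_rows * n_cols} cells")
```

Shrinking the cell size from 30 m to 10 m multiplies the number of cells by nine for the same extent, which is why finer resolution leads to larger files.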
147+
### Raster Data Format for this Workshop
148+
149+
Raster data can come in many different formats. For this workshop, we will use
150+
the GeoTIFF format which has the extension `.tif`. A `.tif` file stores metadata
151+
or attributes about the file as embedded `tif tags`. For instance, your camera
152+
might store a tag that describes the make and model of the camera or the date
153+
the photo was taken when it saves a `.tif`. A GeoTIFF is a standard `.tif` image
154+
format with additional spatial (georeferencing) information embedded in the file
155+
as tags. These tags should include the following raster metadata:
156+
157+
1. Extent
158+
2. Resolution
159+
3. Coordinate Reference System (CRS) - we will introduce this concept in [a later episode](03-crs.md)
160+
4. Values that represent missing data (`NoDataValue`) - we will introduce this
161+
concept in [a later episode](06-raster-intro.md).
162+
163+
We will discuss these attributes in more detail in [a later episode](06-raster-intro.md).
164+
In that episode, we will also learn how to use Python to extract raster attributes
165+
from a GeoTIFF file.
166+
167+
:::callout
168+
## More Resources on the `.tif` format
169+
170+
* [GeoTIFF on Wikipedia](https://en.wikipedia.org/wiki/GeoTIFF)
171+
* [OSGEO TIFF documentation](https://trac.osgeo.org/geotiff/)
172+
:::
173+
174+
### Multi-band Raster Data
175+
176+
A raster can contain one or more bands. One type of multi-band raster
177+
dataset that is familiar to many of us is a color
178+
image. A basic color image consists of three bands: red, green, and blue.
179+
Each
180+
band represents light reflected from the red, green or blue portions of
181+
the
182+
electromagnetic spectrum. The pixel brightness for each band, when
183+
composited,
184+
creates the colors that we see in an image.
185+
186+
![RGB multi-band raster image (Source: National Ecological Observatory Network (NEON).)](fig/E01/RGBSTack_1.jpg){alt="multi-band raster"}
187+
188+
We can plot each band of a multi-band image individually.
189+
190+
Or we can composite all three bands together to make a color image.
191+
192+
In a multi-band dataset, the rasters will always have the same extent,
193+
resolution, and CRS.
194+
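A minimal sketch of compositing, assuming each band is a plain list of rows holding brightness values from 0 to 255. The band values and the `composite` helper are made up for this example.

```python
def composite(red, green, blue):
    """Combine three equally shaped bands into rows of (r, g, b) tuples."""
    bands = (red, green, blue)
    # All bands must share the same grid shape.
    if len({len(band) for band in bands}) != 1:
        raise ValueError("bands have different numbers of rows")
    for rows in zip(red, green, blue):
        if len({len(row) for row in rows}) != 1:
            raise ValueError("bands have different numbers of columns")
    return [
        [(r, g, b) for r, g, b in zip(r_row, g_row, b_row)]
        for r_row, g_row, b_row in zip(red, green, blue)
    ]


# Three made-up 2 x 2 bands.
red = [[255, 0], [0, 128]]
green = [[0, 255], [0, 128]]
blue = [[0, 0], [255, 128]]

rgb = composite(red, green, blue)
for row in rgb:
    print(row)
```

The shape check mirrors the point above: the bands can only be stacked pixel by pixel because they cover the same grid.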
195+
:::callout
196+
## Other Types of Multi-band Raster Data
197+
198+
Multi-band raster data might also contain:
199+
1. **Time series:** the same variable, over the same area, over time.
200+
2. **Multi or hyperspectral imagery:** image rasters that have 4 or
201+
more (multi-spectral) or more than 10-15 (hyperspectral) bands. We
202+
won't be working with this type of data in this workshop, but you can
203+
check out the NEON Data Skills [Imaging Spectroscopy HDF5 in R](https://www.neonscience.org/hsi-hdf5-r)
204+
tutorial if you're interested in working with hyperspectral data cubes.
205+
:::
206+
207+
:::keypoints
208+
- Raster data is pixelated data where each pixel is associated with a specific location.
209+
- Raster data always has an extent and a resolution.
210+
- The extent is the geographical area covered by a raster.
211+
- The resolution is the area covered by each pixel of a raster.
212+
:::

02-intro-vector-data.md

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
1+
---
2+
title: "Introduction to Vector Data"
3+
teaching: 10
4+
exercises: 5
5+
---
6+
7+
:::questions
8+
- What are the main attributes of vector data?
9+
:::
10+
11+
:::objectives
12+
- Describe the strengths and weaknesses of storing data in vector format.
13+
- Describe the three types of vectors and identify types of data that would be stored in each.
14+
:::
15+
16+
## About Vector Data
17+
18+
Vector data structures represent specific features on the Earth's surface, and
19+
assign attributes to those features. Vectors are composed of discrete geometric
20+
locations (x, y values) known as vertices that define the shape of the spatial
21+
object. The organization of the vertices determines the type of vector that we
22+
are working with: point, line or polygon.
23+
24+
![Types of vector objects (Image Source: National Ecological Observatory Network (NEON))](fig/E02/pnt_line_poly.png){alt="vector data types"}
25+
26+
* **Points:** Each point is defined by a single x, y coordinate. There can be
27+
many points in a vector point file. Examples of point data include: sampling
28+
locations, the location of individual trees, or the location of survey plots.
29+
30+
* **Lines:** Lines are composed of many (at least 2) points that are connected.
31+
For instance, a road or a stream may be represented by a line. This line is
32+
composed of a series of segments; each "bend" in the road or stream represents a
33+
vertex that has a defined x, y location.
34+
35+
* **Polygons:** A polygon consists of 3 or more vertices that are connected and
36+
closed. The outlines of survey plot boundaries, lakes, oceans, and states or
37+
countries are often represented by polygons.
38+
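The three vector types described above can be sketched as plain Python structures built from vertices. The coordinates and helper functions here are hypothetical and use only the standard library. The polygon is stored without repeating its first vertex, and the area calculation closes the ring implicitly.

```python
import math

# A point is a single (x, y) coordinate.
point = (3.0, 4.0)

# A line is an ordered sequence of at least two connected vertices.
line = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]

# A polygon is a closed ring of at least three vertices.
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]


def line_length(vertices):
    """Sum the lengths of the segments between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]  # wrap to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(line_length(line))
print(polygon_area(polygon))
```

A line has a length but no area. Only the closed polygon encloses a region that can be measured or filled.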
39+
:::callout
40+
## Data Tip
41+
42+
Sometimes boundary layers, such as states and countries, are stored as lines
43+
rather than polygons. However, these boundaries, when represented as a line,
44+
will not create a closed object with a defined area that can be filled.
45+
:::
46+
47+
:::challenge
48+
## Identify Vector Types
49+
50+
The plot below includes examples of two of the three types of vector
51+
objects. Use the definitions above to identify which features
52+
are represented by which vector type.
53+
54+
![Vector Type Examples](fig/E02/vector_types_examples.png){alt="vector type examples"}
55+
56+
::::solution
57+
## Solution
58+
59+
State boundaries are polygons. The Fisher Tower location is
60+
a point. There are no line features shown.
61+
::::
62+
:::
63+
64+
Vector data has some important advantages:
65+
66+
* The geometry itself contains information about what the dataset creator thought was important
67+
* The geometry structures hold information in themselves - why choose point over polygon, for instance?
68+
* Each geometry feature can carry multiple attributes instead of just one, e.g. a database of cities can have attributes for name, country, population, etc.
69+
* Data storage can be very efficient compared to rasters
70+
71+
The downsides of vector data include:
72+
73+
* Potential loss of detail compared to raster
74+
* Potential bias in datasets - what didn't get recorded?
75+
* Calculations involving multiple vector layers need to do math on the
76+
geometry as well as the attributes, so can be slow compared to raster math.
77+
78+
Vector datasets are in use in many industries besides geospatial fields. For
79+
instance, computer graphics are largely vector-based, although the data
80+
structures in use tend to join points using arcs and complex curves rather than
81+
straight lines. Computer-aided design (CAD) is also vector-based. The
82+
difference is that geospatial datasets are accompanied by information tying
83+
their features to real-world locations.
84+
85+
## Vector Data Format for this Workshop
86+
87+
Like raster data, vector data can also come in many different formats. For this
88+
workshop, we will use the Shapefile format. A Shapefile consists of multiple
89+
files in the same directory, of which `.shp`, `.shx`, and `.dbf` files are mandatory. Other non-mandatory but very important files are `.prj` and `.shp.xml` files.
90+
91+
- The `.shp` file stores the feature geometry itself
92+
- `.shx` is a positional index of the feature geometry that allows quickly searching forwards and backwards through the geographic coordinates of each vertex in the vector
93+
- `.dbf` contains the tabular attributes for each shape.
94+
- `.prj` indicates the coordinate reference system (CRS)
95+
- `.shp.xml` contains the Shapefile metadata.
96+
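Because a Shapefile is really a set of sibling files, it can be handy to check that the mandatory parts are present before trying to open one. The sketch below uses only the standard library. The helper names and the `streams` file name are hypothetical, and the demo creates empty placeholder files in a temporary directory rather than real Shapefile data.

```python
import tempfile
from pathlib import Path

MANDATORY = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".shp.xml")


def shapefile_parts(shp_path):
    """Map each known extension to whether its sibling file exists."""
    shp_path = Path(shp_path)
    return {
        ext: shp_path.with_name(shp_path.stem + ext).exists()
        for ext in MANDATORY + OPTIONAL
    }


def missing_mandatory(shp_path):
    """List the mandatory extensions that have no file on disk."""
    parts = shapefile_parts(shp_path)
    return [ext for ext in MANDATORY if not parts[ext]]


# Demo with empty placeholder files; the .shx file is left out on purpose.
with tempfile.TemporaryDirectory() as tmp:
    shp = Path(tmp) / "streams.shp"
    for ext in (".shp", ".dbf", ".prj"):
        shp.with_name("streams" + ext).touch()
    parts = shapefile_parts(shp)
    missing = missing_mandatory(shp)

print(parts)
print(missing)
```

A check like this only confirms that the files exist. It says nothing about whether their contents are valid.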
97+
Together, the Shapefile includes the following information:
98+
99+
* **Extent** - the spatial extent of the shapefile (i.e. geographic area that
100+
the shapefile covers). The spatial extent for a shapefile represents the
101+
combined extent for all spatial objects in the shapefile.
102+
* **Object type** - whether the shapefile includes points, lines, or polygons.
103+
* **Coordinate reference system (CRS)**
104+
* **Other attributes** - for example, a line shapefile that contains the
105+
locations of streams might contain the name of each stream.
106+
107+
Because the structures of points, lines, and polygons are different, each
108+
individual shapefile can only contain one vector type (all points, all lines
109+
or all polygons). You will not find a mixture of point, line and polygon
110+
objects in a single shapefile.
111+
112+
:::callout
113+
## More Resources on Shapefiles
114+
115+
More about shapefiles can be found on
116+
[Wikipedia](https://en.wikipedia.org/wiki/Shapefile). Shapefiles are often publicly
117+
available from government services, such as [this page from the US Census Bureau][us-cb] or
118+
[this one from Australia's Data.gov.au website](https://data.gov.au/data/dataset?res_format=SHP).
119+
:::
120+
121+
:::callout
122+
## Why not both?
123+
124+
Very few formats can contain both raster and vector data - in fact, most are
125+
even more restrictive than that. Vector datasets are usually locked to one
126+
geometry type, e.g. points only. Raster datasets can usually only encode one
127+
data type; for example, you can't have a multiband GeoTIFF where one layer is
128+
integer data and another is floating-point. There are sound reasons for this -
129+
format standards are easier to define and maintain, and so is metadata. The
130+
effects of particular data manipulations are more predictable if you are
131+
confident that all of your input data has the same characteristics.
132+
:::
133+
134+
[us-cb]: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
135+
136+
:::keypoints
137+
- Vector data structures represent specific features on the Earth's surface along with attributes of those features.
138+
- Vector objects are either points, lines, or polygons.
139+
:::
