tmap/geodata.qmd at main · geocompx/tmap · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
# Spatial data in R {#sec-geodata}

\index{spatial data}

## Introduction {#sec-intro-geodata}

Vector and raster data models are two basic models used to represent spatial data.
Vectors are used to store discrete spatial objects, such as points, lines, and polygons, while rasters are used to represent continuous surfaces, such as elevation or temperature.
Each data model has its own mapping capabilities and limitations, and they are used in different contexts.
This chapter starts by describing these two popular spatial data models, and their extensions, spatial vector and raster data cubes (@sec-data-models).
Each data model is introduced, explained how it is built, and how it is stored using different file formats.
Next, this chapter presents how these different data models are implemented in R (@sec-spatial-data-representations-in-r).
It includes showing how to read different spatial data formats, how to understand spatial R objects, and where to find more information about preprocessing spatial data.
<!-- explain that often there is a need to prepare spatial data before mapping ?? -->

## Data models {#sec-data-models}

Traditionally, spatial data has been described by two basic data models: vector data model aimed at (@sec-vector-data-model) representing the world using points, lines, and polygons, and raster data model focused on representing surfaces (@sec-raster-data-model).
Now, with an abundance of available spatial data and a variety of ways to obtain it, we need to extend these two models to be able to store and operate on more complex data.
It includes having many distinct variables and various repeated measurements for the same area that are often collected for different moments in time.
Therefore, we also use the concept of spatial data cubes, which can be applied to both vector and raster data (@sec-spatial-data-cubes).

### Vector data model {#sec-vector-data-model}

\index{vector data model}
\index{spatial geometries}
\index{spatial attributes}
The vector data model represent the world as a set of spatial geometries with non-spatial attributes (@fig-vector-data-model).
The role of geometry is to describe the location and shape of spatial objects, while attributes are used to store properties of these objects.

\index{spatial geometries}
There are three basic types of spatial geometries: points, lines, and polygons, all of them are made up of coordinates (left part of @fig-vector-data-model).
A point is represented by a pair of coordinates, usually described as X and Y, allowing for locating this point in some space.
X and Y could be unitless, in degrees, or in some measure units, such as meters (extended discussion on coordinates and related topics is in @sec-map-projections).
Points can represent features on different spatial scales, from a GPS position, location of a building in a city, to a city itself on a small scale map.
They are also used to express abstract features, such as locations of map labels.
Properties of points, such as the type of the building or the population of the city, can be expressed on maps by different point sizes, colors, or shapes.

A line extends the idea of a point.
Internally, it consists of several points with coordinates, called vertices, that are arranged in some order.
Consecutive points are connected by straight lines.
Therefore, a straight spatial line consists of two points (two pairs of coordinates), while complex spatial lines could be created based on a large number of points giving the illusion of a curved line.
Lines are used to representing linear features, such as roads, rivers, boundaries, footpaths, etc.
In this case, we can express line features' attributes using either lines' color, their widths, or line patterns (e.g., dashed or dotted lines).

A polygon is again a set of ordered points (*vertices*) connected by straight lines.
Its only difference from the line is that the first and the last point in a polygon has the same coordinates, and thus close the object.
The polygon representation is used to represent shapes and locations of different objects, from a lake or a patch of vegetation, through a building or a city block, to some administrative units.
Polygons also have one unique feature -- they could have holes: a polygon hole represents an area inside of the polygon but does not belong to it.
For example, a lake with an island can be depicted as a polygon with a hole.
The values of polygons' attributes can be represented, for example, by the areas colors (*fill*).

\index{spatial attributes}
The second part of the vector data model relates to non-spatial attributes (right part of @fig-vector-data-model).
Attributes are usually stored as a table, in which each column depicts some property, such as an identification number, a name of a feature, or a value of some characteristic.
Each row, on the other hand, relates to a single spatial geometry.

```{r}
#| echo: false
#| message: false
library(sf)
library(tmap)
point_data = st_sf(
    id = as.factor(c(1, 2)),
    type = c("streetlight", "bench"),
    geom = st_sfc(
        st_multipoint(rbind(c(1, 2))),
        st_multipoint(rbind(c(1.4, 1.2), c(2, 1.5))),
        crs = "EPSG:4326"
    )
)
```

```{r}
#| echo: false
linestring_sfg1 = st_linestring(rbind(c(0.9, 1.9), c(1.6, 1.9), c(1.8, 1.8)))
linestring_sfg2 = st_linestring(rbind(c(0.8, 1.6), c(1.4, 1.5), c(2, 1.6)))
linestring_sfg3 = st_linestring(rbind(c(1.1, 1), c(1.6, 1.2), c(1.9, 1.2)))
lines_data = st_sf(
    id = as.factor(1:3),
    type = c("a", "b", "c"),
    geom = st_sfc(linestring_sfg1,
        linestring_sfg2,
        linestring_sfg3,
        crs = "EPSG:4326"
    )
)
lines_data_p = st_cast(lines_data, "MULTIPOINT")
```

```{r}
#| echo: false
polygon_sfg1 = st_polygon(list(rbind(
    c(1.4, 1.6),
    c(1.6, 1.6),
    c(1.6, 1.8),
    c(1.4, 1.8),
    c(1.4, 1.6)
)))

polygon_sfg2 = st_multipolygon(list(
    list(rbind(
        c(1.6, 1.3), c(1.8, 1.45), c(1.6, 1.45), c(1.6, 1.3)
    )),
    st_polygon(
        list(
            rbind(
                c(0.9, 1.1), c(1.2, 1.1), c(1.2, 1.4), c(0.9, 1.4), c(0.9, 1.1)
            ),
            rbind(
                c(1.0, 1.2), c(1.1, 1.3), c(1.1, 1.2), c(1.0, 1.2)
            )
        )
    )
))

polygon_data = st_sf(
    id = as.factor(1:2),
    type = c("a", "b"),
    geom = st_sfc(polygon_sfg1,
        polygon_sfg2,
        crs = "EPSG:4326"
    )
)

polygon_data_p = st_cast(polygon_data, "MULTIPOINT")
```

```{r}
#| label: fig-vector-data-model
#| echo: false
#| message: false
#| warning: false
#| fig-width: 5
#| fig-asp: 1
#| fig-cap: "Instances of spatial vector data model: POINTS, LINES, and POLYGONS."
source("code/data_model_figures.R")
draw_vector_data(scale = 0.75)
```

\index{simple feature}
The above ideas could be implemented in many ways.
Currently, [the Simple Feature Access](http://portal.opengeospatial.org/files/?artifact_id=25355) seems to be the most widely used standard.
In it, a feature is every object or concept that have spatial location or extent.

Simple feature standard makes a clear distinction between single- and multi-element features.
We can have a POINT feature and a MULTIPOINT feature, and similarly LINESTRING and MULTILINESTRING, and POLYGON and MULTIPOLYGON.
The main difference between single element features (such as POINT or POLYGON) and multi-element features (such as MULTIPOINT or MULTIPOLYGON) can be clearly seen by looking at attribute tables.
For example, six points stored as POINT features fill six separate rows, while six points stored as just one MULTIPOINT feature occupy just one row.

Examples of single- and multi-element features can be seen in @fig-vector-data-model.
The top example shows point data represented as MULTIPOINT feature: although we have seven points (seven distinct pairs of coordinates), they are gathered into two groups, green and orange, which can be seen in the associated attribute table.
The central example, on the other hand, uses single-element features, where each line geometry relates to one row in the attribute table.
Finally, the bottom example again uses multi-element features, where the second feature (`Country B`) consist of two separate geometries.
The simple feature standard also describes a number of additional geometry types, including Curve, Surface, or Triangle.
Finally, GeometryCollection exists that contains all of the possible geometry types.

\index{spatial file formats}
Many dozens of file formats exist to store spatial vector data.
One of the simplest ways to store spatial data is in the form of a text file (`.csv`) or as a spreadsheet (`.xls` or `.xlsx`).
While it makes storing point data simple, with two columns representing coordinates, it is not easy to store more complex objects in this way.
Text files are also not suitable for storing information about the coordinate reference system used (@sec-map-projections).
Historically, the shapefile format (`.shp`) developed by the ESRI company gained a lot of interest and become the most widely supported spatial vector file format.
Despite its popularity, this format has a number of shortcomings, including the need to store several files, attribute names limited to ten characters, the ability to store up to 255 attributes and files up to 2GB, and many more.
A fairly recent file format, OGC GeoPackage (`.gpkg`), was developed as an alternative.
It is a single file database free from the limitation of the shapefile format.
Other popular spatial vector file formats include GeoJSON (`.geojson`), GPX (`.gpx`), and KML (`.kml`).
<!-- FlatGeobuf??, etc.? -->

### Raster data model {#sec-raster-data-model}

\index{raster data model}
The raster data model represents the world using a continuous grid of cells (*pixels*), where each cell has a single associated value (@fig-raster-intro).
Depending on the type of values, we can distinguish continuous and categorical rasters.
In continuous rasters, such as elevation or precipitation, values vary progressively, while categorical rasters use integer values to represent classes (e.g., land cover or soil types maps).
Raster data can also contain cells for which we do not know the value (@fig-raster-intro) -- for example, data for this part of the area was not collected, or these locations are outside of our area of interest.

```{r}
#| label: fig-raster-intro
#| echo: false
#| warning: false
#| message: false
#| #fig-asp: 0.4
#| layout-ncol: 3
#| fig-cap: "Basic representations of the raster data model."
#| fig-subcap:
#|   - Cell IDs
#|   - Cell values
#|   - A raster map
# replace example later
library(tmap)
library(spData)
library(stars)
set.seed(2020 - 06 - 25)
mat = matrix(1:20, nrow = 5, ncol = 4)
dim(mat) = c(x = 5, y = 4)
mat2 = matrix(sample.int(100, 20), nrow = 5, ncol = 4)
mat2[c(4, 12)] = NA
dim(mat2) = c(x = 5, y = 4)
sta = c(st_as_stars(mat), st_as_stars(mat2))
attr(sta, "dimensions")[[2]]$delta = -1

sta_sf = st_as_sf(sta)
sta_sf$A1.2 = as.character(as.factor(sta_sf$A1.1))
sta_sf$A1.2[is.na(sta_sf$A1.1)] = "NA"

tm_shape(sta_sf) +
    tm_polygons() +
    tm_text("A1", size = 2.5) +
    tm_layout(frame = FALSE, outer.margins = FALSE)

tm_shape(sta_sf) +
    tm_polygons("A1.1") +
    tm_text("A1.2", size = 2.5) +
    tm_layout(frame = FALSE, legend.show = FALSE, outer.margins = FALSE)

tm_shape(sta) +
    tm_raster("A1.1") +
    tm_layout(frame = FALSE, legend.show = FALSE,
              inner.margins = 0.02, outer.margins = FALSE)
```

\index{raster data grid types}
When we think about raster data, most of the time we are referring to regular grids (@fig-grid-types).
In regular grids, each cell has the same, constant size, and coordinates change from top to bottom and from left to right^[Regular grids can also have coordinated changing in different directions, e.g., from bottom to top.].
<!-- I know it is a simplification-->
Regular rasters can be transformed into rotated and sheared rasters (@fig-grid-types).
Rotated grids are the result of transforming both coordinated, $x$ and $y$ using the same rotation coefficients.
Sheared grids are created when the rotation coefficients are not equal.
Rectilinear grids, on the other hand, have orthogonal axes, but consist of rectangular cells with different sizes and shapes (@fig-grid-types).
In the last type of raster data grids, curvilinear grids, cells are cuboids of different sizes and shapes (@fig-grid-types).

```{r}
#| label: fig-grid-types
#| echo: false
#| warning: false
#| #fig-asp: 0.4
#| layout-ncol: 5
#| fig-cap: "Main types of raster data grids."
#| fig-subcap:
#|   - Regular
#|   - Rotated
#|   - Sheared
#|   - Rectilinear
#|   - Curvilinear
# regular grid
tm_sta_regular = tm_shape(sta_sf) +
    tm_polygons() +
    tm_text("A1", size = 2.5) +
    # tm_title("Regular") +
    tm_layout(frame = FALSE, outer.margins = FALSE)
# rotated grids
sta_rotated = sta
attr(attr(sta_rotated, "dimensions"), "raster")$affine = c(0.1, 0.1)
sta_rotated_sf = st_as_sf(sta_rotated)
tm_sta_rotated = tm_shape(sta_rotated_sf) +
    tm_polygons() +
    tm_text("A1", size = 2.5) +
    # tm_title("Rotated") +
    tm_layout(frame = FALSE, outer.margins = FALSE)
# sheared grids
sta_sheared = sta
attr(attr(sta_sheared, "dimensions"), "raster")$affine = c(0.1, 0.2)
sta_sheared_sf = st_as_sf(sta_sheared)
tm_sta_sheared = tm_shape(sta_sheared_sf) +
    tm_polygons() +
    tm_text("A1", size = 2.5) +
    # tm_title("Sheared") +
    tm_layout(frame = FALSE, outer.margins = FALSE)
# rectilinear grids
x = c(0, 0.5, 1.5, 2.1, 3, 5)
y = rev(c(0.1, 1, 1.5, 2, 4))
sta_rectilinear = st_as_stars(list(m = mat),
    dimensions = st_dimensions(x = x, y = y)
)
sta_rectilinear_sf = st_as_sf(sta_rectilinear)
tm_sta_rectilinear = tm_shape(sta_rectilinear_sf) +
    tm_polygons() +
    tm_text("m", size = 2.5) +
    # tm_title("Rectilinear") +
    tm_layout(frame = FALSE, outer.margins = FALSE)
# curvilinear grids
sta_curvilinear = sta
X1 = matrix(rep(1:5, times = 4), nrow = 5, ncol = 4)
X2 = matrix(
    c(
        seq(3.36, 1, length.out = 4),
        seq(3.52, 1.16, length.out = 4),
        seq(3.68, 1.32, length.out = 4),
        seq(3.68, 1.32, length.out = 4),
        seq(3.68, 1.32, length.out = 4)
    ),
    nrow = 5, ncol = 4,
    byrow = TRUE
)
am = matrix(1:20, nrow = 5, ncol = 4)
sta_curvilinear = st_as_stars(am)
sta_curvilinear = st_as_stars(sta_curvilinear,
    curvilinear = list(
        X1 = X1,
        X2 = X2
    )
)
sta_curvilinear_sf = st_as_sf(sta_curvilinear)
tm_sta_curvilinear = tm_shape(sta_curvilinear_sf) +
    tm_polygons() +
    tm_text("A1", size = 2.5) +
    # tm_grid() +
    # tm_title("Curvilinear") +
    tm_layout(frame = FALSE, outer.margins = FALSE)
# all
tm_sta_regular
tm_sta_rotated
tm_sta_sheared
tm_sta_rectilinear
tm_sta_curvilinear
```

Contrary to spatial vector data, a basic raster data stores just one attribute.
It is, however, possible to stack together many single rasters (also known as *raster layers*).
This allows us to store and operate at the same time on many rasters having the same dimensions.
Examples of multi-layer rasters include satellite imageries or time-series rasters.
Satellite imageries usually consist of many bands (layers) for different wavelengths.
The most basic bands, representing the colors red, green, and blue, can be connected together to create one composite image with true colors (@fig-rgb-raster).
Time-series rasters store one attribute, but for many moments in time.
Additional information about multi-layer rasters can be also found in @sec-spatial-data-cubes.

```{r}
#| label: fig-rgb-raster
#| echo: false
#| message: false
#| warning: false
#| fig-asp: 0.4
#| layout-ncol: 4
#| fig-cap: "Example of three satellite imagery bands: red, green, blue, and the composite
#|   image with true colors created using these three bands."
#| fig-subcap:
#|   - Red
#|   - Green
#|   - Blue
#|   - Composite
# we can/should change this example later
landsat234 = terra::rast(system.file("raster/landsat.tif", package = "spDataLarge"))[[1:3]]
tm_shape(landsat234[[3]]) +
    tm_raster(col.scale = tm_scale(values = "Greys")) +
    # tm_title("Red") +
    tm_layout(frame = FALSE, legend.show = FALSE, outer.margins = FALSE)
tm_shape(landsat234[[2]]) +
    tm_raster(col.scale = tm_scale(values = "Greys")) +
    # tm_title("Green") +
    tm_layout(frame = FALSE, legend.show = FALSE, outer.margins = FALSE)
tm_shape(landsat234[[1]]) +
    tm_raster(col.scale = tm_scale(values = "Greys")) +
    # tm_title("Blue") +
    tm_layout(frame = FALSE, legend.show = FALSE, outer.margins = FALSE)
tm_shape(landsat234) +
    tm_rgb(tm_vars(names(landsat234)[3:1], multivariate = TRUE)) +
    # tm_title("Composite") +
    tm_layout(frame = FALSE, outer.margins = FALSE)
```

\index{spatial file formats}
Similarly to vector data, a large number of raster file formats exists.
Currently, the GeoTIFF format (`.tif` or `.tiff`) is the most popular spatial raster formats.
It is an extended image TIF format that stores spatial metadata (e.g., map projection) along the values.
Another popular spatial raster formats include Arc ASCII (`.asc`) and ERDAS Imagine (`.img`).
<!-- ncdf??? -->

### Spatial data cubes {#sec-spatial-data-cubes}

\index{spatial data cubes}
Traditionally, spatial vector and raster data models refer to a unique set of locations.
For example, each feature in a polygon dataset and each cell in a raster dataset refer to one specific area.
However, to solve real-life problems, we need to store and operate on more complex data structures.
It includes situations when we have many attributes, often for several moments in time.

\index{spatial vector data cubes}
Storing multiple attributes is not a problem for the vector data model, when an attribute table can have many columns.
The question is how to extend the spatial vector data model to include measurements for many times.
For example, let's consider a polygon data with many attributes representing shares of land-use types for several years (@fig-vector-data-cubes).
One approach would be to create a separate column for each variable in each year.
Alternatively, we can have one column representing the year and one column for each attribute, however, this approach would require multiplying each geometry as many times as we have time stamps.
The third approach involves separating geometries from attributes, and where attributes for each moment are stored independently.
The last idea is used in spatial vector data cubes (@sec-the-stars-package).
An example of the spatial vector data cubes idea can be seen in @fig-vector-data-cubes.
It consists of two elements: a geometry (MULTIPOLYGON) of provinces of the Netherlands and an array connected to it that stores shares of land-use types for several years.

```{r}
#| label: fig-vector-data-cubes
#| message: false
#| warning: false
#| echo: false
#| fig-width: 6
#| fig-asp: 0.60606
#| fig-cap: Vector data cube.
source("code/data_model_figures.R")
draw_vector_cubes()
```

\index{spatial raster data cubes}
A single raster dataset can store just one variable for a given area.
To store several attributes, we can connect rasters representing different attributes for the same extent, creating multi-layer rasters (@sec-raster-data-model).
Additionally, each of the aforementioned rasters can be collected for many moments in time, adding other layers to the data.
The question here is how to efficiently store multi-layer raster data to understand what layers relate to which attribute and time.
Similarly to spatial vector data cubes, we can think of separating spatial dimensions from non-spatial attributes and create spatial raster data cubes (@sec-the-stars-package).
@fig-raster-data-cubes gives an example of a raster data cube.
It consists of several single-layer rasters with the same spatial properties, such as resolution, extent, and CRS.
These rasters are organized to store four-dimensions of the data: latitude, longitude, time, and attributes.
It has values of three attributes for five moments in time in total.

```{r}
#| label: fig-raster-data-cubes
#| message: false
#| echo: false
#| fig-asp: 0.25
#| fig-cap: Raster data cube.
source("code/data_model_figures.R")
draw_data_cubes()
```

Spatial data cubes are suitable for many real-life applications.
For example, time-series of climate measurements for several stations, demographic data on a country level gathered for many years, or satellite imageries over some period of time.

\index{spatial file formats}
One way to create spatial data cubes is by connecting many independent vector or raster objects.
<!-- mention it in the stars section? -->
Second way is to read a spatial data cube from one of the file formats allowing for storing complex data.
It includes formats such as NetCDF (`.nc`) and HDF (`.hdf`).
<!-- spatial vector data cubes file formats? -->

## Spatial data representations in R {#sec-spatial-data-representations-in-r}

\index{raster data model}
\index{spatial data cubes}
\index{vector data model}
In the three next sections, we introduce the **sf** package (@sec-the-sf-package), the **terra** (@sec-the-terra-package), the **stars** package (@sec-the-stars-package).
The **sf** package is used to represent spatial vector data, while the **terra** package represents spatial raster data.
The **stars** package is used to store both spatial vector and raster data cubes.

<!-- spatial data cubes -->
<!-- https://github.com/appelmar/gdalcubes_R -->
<!-- https://ropensci.org/blog/2019/11/05/tidync/ -->

### The sf package {#sec-the-sf-package}

\index{sf}
\index{sf (package)|see {sf}}
The **sf** package implements ideas behind the Simple Feature standard, which describe how to represent spatial vector data.
Its main class, `sf`, has the form of an extended data frame, where each row is a spatial feature.
In it, attributes of the vector data are stored as columns.
It also has one additional column, most often named `geom` or `geometry`^[However, any other names are also possible.].
This column contains geometries in a form of well-known text (WKT), storing all of the coordinates.

The **sf** package can read all of the spatial data formats mentioned in @sec-vector-data-model using the `read_sf()` function^[It is also possible to read spatial vector data using the `st_read()` function, which differs from `read_sf()` by having different default arguments.].
We just need to provide the path to the file with spatial data, which will create a new object.

```{r}
library(sf)
worldvector = read_sf("data/worldvector.gpkg")
```

The new object, `worldvector`, has a `sf` class.
It has `r nrow(worldvector)` features (rows or geometries) and `r ncol(worldvector) - 1` fields (columns with attributes).
There is also an `r ncol(worldvector)`th column, `geom`, that stores geometries of each feature.
Objects of class `sf` also display a header containing spatial metadata.
It includes geometry type, dimension (`XY`, `XYZ`, `XYM`, `XYZM`), bounding box (`bbox`), and information about the used Coordinate Reference System (`CRS`).

```{r}
worldvector
```

The `worldvector` object has MULTIPOLYGON geometry type, where each feature (row) can consist of one or more polygons.
Each polygon's vertices are represented by a pair of values (`dimension: XY`).
Bounding box allows to understand the spatial extension of the input data.
Finally, it has `r ifelse(sf::st_crs(worldvector)$IsGeographic, "geographic", "projected")` CRS named `r sf::st_crs(worldvector)$Name` (you can learn more about Coordinate Reference Systems in @sec-map-projections).

Spatial vector data of class `sf` can be also obtained using other R data packages.
For example, **rnaturalearth** allows to download world map data, **osmdata** imports OpenStreetMap data, and **tigris** loads TIGER/Line data of the US Census Bureau as `sf` objects.
More data packages can be found in the CRAN Task View: Analysis of Spatial Data at <https://cran.r-project.org/view=Spatial>.

<!-- The **tmap** package accepts spatial vector data objects from both **sf** and **sp** packages. -->
<!-- In case of having vector objects in a different representation, they should be converted into `sf` objects first, before making maps. -->
<!-- The **sf** package has the `st_as_sf()` function that translates objects of many classes, including `Spatial` (from the **sp** package), `ppp`, `psp`, and `lpp` (from the **spatstat** package), to the objects of class `sf`. -->
<!-- The `st_as_sf()` function also allows to turn data frames into `sf` objects - the user needs to provide the input data frame, names of columns with coordinates, and additionally definition of the CRS of the data. -->
<!-- For example `my_sf = st_as_sf(my_df, coords = c("Xcolumn", "Ycolumn"), crs = "EPSG:4326")`. -->

If you want to learn more about operating on `sf` objects, we recommend visiting the package website and vignettes at <https://r-spatial.github.io/sf/> and reading [the Geocomputation with R book](https://r.geocompx.org/) [@lovelace_geocomputation_2025].

### The terra package {#sec-the-terra-package}

\index{terra}
\index{terra (package)|see {terra}}
The **terra** package represents and processes spatial raster data in R.
It has many high-performance functions, allowing for efficient transformation and analysis of raster data.

To read raster data, the **terra** package has the `rast()` function^[This function can also be used to create a new raster from scratch].
This way, we create a new object of the class `SpatRaster,` which is the main class for raster data in the **terra** package.

```{r}
#| message: false
library(terra)
worldelevation1 = rast("data/worldelevation.tif")
```

To see the properties of the `worldelevation1` object, we just need to type its name.

```{r}
worldelevation1
```

Now, we can see the number of rows and columns, resolution, extent, coordinate reference system (CRS), and the source and name of the raster layer.
In the case of raster data from a file, the **terra** package uses a proxy approach, where only metadata is read into computer memory, and the actual values are read only when needed and processed in fitting-in-memory chunks.

The **terra** package has many functions for raster data processing, including `merge()`, `aggregate()`, `resample()`, and many more.
It also enables interactions between raster and vector data, including cropping rasters to vector geometries and masking rasters with vector geometries using the `crop()` and `mask()` functions, respectively.
<!-- what else should we add here? -->

Practical examples of using the **terra** package can be found on the Spatial Data Science with R and “terra” website at <https://rspatial.org/> and in the Geocomputation with R book at <https://r.geocompx.org/> [@lovelace_geocomputation_2025].

### The stars package {#sec-the-stars-package}

\index{stars}
\index{stars (package)|see {stars}}
The **stars** package allows for reading and processing raster data in R.
This package also has support for both spatial vector and raster data cubes.
Its main class, `stars`, is built as a list of matrices or arrays with metadata describing their dimensions.
The **stars** package is also well integrated with **sf**, with many `st_` functions (such as `st_crs()`) working also on `stars` objects.

The `read_stars()` function allow to read both simple and multidimensional spatial raster data from a file^[The **stars** package also has a function `read_ncdf()` aimed at improved reading of NetCDF files.].
This function requires at least one argument with a filename to be read.

```{r}
library(stars)
worldelevation2 = read_stars("data/worldelevation.tif")
```

The new object, `worldelevation2`, is of a `stars` class.
It has two dimensions, `x` and `y`, and one attribute `worldelevation.tif`.

```{r}
worldelevation2
```

The `worldelevation.tif` attribute is a matrix, where each cell represents an elevation value.
The `x` dimension has 1080 elements (columns), starting from a coordinate (`offset`) of a cell boundary of `-180`.
Next, the coordinates of further cells increase by `0.333333` (`delta`) -- resolution in the `x` dimension.
The `y` dimension has 540 elements (rows), starting from a coordinate (`offset`) of a cell boundary of `90`.
For the `y` dimension, each further cell's coordinated decreases by `0.333333` (notice the negative value of `delta`) -- resolution in the `y` dimension.
Both dimensions also have the same CRS -- `WGS 84`.

`read_stars()` also has several additional arguments including `RasterIO`, which gives control over the input data extent and resolution.
For example, the below code will read just the first and second bands -- in this case, the average monthly temperatures for January and February in Slovenia (results not shown).
<!-- - including reading chunks, changing resolution, and selecting bands -->

```{r}
slo_tavg_fp = "data/slovenia/slo_tavg.tif"
slo_tavg12 = read_stars(slo_tavg_fp, RasterIO = list(bands = c(1, 2)))
```

Internally, a `stars` object is a list of `matrix` or `array` objects with additional attributes describing spatial metadata, such as a number of columns and rows, resolution, coordinate reference system, etc.
All of this information is read from the input file.

Stars objects are constructed by dimensions and attributes.
Dimensions relate to what kind of objects are stored as list elements.
For example, when it is a `matrix` then we just have two dimensions representing columns and rows.
However, it is also possible to store multidimensional `array`s, which allow having many additional dimensions for bands, times, etc.
Attributes, on the other hand, are stored as list elements.
Each attribute can relate, for example to a different variable.

Reading a simple GeoTIFF file would result in having just two dimensions and one attribute (a `matrix`).
On the other hand, reading complex raster file formats, such as NetCDF could result in having more than two dimensions (e.g. time) and many attributes (e.g., an `array` with temperature, precipitation, humidity).
<!-- how it relates to mapping? -->

<!-- - stars proxy -->
<!-- more than 1e8 cells to read -->
<!-- Before reading the file, the **stars** package checks if the input data is a curvilinear grid and what is the number of cells in the data. -->
<!-- When the input data is small or curvilinear then the full data is read in computer memory. -->
<!-- Otherwise, a `stars proxy` approach is used, where only metadata is read including pointers to where the complete data is. -->
<!-- When we want to plot large raster data, then it is read at a lower resolution than the native one. -->
<!-- ref to the section where we are explaining max.plot options -->

The **stars** package also has support for vector data cubes, where each geometry is just stored once (as a dimension), and each attribute is a `matrix` or an `array` with the number of rows equals to the number of geometries, the number of columns equals to another dimension (e.g., time), and possibly the number of `array` layers equals for additional dimensions.
<!-- can we plot them in tmap? -->
<!-- if so - there should be an example in the book + reference -->

More information on how the `stars` objects are organized and how to operate on them can be found in the **stars** package vignettes at <https://r-spatial.github.io/stars> and in the Spatial Data Science book at <https://r-spatial.org/book/> [@Pebesma2023-ya].