```{r charts-1}
#| echo: false
source("code/before_script.R")
```

# Charts {#sec-charts}

\index{charts}
Thematic maps usually represent one or more variables using colors, shapes, or sizes.
Then, the map legend is used to explain the meaning of these visual variables (@sec-legends).
Such a legend can be expanded (or even replaced, @sec-charts-customization) with a chart that provides more information about the distribution of the variable values.

\index{histogram}
The **tmap** package provides several chart types that can be used in the map legend (@tbl-layers-table).
They are specified in one of the `*.chart` arguments of a layer function, e.g., `fill.chart` of `tm_polygons()`, `size.chart` of `tm_dots()`, etc.
The choice of chart type depends on the type of data we want to represent, i.e., whether it is numerical or categorical, and whether it is univariate or bivariate.

```{r charts-2}
#| echo: false
charts_df = tibble::tribble(
  ~Function, ~Description,
  "tm_chart_histogram()", "Histogram",
  "tm_chart_box()", "Box plot",
  "tm_chart_violin()", "Violin plot",
  "tm_chart_bar()", "Bar chart",
  "tm_chart_donut()", "Donut chart",
  "tm_chart_heatmap()", "Heatmap"#,
  # "tm_chart_none()", "No chart"
)
```

<!-- Martijn, why do we need to show and explain `tm_chart_none()`? -->

```{r}
#| label: tbl-layers-table
#| tbl-cap: "Available chart types."
#| echo: false
#| warning: false
#| message: false
charts_df |>
  kableExtra::kbl(booktabs = TRUE) |>
  kableExtra::kable_styling("striped",
                            latex_options = "striped",
                            full_width = FALSE) |>
  kableExtra::column_spec(1, monospace = TRUE)
```

For the examples in this chapter, we use the Slovenian regions dataset that contains information about the population density, region group, and the percentage of the population aged 65 or more in each region.

```{r}
#| message: false
library(tmap)
library(sf)
slo_regions = read_sf("data/slovenia/slo_regions.gpkg")
```

## Numerical data

The first group of charts is used for numerical data, i.e., data that can take any value within a range.
It includes histograms, box plots, violin plots, and donut charts.

\index{histogram}
Histograms are the most common chart type used to represent the distribution of a numerical variable.
They show how often each range of values occurs in the data, which helps us see which values are more common and whether there are any outliers.
To add a histogram to the map, we use the `tm_chart_histogram()` function in the `fill.chart` argument of a map layer function, such as `tm_polygons()` (@fig-chart-hist-1).

```{r}
#| label: hist1
#| eval: false
tm_shape(slo_regions) +
  tm_polygons(fill = "pop_dens",
              fill.chart = tm_chart_histogram())
```

Such a histogram, like other charts, is automatically placed next to the map legend.
However, we can position it independently of the map legend using the `position` arguments of the `tm_chart_*()` and `tm_legend()` functions (@fig-chart-hist-2).

```{r}
#| label: hist2
#| eval: false
tm_shape(slo_regions) +
  tm_polygons(fill = "pop_dens",
              fill.chart = tm_chart_histogram(
                position = tm_pos_out("left", "center")
                ),
              fill.legend = tm_legend(
                position = tm_pos_out("right", "center")
                )
              )
```

```{r}
#| label: fig-chart-hist
#| fig-cap: "Histogram added to a map to represent the distribution of a variable."
#| fig-subcap:
#|   - "default histogram"
#|   - "repositioned histogram"
#| message: false
#| echo: false
#| layout-nrow: 2
#| fig-asp: 0.5
<<hist1>>
<<hist2>>
```

@fig-chart-num shows three additional chart types that can be used for numerical data: box plots, violin plots, and donut charts.
They can be added to the map using the `tm_chart_box()`, `tm_chart_violin()`, and `tm_chart_donut()` functions, respectively.
As you may notice, these charts convey similar messages, e.g., that the middle range of values is the most common.
However, each of them emphasizes something different: box plots directly show the median and quartiles of the data, violin plots show the density of the data distribution, and donut charts show the proportions of the data.
They also have different aesthetics.
Thus, the choice of the chart type depends on the message we want to convey and the aesthetics we prefer.

```{r}
#| eval: false
# box plot
tm_shape(slo_regions) +
  tm_polygons(fill = "pop_dens",
              fill.chart = tm_chart_box())
# violin plot
tm_shape(slo_regions) +
  tm_polygons(fill = "pop_dens",
              fill.chart = tm_chart_violin())
# donut chart
tm_shape(slo_regions) +
  tm_polygons(fill = "pop_dens",
              fill.chart = tm_chart_donut())
```

```{r}
#| label: fig-chart-num
#| fig-cap: "Charts for numerical data."
#| message: false
#| echo: false
#| fig-asp: 0.8
#| fig-width: 9
tm_shape(slo_regions) +
  tm_polygons(fill = "pop_dens",
              fill.chart = tm_chart_box(group_id = "a"),
              fill.legend = tm_legend(group_id = "a")) +
  tm_polygons(fill = "pop_dens",
              fill.chart = tm_chart_violin(group_id = "b"),
              fill.legend = tm_legend(group_id = "b")) +
  tm_polygons(fill = "pop_dens",
              fill.chart = tm_chart_donut(group_id = "c"),
              fill.legend = tm_legend(group_id = "c")) +
  tm_components("a", position = tm_pos_out("center", "bottom", "left", "top"), stack = "horizontal") +
  tm_components("b", position = tm_pos_out("center", "bottom", "center", "top"), stack = "horizontal") +
  tm_components("c", position = tm_pos_out("center", "bottom", "right", "top"), stack = "horizontal")
```
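
The quantities summarized by these charts can also be checked numerically.
The chunk below is a minimal base R sketch of the values behind a box plot (quartiles) and a histogram (bin counts).
It uses a small hypothetical vector of population densities for illustration only; in this chapter's context, `slo_regions$pop_dens` would take its place, and the bins chosen by `hist()` may differ from those used by **tmap**.

```{r}
# hypothetical population densities (people per sq. km), for illustration only;
# in this chapter, they would be slo_regions$pop_dens
pop_dens = c(38, 52, 61, 77, 85, 94, 110, 118, 131, 150, 228, 262)
# minimum, quartiles, and maximum: the values summarized by a box plot
box_stats = quantile(pop_dens, probs = c(0, 0.25, 0.5, 0.75, 1))
box_stats
# number of observations in each bin: the bar heights of a histogram
hist_stats = hist(pop_dens, breaks = 5, plot = FALSE)
hist_stats$counts
```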

## Categorical data

\index{bar charts}
\index{donut charts}
The second group of charts is used for categorical data, i.e., data that can take only a limited number of values.
It includes donut charts (`tm_chart_donut()`) and bar charts (`tm_chart_bar()`): bar charts show the frequency of each category, while donut charts show the proportions of each category (@fig-chart-cat).
In the examples below, we represent the region group of each region in Slovenia and observe that there are more regions in the "Central" and "West" groups than in the others.

```{r}
#| eval: false
tm_shape(slo_regions) +
  tm_polygons(fill = "region_group",
              fill.chart = tm_chart_donut())
tm_shape(slo_regions) +
  tm_polygons(fill = "region_group",
              fill.chart = tm_chart_bar())
```

```{r}
#| label: fig-chart-cat
#| fig-cap: "Charts for categorical data."
#| message: false
#| echo: false
#| fig-asp: 0.8
#| fig-width: 9
tm_shape(slo_regions) +
  tm_polygons(fill = "region_group",
              fill.chart = tm_chart_donut(group_id = "a"),
              fill.legend = tm_legend(group_id = "a")) +
  tm_polygons(fill = "region_group",
              fill.chart = tm_chart_bar(group_id = "b"),
              fill.legend = tm_legend(group_id = "b")) +
  tm_components("a", position = tm_pos_out("center", "bottom", "left", "top"), stack = "horizontal") +
  tm_components("b", position = tm_pos_out("center", "bottom", "right", "top"), stack = "horizontal")
```
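
The difference between frequencies and proportions can be sketched in base R with `table()` and `prop.table()`.
The vector below is hypothetical and only illustrative (its group labels and counts are assumptions, not the values stored in the dataset); in this chapter's context, `slo_regions$region_group` would take its place.

```{r}
# hypothetical region groups, for illustration only;
# in this chapter, they would be slo_regions$region_group
region_group = c("Central", "Central", "Central", "Central",
                 "West", "West", "West", "West",
                 "East", "East", "North", "North")
# frequencies: the bar heights in a bar chart
group_counts = table(region_group)
group_counts
# proportions: the segment sizes in a donut chart
group_props = prop.table(group_counts)
round(group_props, 2)
```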

## Bivariate data

\index{bivariate data}
\index{heatmap}
The third group of charts is used for bivariate data, i.e., data that contains two variables.
It includes the heatmap (`tm_chart_heatmap()`), which shows how often each combination of values from two variables occurs in the data (@fig-chart-bivariate).

```{r}
#| label: fig-chart-bivariate
#| fig-cap: "A bivariate chart (heatmap) used to represent the relationship between two numerical variables."
#| message: false
#| fig-width: 9
tm_shape(slo_regions) +
  tm_polygons(fill = tm_vars(c("pop_dens", "pop65perc"), multivariate = TRUE),
              fill.scale = tm_scale_bivariate(values = "purplegold"),
              fill.chart = tm_chart_heatmap())
```

```{r}
#| echo: false
#| eval: false
library(stars)
slo_tavg = read_stars("data/slovenia/slo_tavg.tif")
tm_shape(slo_tavg) +
  tm_raster(col = tm_vars(dimvalues = c("tavg_1", "tavg_7"), multivariate = TRUE),
            col.scale = tm_scale_bivariate(values = "purplegold"),
            col.chart = tm_chart_heatmap())
```
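
The counts behind such a heatmap can be approximated with a cross-tabulation of two classified variables.
The chunk below is a rough base R sketch with hypothetical values; it assumes three equal-width classes per variable, which will likely differ from the classes that `tm_scale_bivariate()` computes.

```{r}
# hypothetical values of two numerical variables, for illustration only;
# in this chapter, they would be slo_regions$pop_dens and slo_regions$pop65perc
pop_dens = c(38, 52, 61, 77, 85, 94, 110, 118, 131, 150, 228, 262)
pop65perc = c(24, 23, 22, 23, 21, 22, 20, 21, 19, 20, 18, 19)
# split each variable into three classes of equal width (an assumption)
class_labels = c("low", "medium", "high")
dens_class = cut(pop_dens, breaks = 3, labels = class_labels)
age_class = cut(pop65perc, breaks = 3, labels = class_labels)
# count how often each combination of classes occurs
combinations = table(dens_class, age_class)
combinations
```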

## Additional chart customization {#sec-charts-customization}

\index{ggplot2}
All of the charts are directly based on the visual variables from the map layer and then created internally using the **ggplot2** package (@R-ggplot2).
By default, however, the charts are much simplified, not showing all the details that **ggplot2** can provide, such as axis labels, titles, and various theme elements.
<!-- https://github.com/geocompx/tmap/issues/10#issuecomment-3007409988 -->

At the same time, we may still want to customize the charts to fit our map design better or to add more information.
This can be done with additional arguments of the `tm_chart_*()` functions, such as `plot.axis.x`, `plot.axis.y`, and `extra.ggplot2`.
The first two are logical arguments that control whether the x and y axes are shown in the chart, respectively.
The last one, `extra.ggplot2`, is a list of additional **ggplot2** functions that are applied to the chart -- for example, we may want to change the aesthetics of the chart, such as the background color, as shown in @fig-chart-ggplot1.

```{r}
#| label: fig-chart-ggplot1
#| fig-cap: "Customizing charts with ggplot2."
#| message: false
library(ggplot2)
tm_shape(slo_regions) +
  tm_polygons(fill = "pop_dens",
              fill.chart = tm_chart_histogram(
                plot.axis.x = FALSE,
                extra.ggplot2 = list(
                  theme(plot.background = element_rect(fill = "lightgray"))
                )
              ))
```

@fig-chart-ggplot2 shows a more advanced modification of the chart.
It shows both the x and y axes, adds a title to the chart, flips the chart coordinates, and changes the chart background color and title size.
It also moves the chart to the top left corner of the map and sets its width.
Such a customized chart provides all of the essential map context information: variable name, unit, ranges of the values, and their related colors.
Thus, we can treat it as a map legend and remove the default legend using `fill.legend = tm_legend(show = FALSE)`.

```{r}
#| label: fig-chart-ggplot2
#| fig-cap: "Using a chart as a map legend."
#| message: false
#| fig-width: 9
tm_shape(slo_regions) +
  tm_polygons(fill = "pop_dens",
              fill.chart = tm_chart_histogram(
                plot.axis.x = TRUE,
                plot.axis.y = TRUE,
                extra.ggplot2 = list(
                  labs(title = "Population density (people/sq. km)"),
                  coord_flip(),
                  theme(plot.background = element_rect(fill = "#FFC067"),
                        plot.title = ggplot2::element_text(size = 10))),
                position = tm_pos_in("LEFT", "TOP"),
                width = 24
              ),
              fill.legend = tm_legend(show = FALSE))
```
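
To see what a list such as the one passed to `extra.ggplot2` does, it may help to try it on a standalone **ggplot2** plot.
The sketch below assumes that **tmap** appends the list elements to its internal chart in much the same way as the `+` operator does; the internal mechanism is not shown here, and the data values and object names (`chart_df`, `base_chart`, `extra`) are hypothetical.

```{r}
library(ggplot2)
# hypothetical values, for illustration only
chart_df = data.frame(
  pop_dens = c(38, 52, 61, 77, 85, 94, 110, 118, 131, 150, 228, 262)
)
# a plain histogram, standing in for the chart that tmap builds internally
base_chart = ggplot(chart_df, aes(x = pop_dens)) +
  geom_histogram(bins = 5)
# a list of ggplot2 elements, like the one passed to extra.ggplot2
extra = list(
  labs(title = "Population density (people/sq. km)"),
  coord_flip(),
  theme(plot.background = element_rect(fill = "#FFC067"))
)
# adding a list to a ggplot object adds each of its elements in turn
custom_chart = base_chart + extra
custom_chart
```

Testing the list on a standalone plot first makes it easier to spot which element is responsible for a given change before passing the list to a `tm_chart_*()` function.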