Skip to content

Commit

Permalink
add dists-categorical vignette; closes #144
Browse files Browse the repository at this point in the history
  • Loading branch information
mpadge committed Oct 5, 2021
1 parent 59e7df3 commit 58a3a74
Show file tree
Hide file tree
Showing 4 changed files with 208 additions and 13 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: dodgr
Title: Distances on Directed Graphs
Version: 0.2.10.024
Version: 0.2.10.025
Authors@R: c(
person("Mark", "Padgham", , "[email protected]", role = c("aut", "cre")),
person("Andreas", "Petutschnig", role = "aut"),
Expand Down Expand Up @@ -32,6 +32,7 @@ Imports:
Rcpp (>= 0.12.6),
RcppParallel
Suggests:
bench,
dplyr,
geodist,
ggplot2,
Expand Down
26 changes: 16 additions & 10 deletions codemeta.json
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
"codeRepository": "https://github.com/ATFutures/dodgr",
"issueTracker": "https://github.com/ATFutures/dodgr/issues",
"license": "https://spdx.org/licenses/GPL-3.0",
"version": "0.2.10.024",
"version": "0.2.10.25",
"programmingLanguage": {
"@type": "ComputerLanguage",
"name": "R",
Expand Down Expand Up @@ -70,6 +70,18 @@
}
],
"softwareSuggestions": [
{
"@type": "SoftwareApplication",
"identifier": "bench",
"name": "bench",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=bench"
},
{
"@type": "SoftwareApplication",
"identifier": "dplyr",
Expand Down Expand Up @@ -371,10 +383,7 @@
],
"releaseNotes": "https://github.com/ATFutures/dodgr/blob/master/NEWS.md",
"readme": "https://github.com/ATFutures/dodgr/blob/main/README.md",
"contIntegration": [
"https://github.com/atfutures/dodgr/actions?query=workflow%3AR-CMD-check",
"https://codecov.io/gh/ATFutures/dodgr"
],
"contIntegration": ["https://github.com/atfutures/dodgr/actions?query=workflow%3AR-CMD-check", "https://codecov.io/gh/ATFutures/dodgr"],
"developmentStatus": "https://www.repostatus.org/#active",
"keywords": [
"distance",
Expand Down Expand Up @@ -403,15 +412,12 @@
"@type": "PublicationIssue",
"datePublished": "2019",
"isPartOf": {
"@type": [
"PublicationVolume",
"Periodical"
],
"@type": ["PublicationVolume", "Periodical"],
"name": "Transport Findings"
}
}
}
],
"relatedLink": "https://CRAN.R-project.org/package=dodgr",
"fileSize": "13844.948KB"
"fileSize": "13908.223KB"
}
189 changes: 189 additions & 0 deletions vignettes/dists-categorical.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,189 @@
---
title: "Aggregating distances along categories of edges"
author: "Mark Padgham"
date: "`r Sys.Date()`"
output:
html_document:
toc: true
toc_float: true
number_sections: false
theme: flatly
vignette: >
%\VignetteIndexEntry{4 dodgr_dists_categorical}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r pkg-load, echo = FALSE, message = FALSE}
library (dodgr)
```

The [`dodgr_dists_categorical`
function](https://atfutures.github.io/dodgr/reference/dodgr_dists_categorical.html)
enables multiple distances to be aggregated along distinct categories of edges
with a single query. This is particularly useful to examine information on
proportions of total distances routed along different edge categories. The
following three sub-sections describe the three main uses and interfaces of the
[`dodgr_dists_categorical`
function](https://atfutures.github.io/dodgr/reference/dodgr_dists_categorical.html).
Each of these requires an input `graph` to have an additional column named
`"edge_type"`, which labels discrete categories of edges. These can be any kind
of discrete labels at all, from integer values to character labels or factors.
The labels are retained in the result, as demonstrated below.

# 1 Full Distance Information for Edge Categories

The "default" interface of the
[`dodgr_dists_categorical`
function](https://atfutures.github.io/dodgr/reference/dodgr_dists_categorical.html)
requires the same three mandatory parameters as
[`dodgr_distances`](https://atfutures.github.io/dodgr/reference/dodgr_dists.html), of

1. A weighted `graph` on which the distances are to be calculated;
2. A vector of `from` points from which distances are to be calculated; and
3. A corresponding vector of `to` points.

As for
[`dodgr_distances`](https://atfutures.github.io/dodgr/reference/dodgr_dists.html),
the `from` and `to` arguments can be either vertex identifiers (generally as
`from_id` and `to_id` columns of the input `graph`), or two-column coordinates
for spatial graphs. The following code illustrates the procedure, using the
internal data set,
[`hampi`](https://atfutures.github.io/dodgr/reference/hampi.html), initially
starting by reducing to the largest connected component only, to ensure all
points are mutually reachable.

```{r hampi-edge-types}
graph <- weight_streetnet (hampi, wt_profile = "foot")
graph <- graph [graph$component == 1, ]
graph$edge_type <- graph$highway
table (graph$edge_type)
```

That network then has `r length (unique (graph$edge_type))` distinct edge
types. Submitting this graph to the function, and calculating pairwise
distances between all points, then gives the following result:

```{r full-dists1}
v <- dodgr_vertices (graph)
from <- to <- v$id
d <- dodgr_dists_categorical (graph, from, to)
class (d)
length (d)
sapply (d, dim)
```

The result has the dedicated class, `dodgr_dists_categorical`, which it itself
a list of matrices, one for each distinct edge type. This class enables
a convenient `summary` method which converts data on aggregate distances along
each category of edges into overall proportions:

```{r summary-full-dists, eval = FALSE}
summary (d)
```
```{r summary-out, echo = FALSE, collapse = TRUE}
s <- summary (d)
```


# 2. Proportional Distances along each Edge Category

If `summary` results like those immediately above are all that is desired,
without any additional information from the full distance matrices, then
a `proportions_only` parameter can be used to directly return those:

```{r prop-only}
dodgr_dists_categorical (graph, from, to,
proportions_only = TRUE)
```

Queries with `proportions_only = TRUE` are constructed in a different way in
the underlying C++ code, with the main difference being that the full list of
matrices is not stored, and these queries will generally use considerably less
memory. For most jobs, this should translate to faster queries, as illustrated
in the following benchmark:

```{r prop-only-benchmark, warning = FALSE}
bench::mark (full = dodgr_dists_categorical (graph, from, to),
prop_only = dodgr_dists_categorical (graph, from, to,
proportions_only = TRUE),
check = FALSE, time_unit = "s") [, 1:3]
```

# 3. Proportional Distances within a Threshold Distance

The third and final use of the [`dodgr_dists_categorical`
function](https://atfutures.github.io/dodgr/reference/dodgr_dists_categorical.html)
is through the `dlimit` parameter, used to specify a distance threshold below
which categorical distances are to be aggregated. This is useful to examine
relative proportions of different edges types necessary in travelling in any
and all directions away from each point or vertex of a graph.

When a `dlimit` parameter is specified, the `to` parameter is ignored, and
distances are aggregated along all possible routes away from each `from` point,
out to the specified `dlimit`. The value of `dlimit` must be specified relative
to the edge distance values contained in the input graph. For spatial graphs
obtained with [`dodgr_streetnet()` or
`dodgr_streetnet_sc()`](https://atfutures.github.io/dodgr/reference/dodgr_streetnet.html),
for example, as well as the internal [`hampi`
data](https://atfutures.github.io/dodgr/reference/hampi.html), these distances
are in metres, and so `dlimit` must be specified in metres.

The result in then a single matrix in which each row represents one of the
`from` points, and there is one column of aggregate distances for each edge
type, plus an initial column of overall distances. The following code
illustrates:

```{r dists-dlimit}
dlimit <- 2000 # in metres
d <- dodgr_dists_categorical (graph, from, dlimit = dlimit)
dim (d)
head (d)
```

The row names of the resultant `data.frame` are the vertex identifiers
specified in the `from` parameter. Such results can easily be combined with
spatial information on the vertices obtained from the [`dodgr_vertices()~
function](https://atfutures.github.io/dodgr/reference/dodgr_vertices.html) to
generate spatial maps of relative proportions around each point in a graph or
network. Summary statistics can also readily be extracted, for example,

```{r hist-path}
hist (d$path / d$distance,
xlab = "Relative proportions of trips along paths", main = "")
```

Trips along paths are roughly evenly distributed between 0 and 1. In contrast,
proportions of trips along service ways -- used to facilitate motorised
vehicular access in the otherwise car-free area of Hampi, India -- are
distinctly different:

```{r hist-track}
hist (d$service / d$distance,
xlab = "Relative proportions of trips along service ways", main = "")
```

These distributions provide more detailed and nuanced insights than those
provided by the overall `summary` functions above, which only revealed overall
respective relative proportions of `r round (s [["path"]], digits = 2)` and
`r round (s [["service"]], digits = 2)` for paths and service ways. The results
within the distance threshold reveal that the distributional forms of
proportional distances differ as much as the aggregate values, and that both
aspects of the function provide distinct insights into proportional distances
along categories of edge types.

Finally, this use of the function also utilizes distinct difference in the
underlying C++ code that are even more efficient that the previous case of
proportional distances. The following code benchmarks the three modes:

```{r benchmark3, warning = FALSE}
bench::mark (full = dodgr_dists_categorical (graph, from, to),
prop_only = dodgr_dists_categorical (graph, from, to,
proportions_only = TRUE),
dlimit = dodgr_dists_categorical (graph, from, dlimit = 2000),
check = FALSE, time_unit = "s") [, 1:3]
```

Finally, note that the efficiency of distance-threshold queries scales
non-linearly with increases in `dlimit`, with queries quickly becoming less
efficient for larger values of `dlimit`.
3 changes: 1 addition & 2 deletions vignettes/makefile
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
LFILE = times
LFILE = dists-categorical

all: knith
#all: knith open
Expand Down Expand Up @@ -60,4 +60,3 @@ convone_t: times-fig1.pdf

rmone_t:
rm times-fig1.aux times-fig1.log times-fig1.pdf

0 comments on commit 58a3a74

Please sign in to comment.