id | elm | ||||
---|---|---|---|---|---|
litvis |
|
@import "../css/tutorial.less"
import VegaLite exposing (..)
This is one of a series of 'geo' tutorials for use with litvis.
- Geospatial File Formats
- Generating Global Map Projection Geo Files
- Importing geographic datasets into elm-vegalite
- Importing and displaying OpenStreetMap data
This tutorial leads you through the workflow for importing spatial datasets into elm-vegalite, and therefore into litvis documents. If the dataset you wish to import is a shapefile, the process involves the following steps (and if it is already in geoJSON format, just jump to step 3).
- Change the shapefile's projection to use longitude/latitude coordinates with the WGS84 ellipsoid.
- Convert the shapefile into a geoJSON file.
- Identify the attribute field you wish to store and associate it with an id.
- Convert from geoJSON to topoJSON.
- Simplify geometry to reduce file size
- render file with elm-vegalite
For related tutorials, see Mike Bostock's series Command-Line Cartography and Ændrew Rininsland's Creating topoJSON using D3 v4 and for a thorough more London-centric example, Max Harlow's Working with geographic data tutorial.
To perform some of the file conversions you will need the Geospatial Abstraction Data Library (gdal), so if you do not have this already, as a one-off setup stage you should install it. Here are some installation instructions for MacOS, for Windows, and for Ubuntu).
After installation, opening a command line window and typing
ogr2ogr --version
should display a message similar to
GDAL 2.4.1, released 2019/03/15
You will also need to install some d3 command-line utilities, so open a command line window and type
npm install -g shapefile ndjson-cli topojson d3-geo-centroid
The -g
option ensures the installed packages are available globally from any directory. The gdal
package will be used for the map projection work; shapefile
for the conversion from shapefile to geoJSON; ndjson-cli
splits JSON files into separate lines to ease processing; topojson
does the conversion to topoJSON and the topology-preserving simplification; and d3-geo-centroid
is used for generating centroid locations from polygons.
To keep things flexible, we'll also define a path
function in Elm pointing to the base directory where all data for this tutorial are stored. You can leave the default as shown below to load the data from the giCentre data repository or replace it with a local folder if you have your own copy of the data.
path : String -> String
path fileName =
"https://gicentre.github.io/data/geoTutorials/" ++ fileName
If your data already use longitude/latitude with WGS84, you can skip this step.
Download the shapefiles, noting that you will need the geometry (something.shp), attribute (something.dbf) and index (something.shx) and projection (something.prj) files. For this tutorial we will use London_Borough_Excluding_MHW
from the London Data Store.
As part of the conversion we will convert the geometry from the projected coordinates it currently uses into 'unprojected' longitude/latitude spherical coordinates. This requires us to know what the projection the source data are using. Typically a shapefile collection will come with a .proj
file that will contain the projection information. We need to provide these projection parameters to gdal
, but thankfully most standard projections have a EPSG code referencing this information which we can pass to the converter. In our case the data use the Ordnance Survey National Grid with an EPSG of 27700. The EPSG code for unprojected longitude/latitude using the WGS84 ellipsoid is 4326.
To convert the shapefile, open a terminal and cd
to the folder containing the downloaded shapefiles, then type
ogr2ogr -t_srs 'EPSG:4326' -s_srs 'EPSG:27700' boroughs_geo.shp London_Borough_Excluding_MHW.shp
substituting the name of the input and output .shp files with your own.
Type the following:
shp2json boroughs.shp > boroughs_geo.json
While you are getting used to this workflow it can be helpful to append a _geo
to the output geoJSON file you create here so as not to get confused with the topoJSON file created later (both share the same .json
file extension).
When we render the data we will need an ID to refer to each spatial feature within the dataset (boroughs in our case). GeoJSON and topoJSON files can have an optional id
for each feature which we need to add as part of our workflow. The original shape file should have had one or more attributes associated with each feature and these get transferred to the geoJSON as 'properties'. You can see these properties by viewing the .json file in a text editor. Choose the property you wish to make the id and then type the following:
ndjson-split 'd.features' < boroughs_geo.json > boroughs_geo.ndjson
ndjson-map 'd.id = d.properties.NAME, delete d.properties, d' < boroughs_geo.ndjson > boroughs_id_geo.ndjson
ndjson-reduce < boroughs_id_geo.ndjson | ndjson-map '{type: "FeatureCollection", features: d}' > boroughs_id_geo.json
replacing NAME
with the name of the attribute property you wish to become the ID (in this London boroughs example, the property is actually called NAME
, representing the name of each borough).
The first line splits the geoJSON file so that each feature (borough boundary in this example) is on its own line in the file. The second line selects a property and adds it as an id
to each line while deleting the other properties as they are no longer needed. The third line converts the split feature-per-line file back into a geoJSON file with the new id
as well as the original properties.
Note that if you wish to do more complex data manipulation such as combining attributes from multiple files or calculating new feature attributes, it is on these feature-per-line ndjson
files that it is performed. See Mike Bostock's command line cartography for examples.
GeoJson files store each spatial object as a string of coordinates tracing its boundary (or boundaries for complex features). This is rather inefficient for adjacent areas that share a common boundary and it makes it difficult to infer topological relationships such as adjacency. By converting to the topoJSON format we can use a much more efficient data structure that stores common boundary lines only once:
geo2topo boroughs=boroughs_id_geo.json > boroughs_topo.json
In this conversion we give a name to the new feature object (here called boroughs
) that we will reference later in elm-vegalite.
Again, as you are learning to use this workflow, it can be helpful to name the resulting file with _topo
appended so we know the .json
file is now in topoJSON, not geoJSON, format.
If you compare the sizes you will see that the topoJSON file is smaller than its geoJSON equivalent thanks to its more efficient data structure. We can however shrink the file size much more by (i) simplifying complex lines; (ii) reducing the precision with which we store coordinates. We can do this because our aim is to produce a file for rendering, not accurate geoprocessing.
To simplify the line we need to specify a threshold that determines the degree of simplification. You can think of this threshold as being the length of line details below which we attempt to make all boundary lines straight. The larger the threshold, the simpler the boundary lines look. Because we are dealing with latitude and longitude coordinates, that threshold is expressed as steradians. Typically, useful thresholds will be in the order of 1e-9 steradians but it pays to experiment to find a suitable level of simplification.
To reduce the precision of coordinates we use topoquantize
with a parameter typically of order 1e4 indicating 4 significant digits of precision.
Simplification and quantization can be combined by typing:
toposimplify -s 1e-9 -f < boroughs_topo.json | topoquantize 1e4 > londonBoroughs.json
Notice how this step radically reduces the size of the topoJSON file for complex shapefiles, in this case from 2Mb down to 15k.
Steps 2 to 5 above have been separated in order to explain the various stages of file conversion. But the piping model of the d3 programs means that they can be combined into a more compact set of operations that avoid generating the intermediate files. The following therefore can be used as a compact alternative to all 4 steps:
shp2json -n boroughs.shp \
| ndjson-map 'd.id = d.properties.NAME, delete d.properties, d' \
| geo2topo -q 1e4 -n boroughs="-" \
| toposimplify -s 1e-9 -f \
> londonBoroughs.json
Using the final output file after conversion (here, londonBoroughs.json
), we can render all features with some simple elm-vegalite:
boroughs : Spec
boroughs =
let
cfg =
configure
<< configuration (coView [ vicoStroke Nothing ])
data =
dataFromUrl (path "londonBoroughs.json") [ topojsonFeature "boroughs" ]
enc =
encoding << color [ mName "id" ]
in
toVegaLite [ width 600, height 400, cfg [], data, enc [], geoshape [] ]
We can use the d3 command line tools to generate some additional files that may be useful for rendering. For example, suppose we wished to create a custom colour scheme to associate with each of the 33 boroughs in the dataset. In elm-vegalite, custom categorical colours are generated with
categoricalDomainMap
[ ( categoryName1, categoryColour1 )
, ( categoryName2, categoryColour2 )
, ...
]
We might generate a list of 33 categorical colours with some external service such IWantHue. To generate the equivalent list of category names we need to extract those IDs from the imported dataset, which we can do with
shp2json -n boroughs.shp | ndjson-map 'd.properties.NAME'
This allows us to create the elm-vegalite to generate a custom colour palette for the London map:
boroughColors : List ScaleProperty
boroughColors =
categoricalDomainMap
[ ( "Kingston upon Thames", "#9db7b1" )
, ( "Croydon", "#c6e1db" )
, ( "Bromley", "#afb9cb" )
, ( "Hounslow", "#b2add6" )
, ( "Ealing", "#e2f8ca" )
, ( "Havering", "#a1bde6" )
, ( "Hillingdon", "#e8aa95" )
, ( "Harrow", "#8bd0eb" )
, ( "Brent", "#dfb89b" )
, ( "Barnet", "#cec9f3" )
, ( "Lambeth", "#e3aba7" )
, ( "Southwark", "#86cbd1" )
, ( "Lewisham", "#ecb1c2" )
, ( "Greenwich", "#acd8ba" )
, ( "Bexley", "#e4bad9" )
, ( "Enfield", "#9bd6ca" )
, ( "Waltham Forest", "#cbd5e7" )
, ( "Redbridge", "#c9d2a8" )
, ( "Sutton", "#d1c1d9" )
, ( "Richmond upon Thames", "#ddcba2" )
, ( "Merton", "#a2acbd" )
, ( "Wandsworth", "#deefd6" )
, ( "Hammersmith and Fulham", "#f6d4c9" )
, ( "Kensington and Chelsea", "#b5d7a7" )
, ( "Westminster", "#add4e0" )
, ( "Camden", "#c2d2ba" )
, ( "Tower Hamlets", "#d4b4e5" )
, ( "Islington", "#e0c7ce" )
, ( "Hackney", "#a6b79f" )
, ( "Haringey", "#a2e7ed" )
, ( "Newham", "#d9b9ad" )
, ( "Barking and Dagenham", "#ebe2cf" )
, ( "City of London", "#c7bfad" )
]
Setting the mScale
to use this list gives us our new colour scheme:
boroughs : Spec
boroughs =
let
cfg =
configure
<< configuration (coView [ vicoStroke Nothing ])
data =
dataFromUrl (path "londonBoroughs.json") [ topojsonFeature "boroughs" ]
enc =
encoding
<< color [ mName "id", mScale boroughColors ]
in
toVegaLite
[ width 600
, height 400
, cfg []
, data
, enc []
, geoshape [ maStroke "white", maStrokeWidth 2 ]
]
We can use this approach to extract any of the properties (attributes) from the original shapefile. For example, the following will extract the name and GSS code of each spatial feature:
shp2json -n boroughs.shp | ndjson-map '[d.properties.NAME,d.properties.GSS_CODE]'
giving
["Kingston upon Thames","E09000021"]
["Croydon","E09000008"]
["Bromley","E09000006"]
["Hounslow","E09000018"]
:
:
But suppose we wished to label each borough directly on the map rather than via a colour legend. We would need to generate a new dataset that contained the label position for each borough. We can do that by calculating the centroid of each borough polygon using the d3's geoCentroid()
:
shp2json -n boroughs.shp \
| ndjson-map -r d3 'c=d3.geoCentroid(d.geometry),{"name":d.properties.NAME,"cx":c[0],"cy":c[1]}' \
| ndjson-reduce \
> londonCentroids.json
This generates the following json file which we can load into our visualization to display labels
[{"name":"Kingston upon Thames","cx":-0.28691465719243003,"cy":51.387893353778395},
{"name":"Croydon","cx":-0.08716546101532906,"cy":51.355317416374},
{"name":"Bromley","cx":0.051536586180461884,"cy":51.37198402033694},
{"name":"Hounslow","cx":-0.3671516396902454,"cy":51.46838045919461},
:
]
We create two layers, one for the coloured polygons, the other for the name labels placed at centroid locations. Note also the use of the transform
to display only the first word in a multi-word borough name.
boroughsCustom : Float -> Float -> Spec
boroughsCustom w h =
let
cfg =
configure
<< configuration (coView [ vicoStroke Nothing ])
data =
dataFromUrl (path "londonBoroughs.json") [ topojsonFeature "boroughs" ]
polyEnc =
encoding
<< color [ mName "id", mScale boroughColors, mLegend [] ]
polySpec =
asSpec
[ data
, polyEnc []
, geoshape [ maStroke "white", maStrokeWidth (2 * w / 700) ]
]
labelEnc =
encoding
<< position Longitude [ pName "cx" ]
<< position Latitude [ pName "cy" ]
<< text [ tName "bLabel" ]
trans =
transform
<< calculateAs "indexof (datum.name,' ') > 0 ? substring(datum.name,0,indexof(datum.name, ' ')) : datum.name" "bLabel"
labelSpec =
asSpec
[ dataFromUrl (path "londonCentroids.json") []
, trans []
, labelEnc []
, textMark [ maSize (8 * w / 700), maOpacity 0.6 ]
]
in
toVegaLite
[ width w
, height h
, cfg []
, layer [ polySpec, labelSpec ]
]
boroughs : Spec
boroughs =
boroughsCustom 700 500
topoJSON files are not limited to areal units. Here, for example, we can import a file containing the geographical routes of selected London Underground tube lines. The conversion of the tfl_lines.json follows a similar pattern to the conversion of the borough boundary files, but with some minor differences:
- The file is already in unprojected geoJSON format so does not need reprojecting or conversion from a shapefile.
ndjson-cat
converts the original geoJSON file to a single line necessary for further processing.- the file contains details of more rail lines than we need to map so
ndjson.filter
is used with a regular expression to select data for tube and DLR lines only. - the property we will use for the id (the tube line name) is inside the first element of an array so we reference it with
[0]
(where there is more than one element in the array it indicates more than one named tube line shares the same physical line).
ndjson-cat < tfl_lines.json \
| ndjson-split 'd.features' \
| ndjson-filter 'd.properties.lines[0].name.match("Ci.*|Di.*|No.*|Ce.*|DLR|Ha.*|Ba.*|Ju.*|Me.*|Pi.*|Vi.*|Wa.*")' \
| ndjson-map 'd.id = d.properties.lines[0].name,delete d.properties,d' \
| geo2topo -n -q 1e4 line="-" \
> londonTubeLines.json
We can display the newly created topoJSON file of the tube lines much as we did the original London map:
tubeLines : Spec
tubeLines =
let
data =
dataFromUrl (path "londonTubeLines.json") [ topojsonFeature "line" ]
cfg =
configure
<< configuration (coView [ vicoStroke Nothing ])
enc =
encoding
<< color [ mName "id" ]
in
toVegaLite [ width 700, height 400, cfg [], data, enc [], geoshape [ maFilled False ] ]
We can improve the design by thickening the lines, giving them their standard tube line colours and moving the legend to the bottom-right corner:
tubeLineColors : List ScaleProperty
tubeLineColors =
categoricalDomainMap
[ ( "Bakerloo", "rgb(137,78,36)" )
, ( "Central", "rgb(220,36,30)" )
, ( "Circle", "rgb(255,206,0)" )
, ( "District", "rgb(1,114,41)" )
, ( "DLR", "rgb(0,175,173)" )
, ( "Hammersmith & City", "rgb(215,153,175)" )
, ( "Jubilee", "rgb(106,114,120)" )
, ( "Metropolitan", "rgb(114,17,84)" )
, ( "Northern", "rgb(0,0,0)" )
, ( "Piccadilly", "rgb(0,24,168)" )
, ( "Victoria", "rgb(0,160,226)" )
, ( "Waterloo & City", "rgb(106,187,170)" )
]
tubeLines : Spec
tubeLines =
let
cfg =
configure
<< configuration (coView [ vicoStroke Nothing ])
data =
dataFromUrl (path "londonTubeLines.json") [ topojsonFeature "line" ]
enc =
encoding
<< color
[ mName "id"
, mLegend [ leTitle "", leOrient loBottomRight ]
, mScale tubeLineColors
]
in
toVegaLite [ width 700, height 500, cfg [], data, enc [], geoshape [ maFilled False, maStrokeWidth 2 ] ]
Finally, we can overlay the routes onto the borough boundaries to provide more of a geographical context:
tubeLines : Spec
tubeLines =
let
centroidData =
dataFromUrl (path "londonCentroids.json") []
boroughData =
dataFromUrl (path "londonBoroughs.json") [ topojsonFeature "boroughs" ]
lineData =
dataFromUrl (path "londonTubeLines.json") [ topojsonFeature "line" ]
polySpec =
asSpec
[ boroughData
, geoshape [ maFill "#eee", maStroke "white", maStrokeWidth 2 ]
]
labelEnc =
encoding
<< position Longitude [ pName "cx" ]
<< position Latitude [ pName "cy" ]
<< text [ tName "bLabel" ]
trans =
transform
<< calculateAs "indexof (datum.name,' ') > 0 ? substring(datum.name,0,indexof(datum.name, ' ')) : datum.name" "bLabel"
labelSpec =
asSpec
[ centroidData
, trans []
, labelEnc []
, textMark [ maSize 8, maOpacity 0.6 ]
]
tubeEnc =
encoding
<< color
[ mName "id"
, mLegend [ leTitle "", leOrient loBottomRight, leOffset 0 ]
, mScale tubeLineColors
]
routeSpec =
asSpec
[ lineData
, tubeEnc []
, geoshape [ maFilled False, maStrokeWidth 2 ]
]
in
toVegaLite
[ width 700
, height 500
, configure (configuration (coView [ vicoStroke Nothing ]) [])
, layer [ polySpec, labelSpec, routeSpec ]
]