follows | id |
---|---|
elmVegaliteWalkthrough1 | litvis |
@import "../css/tutorial.less"
- Introduction
- Single View Specifications
- Layered and Multi-view Composition
- Interaction
A Single View specification (3:03)
Let's start with a simple table of data representing time-stamped weather data for Seattle:
date | precipitation | temp_max | temp_min | wind | weather |
---|---|---|---|---|---|
2012/01/01 | 0.0 | 12.8 | 5.0 | 4.7 | drizzle |
2012/01/02 | 10.9 | 10.6 | 2.8 | 4.5 | rain |
2012/01/03 | 0.8 | 11.7 | 7.2 | 2.3 | rain |
... | ... | ... | ... | ... | ... |
We can store the specification for retrieving the data in its own function for later reuse:
    path : String
    path =
        "https://cdn.jsdelivr.net/npm/[email protected]/data/"


    seattleData : Data
    seattleData =
        dataFromUrl (path ++ "seattle-weather.csv") [ parse [ ( "date", foDate "%Y/%m/%d" ) ] ]
A Strip plot (3:26)
We could encode one of the numeric data fields as a strip plot where the horizontal position of a tick mark is determined by the magnitude of the data item (maximum daily temperature in this case). With elm-vegalite, we do the following to create this visualization expression:
    stripPlot : Spec
    stripPlot =
        toVegaLite
            [ seattleData
            , encoding (position X [ pName "temp_max", pQuant ] [])
            , tick []
            ]
Notice how there is no explicit definition of the axis details, color choice or size. These can be customised, but the default values are designed to follow good practice in visualization design.
The function toVegaLite takes a list of grammar specifications and creates a single JSON object that encodes the entire design.
Three grammar elements (data, mark and encoding) are represented by the three functions dataFromUrl (called inside seattleData), tick and encoding.
The encoding function takes as its single parameter a list of specifications that are themselves generated by other functions. In this case we use the function position to provide an encoding of the temp_max field as the x-position in our plot. The precise way in which temperature is mapped to the x-position will depend on the type of data we are encoding. We can provide a hint by declaring the measurement type of the data field, here pQuant, indicating a numeric measurement type. The final parameter of position is a list of any additional encodings in our specification. Here, with only one encoding, we provide an empty list.
As we build up more complex visualizations we will use many more encodings. To keep the coding clear, the idiomatic way to do this with elm-vegalite is to chain encoding functions using point-free style. The example above coded in this way would be
    stripPlot : Spec
    stripPlot =
        let
            enc =
                encoding
                    << position X [ pName "temp_max", pQuant ]
        in
        toVegaLite [ seattleData, enc [], tick [] ]
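To see what the chained form is doing, it can help to write the same strip plot both ways side by side. The following is a minimal sketch, assuming elm-vegalite's VegaLite module is imported unqualified. The names nestedStrip and chainedStrip are illustrative only and are not part of the original walkthrough. Each takes the data source as a parameter so that the fragment stands alone.

```elm
import VegaLite exposing (..)


-- Nested form: position receives the list of remaining encodings
-- as its final argument (an empty list here).
nestedStrip : Data -> Spec
nestedStrip data =
    toVegaLite
        [ data
        , encoding (position X [ pName "temp_max", pQuant ] [])
        , tick []
        ]


-- Chained form: (<<) composes the partially applied functions into a single
-- function that is still waiting for that final list, so we apply it to [].
chainedStrip : Data -> Spec
chainedStrip data =
    let
        enc =
            encoding
                << position X [ pName "temp_max", pQuant ]
    in
    toVegaLite [ data, enc [], tick [] ]
```

Calling either function with seattleData should describe the same visualization. The chained form pays off once several channels are encoded, because each new channel is one more `<<` line rather than another level of nesting.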
Simple Histogram (5:02)
While the strip plot shows the range of temperatures, it is hard to see how many days have which temperatures. To see that, we need to show the distribution more explicitly. We can do this by binning the temperatures and then aggregating the data in each bin into counts. If we encode those counts by the y-position and change our mark from tick to bar we have our frequency histogram:
    histogram : Spec
    histogram =
        let
            enc =
                encoding
                    << position X [ pName "temp_max", pQuant, pBin [] ]
                    << position Y [ pAggregate opCount, pQuant ]
        in
        toVegaLite [ seattleData, enc [], bar [] ]
The code now contains two chained position encodings: one for the x-position, which is now binned, and one for the y-position, which is aggregated by providing pAggregate opCount instead of a data field name.
Notice again that sensible defaults are provided for the parts of the specification we didn't specify such as axis titles, colours and number of bins.
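If the defaults do not suit, they can be overridden in the same lists of channel properties. The sketch below is one possible variation, not part of the original walkthrough: it assumes the VegaLite module is imported unqualified, and the function name customHistogram, the bin limit of 20 and the axis title wording are all illustrative choices.

```elm
import VegaLite exposing (..)


-- The same histogram, but with an upper limit on the number of bins and
-- explicit axis titles in place of the generated defaults.
customHistogram : Data -> Spec
customHistogram data =
    let
        enc =
            encoding
                << position X
                    [ pName "temp_max"
                    , pQuant
                    , pBin [ biMaxBins 20 ]
                    , pAxis [ axTitle "Maximum daily temperature" ]
                    ]
                << position Y
                    [ pAggregate opCount
                    , pQuant
                    , pAxis [ axTitle "Number of days" ]
                    ]
    in
    toVegaLite [ data, enc [], bar [] ]
```

Note that biMaxBins sets a maximum rather than an exact count, so the number of bars drawn may be smaller than 20.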
Stacked Histogram (7:03)
Position isn't the only channel we can use to encode data. Color is an important channel in many visualizations, so we can use it here to encode the dominant weather type for each date in our table. The overall shape of the histogram is the same, but now we can get some idea of the separate distributions for each of the recorded weather types.
    stackedHistogram : Spec
    stackedHistogram =
        let
            enc =
                encoding
                    << position X [ pName "temp_max", pBin [] ]
                    << position Y [ pAggregate opCount, pQuant ]
                    << color [ mName "weather" ]
        in
        toVegaLite [ seattleData, enc [], bar [] ]
The code to do this simply adds another channel encoding, this time color rather than position, and uses it to encode the weather data field. Unlike temperature, weather type is nominal, that is, categorical with no intrinsic order. By default, fields are assumed to be nominal, so we don't need to specify the measurement type explicitly (although we could add mNominal to the color properties if we wished).
Notice how the functions used to customise a channel start with a letter indicating the type of channel affected. So the name of the data field used to encode position is pName, its measurement type is pQuant and its positional aggregation is pAggregate, whereas the name of the data field for encoding color is indicated by mName (where m is short for mark).
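For comparison, here is a sketch of the stacked histogram with every measurement type stated explicitly rather than left to the defaults. It assumes the VegaLite module is imported unqualified. The function name explicitTypes is illustrative and does not appear in the original walkthrough.

```elm
import VegaLite exposing (..)


-- p-prefixed functions configure position channels,
-- m-prefixed functions configure mark channels such as color.
explicitTypes : Data -> Spec
explicitTypes data =
    let
        enc =
            encoding
                << position X [ pName "temp_max", pQuant, pBin [] ]
                << position Y [ pAggregate opCount, pQuant ]
                << color [ mName "weather", mNominal ]
    in
    toVegaLite [ data, enc [], bar [] ]
```

Spelling the types out is more verbose, but it can make the intent clearer to a reader who is not familiar with the defaults.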
Stacked Histogram with Customised Colours (7:20)
While the default nominal colour scheme is well chosen for general purposes, we might want to customise the colors to better match the semantics of the data.
Changing the way a channel is encoded involves specifying the scale and in particular the mapping between the domain (the elements of the data to show) and the colour range used to represent them.
    weatherColors : List ScaleProperty
    weatherColors =
        categoricalDomainMap
            [ ( "sun", "#e7ba52" )
            , ( "fog", "#c7c7c7" )
            , ( "drizzle", "#aec7ea" )
            , ( "rain", "#1f77b4" )
            , ( "snow", "#9467bd" )
            ]


    stackedHistogram : Spec
    stackedHistogram =
        let
            enc =
                encoding
                    << position X [ pName "temp_max", pBin [] ]
                    << position Y [ pAggregate opCount ]
                    << color [ mName "weather", mScale weatherColors ]
        in
        toVegaLite [ seattleData, enc [], bar [] ]
The mapping between the values in the domain (weather types sun, fog etc.) and the colours used to represent them (hex values #e7ba52, #c7c7c7 etc.) is handled by an elm-vegalite function categoricalDomainMap which accepts a list of tuples defining those mappings.
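One way to picture what such a list of tuples describes is as two parallel lists, one of domain values and one of range colours. The fragment below is plain Elm written for illustration only. It is not elm-vegalite's implementation, and the names weatherPairs and domainAndRange are hypothetical.

```elm
-- Two of the (domain value, colour) pairs from the mapping above.
weatherPairs : List ( String, String )
weatherPairs =
    [ ( "sun", "#e7ba52" )
    , ( "fog", "#c7c7c7" )
    ]


-- List.unzip splits the pairs into two parallel lists:
-- ( [ "sun", "fog" ], [ "#e7ba52", "#c7c7c7" ] )
domainAndRange : ( List String, List String )
domainAndRange =
    List.unzip weatherPairs
```

Keeping each weather type next to its colour in a single tuple makes it harder for the two lists to drift out of step when an entry is added or removed.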
Notice how we never needed to state explicitly that we wished our bars to be stacked. This was inferred directly by Vega-Lite from the combination of bar marks and colour channel encoding. If we change just the mark function from bar to line, Vega-Lite produces an unstacked series of lines, which makes sense because, unlike bars, lines do not occlude one another to the same extent.
    lineChart : Spec
    lineChart =
        let
            enc =
                encoding
                    << position X [ pName "temp_max", pBin [] ]
                    << position Y [ pAggregate opCount ]
                    << color [ mName "weather", mScale weatherColors ]
        in
        toVegaLite [ seattleData, line [], enc [] ]
The stacked bar chart version is better at showing the overall distribution of all weather types, but it is more difficult to compare distributions of anything other than sun as all other weather types lack a common baseline. To compare distributions of all categories we can move from a single view to a multi-view composition.