## Linked Open Data and Data Publishing
<div class = rmdcaution> _This section is under development_. </div>
+ Huggett, Costopoulos: discussion on the desirability (or not) of different kinds of silos
+ linked open data as one way of having your cake and eating it too
+ difficulties in effecting this vision
While many databases, services, or museums might expose their data via a web API, there can be limitations. Matthew Lincoln has an excellent tutorial at [The Programming Historian](https://programminghistorian.org/en/lessons/graph-databases-and-SPARQL) that walks us through the differences between these approaches, but the key one is in the way the data is represented. When data is described using the Resource Description Framework (RDF), the resource (the 'thing') is described via a series of relationships, rather than as rows in a table or as keys having values.
Information is in the relationships. It's a network. It's a _graph_. Thus, every 'thing' in this graph can have its own _uniform resource identifier_ (URI) that lives as a location on the internet. Information can then be created by making _statements_ that use these URIs, much as English grammar creates meaning: subject, verb, object. Or, in RDF-speak, 'subject, predicate, object', a construction known as a _triple_. In this way, data in _different_ places can be linked together by referencing the elements they have in common. This is Linked Open Data (LOD). The access point for interrogating LOD is called an 'endpoint'.
Finally, _SPARQL_ is an acronym for SPARQL Protocol and RDF Query Language (yes, it's one of _those_ kinds of acronyms).
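At its simplest, a SPARQL query asks for every statement that matches a pattern, and the pattern is written exactly like the triples described above: subject, predicate, object. Here is a minimal sketch that should run against most endpoints:

```sparql
# A minimal query: the WHERE clause is a single triple pattern.
# ?s, ?p, and ?o are variables standing in for subject, predicate,
# and object; the query returns the first ten statements it finds.
SELECT ?s ?p ?o
WHERE {
  ?s ?p ?o .
}
LIMIT 10
```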
In the notebook for this section, we're not using Python or R directly. Instead, we've set up a 'kernel' (think of that as the 'engine' for the notebook) that already includes everything necessary to set up and run SPARQL queries. (For reference, the kernel code is [here](https://github.com/paulovn/sparql-kernel)). Both R and Python can interact with and query endpoints, and manipulate linked open data, but for the sake of learning a bit of what one can do with SPARQL, this notebook keeps all of that ancillary code tucked away. The followup notebook shows you how to use R to do some basic manipulations of the query results.
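To give a flavour of the kind of query the notebook works through, here is a hedged sketch against the Nomisma endpoint mentioned below. The `nmo:hasMint` property and the `nm:rome` identifier are assumptions drawn from Nomisma's published ontology; check nomisma.org for the current vocabulary before relying on them.

```sparql
PREFIX nm:   <http://nomisma.org/id/>
PREFIX nmo:  <http://nomisma.org/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Find things minted at Rome, together with their labels.
# (nmo:hasMint and nm:rome are assumed from the Nomisma ontology.)
SELECT ?item ?label
WHERE {
  ?item nmo:hasMint nm:rome ;
        rdfs:label ?label .
}
LIMIT 25
```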
Note that the SPARQL endpoint for Nomisma also lets you download query results as CSV, GeoJSON, or KML.
### Discussion
### Exercises