Commit 4e1238e

Merge pull request IQSS#11045 from IQSS/10542-signposting
expose links to all export formats via Signposting
2 parents c0b09bb + bd78501 commit 4e1238e

9 files changed, +136 -25 lines changed

Diff for: doc/release-notes/10542-signposting.md

+11
@@ -0,0 +1,11 @@
+# Signposting Output Now Contains Links to All Dataset Metadata Export Formats
+
+When Signposting was added in Dataverse 5.14 (#8981), it only provided links for the `schema.org` metadata export format.
+
+The output of HEAD, GET, and the Signposting "linkset" API have all been updated to include links to all available dataset metadata export formats (including any external exporters, such as Croissant, that have been enabled).
+
+This provides a lightweight machine-readable way to first retrieve a list of links (via an HTTP HEAD request, for example) to each available metadata export format and then follow up with a request for the export format of interest.
+
+In addition, the content type for the `schema.org` dataset metadata export format has been corrected. It was `application/json` and now it is `application/ld+json`.
+
+See also [the docs](https://preview.guides.gdcc.io/en/develop/api/native-api.html#retrieve-signposting-information) and #10542.
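The two-step flow described in the release note (read the "Link" header, then request the export of interest) might look like the following on the client side. This is a minimal sketch, not part of the commit: the class `LinkHeaderSketch` and its methods are hypothetical, the parser only handles quoted parameter values, and matching on `exporter=<name>&` assumes the URL shape shown in the documentation example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client-side helper (not Dataverse code): parses a Signposting "Link" header value.
class LinkHeaderSketch {

    // One link-value: <target> followed by zero or more ;name="value" parameters.
    // Assumption: parameter values are always quoted, as in the documented example.
    private static final Pattern LINK =
            Pattern.compile("<([^>]*)>((?:\\s*;\\s*[\\w-]+=\"[^\"]*\")*)");
    private static final Pattern PARAM = Pattern.compile("([\\w-]+)=\"([^\"]*)\"");

    // Returns one map per link-value; the target URL is stored under the key "href".
    static List<Map<String, String>> parse(String headerValue) {
        List<Map<String, String>> links = new ArrayList<>();
        Matcher link = LINK.matcher(headerValue);
        while (link.find()) {
            Map<String, String> entry = new LinkedHashMap<>();
            entry.put("href", link.group(1));
            Matcher param = PARAM.matcher(link.group(2));
            while (param.find()) {
                entry.put(param.group(1), param.group(2));
            }
            links.add(entry);
        }
        return links;
    }

    // Returns the export URL advertised for the given exporter name, or null if there is none.
    static String exportUrl(String headerValue, String formatName) {
        for (Map<String, String> link : parse(headerValue)) {
            boolean describedBy = "describedby".equals(link.get("rel"));
            if (describedBy && link.get("href").contains("exporter=" + formatName + "&")) {
                return link.get("href");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Shortened version of the header shown in the API guide.
        String header = "<https://doi.org/10.5072/FK2/YD5QDG>;rel=\"cite-as\", "
                + "<https://demo.dataverse.org/api/datasets/export?exporter=schema.org"
                + "&persistentId=doi:10.5072/FK2/YD5QDG>;rel=\"describedby\";type=\"application/ld+json\","
                + "<https://demo.dataverse.org/api/datasets/export?exporter=ddi"
                + "&persistentId=doi:10.5072/FK2/YD5QDG>;rel=\"describedby\";type=\"application/xml\"";
        System.out.println(exportUrl(header, "ddi"));
    }
}
```

A client would pass the value of the "Link" header from a HEAD request for the dataset landing page and then issue a GET for the returned URL.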

Diff for: doc/sphinx-guides/source/admin/discoverability.rst

+1-1
@@ -51,7 +51,7 @@ The Dataverse team has been working with Google on both formats. Google has `ind
 Signposting
 +++++++++++
 
-The Dataverse software supports `Signposting <https://signposting.org>`_. This allows machines to request more information about a dataset through the `Link <https://tools.ietf.org/html/rfc5988>`_ HTTP header.
+The Dataverse software supports `Signposting <https://signposting.org>`_. This allows machines to request more information about a dataset through the `Link <https://tools.ietf.org/html/rfc5988>`_ HTTP header. Links to all enabled metadata export formats are given. See :ref:`metadata-export-formats` for a list.
 
 There are 2 Signposting profile levels, level 1 and level 2. In this implementation,
 * Level 1 links are shown `as recommended <https://signposting.org/FAIR/>`_ in the "Link"

Diff for: doc/sphinx-guides/source/api/changelog.rst

+1
@@ -12,6 +12,7 @@ v6.6
 
 - **/api/metadatablocks** is no longer returning duplicated metadata properties and does not omit metadata properties when called.
 - **/api/roles**: :ref:`show-role` now properly returns 403 Forbidden instead of 401 Unauthorized when you pass a working API token that doesn't have the right permission.
+- The content type for the ``schema.org`` dataset metadata export format has been corrected. It was ``application/json`` and now it is ``application/ld+json``. See also :ref:`export-dataset-metadata-api`.
 
 v6.5
 ----

Diff for: doc/sphinx-guides/source/api/native-api.rst

+44-9
@@ -1575,6 +1575,8 @@ Export Metadata of a Dataset in Various Formats
 
 |CORS| Export the metadata of the current published version of a dataset in various formats.
 
+To get a list of available formats, see :ref:`available-exporters` and :ref:`get-export-formats`.
+
 See also :ref:`batch-exports-through-the-api` and the note below:
 
 .. code-block:: bash
@@ -1591,9 +1593,30 @@ The fully expanded example above (without environment variables) looks like this
 
   curl "https://demo.dataverse.org/api/datasets/export?exporter=ddi&persistentId=doi:10.5072/FK2/J8SJZB"
 
-.. note:: Supported exporters (export formats) are ``ddi``, ``oai_ddi``, ``dcterms``, ``oai_dc``, ``schema.org`` , ``OAI_ORE`` , ``Datacite``, ``oai_datacite`` and ``dataverse_json``. Descriptive names can be found under :ref:`metadata-export-formats` in the User Guide.
+.. _available-exporters:
+
+Available Dataset Metadata Exporters
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following dataset metadata exporters ship with Dataverse:
+
+- ``Datacite``
+- ``dataverse_json``
+- ``dcterms``
+- ``ddi``
+- ``oai_datacite``
+- ``oai_dc``
+- ``oai_ddi``
+- ``OAI_ORE``
+- ``schema.org``
+
+These are the strings to pass as ``$METADATA_FORMAT`` in the examples above. Descriptive names for each format can be found under :ref:`metadata-export-formats` in the User Guide.
+
+Additional exporters can be enabled, as described under :ref:`external-exporters` in the Installation Guide. The machine-readable name/identifier for each external exporter can be found under :ref:`inventory-of-external-exporters`. If you are interested in creating your own exporter, see :doc:`/developers/metadataexport`.
+
+To discover the machine-readable name of exporters (e.g. ``ddi``) that have been enabled on the installation of Dataverse you are using, see :ref:`get-export-formats`. Alternatively, you can use the Signposting "linkset" API documented under :ref:`signposting-api`.
 
-.. note:: Additional exporters can be enabled, as described under :ref:`external-exporters` in the Installation Guide. To discover the machine-readable name of each exporter (e.g. ``ddi``), check :ref:`inventory-of-external-exporters` or ``getFormatName`` in the exporter's source code.
+To discover the machine-readable name of exporters generally, check :ref:`inventory-of-external-exporters` or ``getFormatName`` in the exporter's source code.
 
 Schema.org JSON-LD
 ^^^^^^^^^^^^^^^^^^
@@ -1607,6 +1630,8 @@ Both forms are valid according to Google's Structured Data Testing Tool at https
 
 The standard has further evolved into a format called Croissant. For details, see :ref:`schema.org-head` in the Admin Guide.
 
+The ``schema.org`` format changed after Dataverse 6.4 as well. Previously its content type was "application/json" but now it is "application/ld+json".
+
 List Files in a Dataset
 ~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -3174,15 +3199,23 @@ Retrieve Signposting Information
 Dataverse supports :ref:`discovery-sign-posting` as a discovery mechanism.
 Signposting involves the addition of a `Link <https://tools.ietf.org/html/rfc5988>`__ HTTP header providing summary information on GET and HEAD requests to retrieve the dataset page and a separate /linkset API call to retrieve additional information.
 
-Here is an example of a "Link" header:
+Signposting Link HTTP Header
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Here is an example of an HTTP "Link" header from a GET or HEAD request for a dataset landing page:
 
3179-
``Link: <https://doi.org/10.5072/FK2/YD5QDG>;rel="cite-as", <https://doi.org/10.5072/FK2/YD5QDG>;rel="describedby";type="application/vnd.citationstyles.csl+json",<https://demo.dataverse.org/api/datasets/export?exporter=schema.org&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/ld+json", <https://schema.org/AboutPage>;rel="type",<https://schema.org/Dataset>;rel="type", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.5072/FK2/YD5QDG>;rel="license", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/linkset?persistentId=doi:10.5072/FK2/YD5QDG> ; rel="linkset";type="application/linkset+json"``
3207+
``Link: <https://doi.org/10.5072/FK2/YD5QDG>;rel="cite-as", <https://doi.org/10.5072/FK2/YD5QDG>;rel="describedby";type="application/vnd.citationstyles.csl+json",<https://demo.dataverse.org/api/datasets/export?exporter=OAI_ORE&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/json",<https://demo.dataverse.org/api/datasets/export?exporter=Datacite&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=oai_dc&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=oai_datacite&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=schema.org&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/ld+json",<https://demo.dataverse.org/api/datasets/export?exporter=ddi&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=dcterms&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=html&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="text/html",<https://demo.dataverse.org/api/datasets/export?exporter=dataverse_json&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/json",<https://demo.dataverse.org/api/datasets/export?exporter=oai_ddi&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml", <https://schema.org/AboutPage>;rel="type",<https://schema.org/Dataset>;rel="type", <http://creativecommons.org/publicdomain/zero/1.0>;rel="license", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/linkset?persistentId=doi:10.5072/FK2/YD5QDG> ; rel="linkset";type="application/linkset+json"``
 
-The URL for linkset information is discoverable under the ``rel="linkset";type="application/linkset+json`` entry in the "Link" header, such as in the example above.
+The URL for linkset information (described below) is discoverable under the ``rel="linkset";type="application/linkset+json"`` entry in the "Link" header, such as in the example above.
+
+Signposting Linkset API Endpoint
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The response includes a JSON object conforming to the `Signposting <https://signposting.org>`__ specification. As part of this conformance, unlike most Dataverse API responses, the output is not wrapped in a ``{"status":"OK","data":{`` object.
 Signposting is not supported for draft dataset versions.
 
+Like :ref:`get-export-formats`, this API can be used to get URLs to dataset metadata export formats, but with URLs for the dataset in question.
+
 .. code-block:: bash
 
   export SERVER_URL=https://demo.dataverse.org
@@ -5182,12 +5215,14 @@ The fully expanded example above (without environment variables) looks like this
 
   curl "https://demo.dataverse.org/api/info/settings/:MaxEmbargoDurationInMonths"
 
-Get Export Formats
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. _get-export-formats:
+
+Get Dataset Metadata Export Formats
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Get the available export formats, including custom formats.
+Get the available dataset metadata export formats, including formats from external exporters (see :ref:`available-exporters`).
 
-The response contains an object with available format names as keys, and as values an object with the following properties:
+The response contains a JSON object with the available format names as keys (these can be passed to :ref:`export-dataset-metadata-api`), and values as objects with the following properties:
 
 * ``displayName``
 * ``mediaType``

Diff for: doc/sphinx-guides/source/user/dataset-management.rst

+2
@@ -43,6 +43,8 @@ Additional formats can be enabled. See :ref:`inventory-of-external-exporters` in
 
 Each of these metadata exports contains the metadata of the most recently published version of the dataset.
 
+For each dataset, links to each enabled metadata format are available programmatically via Signposting. For details, see :ref:`discovery-sign-posting` in the Admin Guide and :ref:`signposting-api` in the API Guide.
+
 .. _adding-new-dataset:
 
 Adding a New Dataset

Diff for: src/main/java/edu/harvard/iq/dataverse/export/SchemaDotOrgExporter.java

+5-1
@@ -111,7 +111,11 @@ public Boolean isAvailableToUsers() {
 
     @Override
     public String getMediaType() {
-        return MediaType.APPLICATION_JSON;
+        /**
+         * Changed from "application/json" to "application/ld+json" because
+         * that's what Signposting expects.
+         */
+        return "application/ld+json";
     }
 
 }

Diff for: src/main/java/edu/harvard/iq/dataverse/util/SignpostingResources.java

+39-13
@@ -16,6 +16,7 @@ Two configurable options allow changing the limit for the number of authors or d
 
 import edu.harvard.iq.dataverse.*;
 import edu.harvard.iq.dataverse.dataset.DatasetUtil;
+import edu.harvard.iq.dataverse.export.ExportService;
 import jakarta.json.Json;
 import jakarta.json.JsonArrayBuilder;
 import jakarta.json.JsonObjectBuilder;
@@ -28,6 +29,8 @@ Two configurable options allow changing the limit for the number of authors or d
 import java.util.logging.Logger;
 
 import static edu.harvard.iq.dataverse.util.json.NullSafeJsonBuilder.jsonObjectBuilder;
+import io.gdcc.spi.export.ExportException;
+import io.gdcc.spi.export.Exporter;
 
 public class SignpostingResources {
     private static final Logger logger = Logger.getLogger(SignpostingResources.class.getCanonicalName());
@@ -72,8 +75,17 @@ public String getLinks() {
         }
 
         String describedby = "<" + ds.getGlobalId().asURL().toString() + ">;rel=\"describedby\"" + ";type=\"" + "application/vnd.citationstyles.csl+json\"";
-        describedby += ",<" + systemConfig.getDataverseSiteUrl() + "/api/datasets/export?exporter=schema.org&persistentId="
-                + ds.getProtocol() + ":" + ds.getAuthority() + "/" + ds.getIdentifier() + ">;rel=\"describedby\"" + ";type=\"application/ld+json\"";
+        ExportService instance = ExportService.getInstance();
+        for (String[] labels : instance.getExportersLabels()) {
+            String formatName = labels[1];
+            Exporter exporter;
+            try {
+                exporter = ExportService.getInstance().getExporter(formatName);
+                describedby += ",<" + getExporterUrl(formatName, ds) + ">;rel=\"describedby\"" + ";type=\"" + exporter.getMediaType() + "\"";
+            } catch (ExportException ex) {
+                logger.warning("Could not look up exporter based on " + formatName + ". Exception: " + ex);
+            }
+        }
         valueList.add(describedby);
 
         String type = "<https://schema.org/AboutPage>;rel=\"type\"";
@@ -85,7 +97,7 @@ public String getLinks() {
 
         String linkset = "<" + systemConfig.getDataverseSiteUrl() + "/api/datasets/:persistentId/versions/"
                 + workingDatasetVersion.getVersionNumber() + "." + workingDatasetVersion.getMinorVersionNumber()
-                + "/linkset?persistentId=" + ds.getProtocol() + ":" + ds.getAuthority() + "/" + ds.getIdentifier() + "> ; rel=\"linkset\";type=\"application/linkset+json\"";
+                + "/linkset?persistentId=" + ds.getGlobalId().asString() + "> ; rel=\"linkset\";type=\"application/linkset+json\"";
         valueList.add(linkset);
         logger.fine(String.format("valueList is: %s", valueList));
 
@@ -95,7 +107,7 @@ public String getLinks() {
     public JsonArrayBuilder getJsonLinkset() {
         Dataset ds = workingDatasetVersion.getDataset();
         GlobalId gid = ds.getGlobalId();
-        String landingPage = systemConfig.getDataverseSiteUrl() + "/dataset.xhtml?persistentId=" + ds.getProtocol() + ":" + ds.getAuthority() + "/" + ds.getIdentifier();
+        String landingPage = systemConfig.getDataverseSiteUrl() + "/dataset.xhtml?persistentId=" + ds.getGlobalId().asString();
         JsonArrayBuilder authors = getJsonAuthors(getAuthorURLs(false));
         JsonArrayBuilder items = getJsonItems();
 
@@ -112,15 +124,24 @@ public JsonArrayBuilder getJsonLinkset() {
                 )
         );
 
-        mediaTypes.add(
-                jsonObjectBuilder().add(
-                        "href",
-                        systemConfig.getDataverseSiteUrl() + "/api/datasets/export?exporter=schema.org&persistentId=" + ds.getProtocol() + ":" + ds.getAuthority() + "/" + ds.getIdentifier()
-                ).add(
-                        "type",
-                        "application/ld+json"
-                )
-        );
+        ExportService instance = ExportService.getInstance();
+        for (String[] labels : instance.getExportersLabels()) {
+            String formatName = labels[1];
+            Exporter exporter;
+            try {
+                exporter = ExportService.getInstance().getExporter(formatName);
+                mediaTypes.add(
+                        jsonObjectBuilder().add(
+                                "href", getExporterUrl(formatName, ds)
+                        ).add(
+                                "type",
+                                exporter.getMediaType()
+                        )
+                );
+            } catch (ExportException ex) {
+                logger.warning("Could not look up exporter based on " + formatName + ". Exception: " + ex);
+            }
+        }
         JsonArrayBuilder linksetJsonObj = Json.createArrayBuilder();
 
         JsonObjectBuilder mandatory;
@@ -274,4 +295,9 @@ private String getPublicDownloadUrl(DataFile dataFile) {
         return FileUtil.getPublicDownloadUrl(systemConfig.getDataverseSiteUrl(),
                 ((gid != null) ? gid.asString() : null), dataFile.getId());
     }
+
+    private String getExporterUrl(String formatName, Dataset ds) {
+        return systemConfig.getDataverseSiteUrl()
+                + "/api/datasets/export?exporter=" + formatName + "&persistentId=" + ds.getGlobalId().asString();
+    }
 }
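The way `getLinks()` now assembles the `describedby` value, one entry per enabled exporter, can be illustrated in isolation. The sketch below is not the Dataverse implementation: `DescribedBySketch` and its method signatures are hypothetical, and a plain map of format name to media type stands in for the `ExportService` lookups so that the example runs without a Dataverse installation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone illustration; the real logic lives in SignpostingResources.
class DescribedBySketch {

    // Mirrors the URL shape produced by getExporterUrl() in the diff above.
    static String exporterUrl(String siteUrl, String formatName, String pid) {
        return siteUrl + "/api/datasets/export?exporter=" + formatName + "&persistentId=" + pid;
    }

    // Builds a comma-separated list of rel="describedby" link-values,
    // one per (formatName -> mediaType) entry, in map iteration order.
    static String describedBy(String siteUrl, String pid, Map<String, String> mediaTypesByFormat) {
        StringBuilder value = new StringBuilder();
        for (Map.Entry<String, String> entry : mediaTypesByFormat.entrySet()) {
            if (value.length() > 0) {
                value.append(",");
            }
            value.append("<").append(exporterUrl(siteUrl, entry.getKey(), pid))
                    .append(">;rel=\"describedby\";type=\"").append(entry.getValue()).append("\"");
        }
        return value.toString();
    }

    public static void main(String[] args) {
        // Assumed sample data: two of the exporters listed in the API guide.
        Map<String, String> formats = new LinkedHashMap<>();
        formats.put("schema.org", "application/ld+json");
        formats.put("ddi", "application/xml");
        System.out.println(describedBy("https://demo.dataverse.org", "doi:10.5072/FK2/YD5QDG", formats));
    }
}
```

Because each media type comes from the exporter itself, an externally enabled exporter appears in the header with no further changes to the link-building code.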

Diff for: src/test/java/edu/harvard/iq/dataverse/api/SignpostingIT.java

+32
@@ -16,6 +16,8 @@
 import java.util.regex.Pattern;
 
 import jakarta.json.JsonObject;
+import static org.hamcrest.CoreMatchers.endsWith;
+import static org.hamcrest.CoreMatchers.is;
 
 import org.junit.jupiter.api.BeforeAll;
 import org.junit.jupiter.api.Test;
@@ -56,8 +58,21 @@ public void testSignposting() {
         String datasetLandingPage = RestAssured.baseURI + "/dataset.xhtml?persistentId=" + datasetPid;
         System.out.println("Checking dataset landing page for Signposting: " + datasetLandingPage);
         Response getHtml = given().get(datasetLandingPage);
+        getHtml.then().assertThat()
+                .statusCode(OK.getStatusCode())
+                .header("Link", endsWith("linkset?persistentId=" + datasetPid + "> ; rel=\"linkset\";type=\"application/linkset+json\""));
 
         System.out.println("Link header: " + getHtml.getHeader("Link"));
+        if (false) {
+            // Split on commas to make the output more readable.
+            System.out.println("---");
+            String header = getHtml.getHeader("Link");
+            for (String string : header.split(",")) {
+                System.out.println(string + ",");
+            }
+            System.out.println("returning early...");
+            return;
+        }
 
         getHtml.then().assertThat().statusCode(OK.getStatusCode());
 
@@ -67,6 +82,8 @@ public void testSignposting() {
         assertTrue(linkHeader.contains(datasetPid));
         assertTrue(linkHeader.contains("cite-as"));
         assertTrue(linkHeader.contains("describedby"));
+        // Make sure we get more exporters besides just "schema.org".
+        assertTrue(linkHeader.contains("oai_datacite"));
 
         Response headHtml = given().head(datasetLandingPage);
 
@@ -76,6 +93,7 @@ public void testSignposting() {
 
         // Make sure there's Signposting stuff in the "Link" header such as
         // the dataset PID, cite-as, etc.
+        // TODO: The comment above is a repeat and so are some of the assertions below. Consolidate?
         linkHeader = getHtml.getHeader("Link");
         assertTrue(linkHeader.contains(datasetPid));
         assertTrue(linkHeader.contains("cite-as"));
@@ -90,8 +108,15 @@ public void testSignposting() {
         System.out.println("Linkset URL: " + linksetUrl);
 
         Response linksetResponse = given().accept(ContentType.JSON).get(linksetUrl);
+        linksetResponse.prettyPrint();
+        linksetResponse.then().assertThat()
+                .statusCode(OK.getStatusCode())
+                .body("linkset[0].anchor", endsWith("/dataset.xhtml?persistentId=" + datasetPid))
+                .body("linkset[0].license.href", is("http://creativecommons.org/publicdomain/zero/1.0"))
+                .body("linkset[0].describedby[1].href", endsWith("persistentId=" + datasetPid));
 
         String responseString = linksetResponse.getBody().asString();
+        System.out.println("response string: " + responseString);
 
         JsonObject data = JsonUtil.getJsonObject(responseString);
         JsonObject lso = data.getJsonArray("linkset").getJsonObject(0);
@@ -107,6 +132,13 @@ public void testSignposting() {
         Pattern exporterPattern = Pattern.compile("[<\\[][^()\\[\\]]*?exporter=schema.org[^()\\[\\]]*[>\\]]");
         Matcher exporterMatcher = exporterPattern.matcher(linkHeader);
         exporterMatcher.find();
+        // TODO: make an assertion
+        //assertTrue(exporterMatcher.find());
+
+        // Test another
+        Pattern exporterPattern2 = Pattern.compile("exporter=oai_datacite");
+        Matcher exporterMatcher2 = exporterPattern2.matcher(linkHeader);
+        assertTrue(exporterMatcher2.find());
 
         Response exportDataset = UtilIT.exportDataset(datasetPid, "schema.org");
         exportDataset.prettyPrint();

Diff for: src/test/resources/json/export-formats.json

+1-1
@@ -36,7 +36,7 @@
   },
   "schema.org": {
     "displayName": "Schema.org JSON-LD",
-    "mediaType": "application/json",
+    "mediaType": "application/ld+json",
     "isHarvestable": false,
     "isVisibleInUserInterface": true
   },
