REST API

Lenz Furrer edited this page Feb 7, 2020 · 9 revisions

OGER is run as a REST service on our institute server, using the following base URL:

https://pub.cl.uzh.ch/projects/ontogene/oger/

Command Overview

  • root: serve an HTML page for the web UI
  • status: status information of the whole service
  • dict: create/check/remove an annotation dictionary
  • fetch/upload: annotate a document

How to read the API

  • URL: Append the given path to the base URL. Variable portions are marked with a leading colon (e.g. :dict_id needs to be replaced with an actual ID in a request).

  • Params: All parameters need to be sent as query parameters. For example, a fetch request could have the following path and query:

      /fetch/pubmed/tsv/21436587?single-section=true&include-header=true
    

    If the parameter value is a JSON object, make sure it is properly URL-encoded. If you make calls from a terminal, you may need to shell-escape certain characters (e.g. wrap the complete address in single quotes to prevent & from being interpreted by the shell).

  • Payload: Payload of a POST request. If the payload is a JSON snippet, make sure you specify its MIME type. For example, a POST request with the Unix utility lwp-request could look as follows:

    echo '{"description": "Test vocabulary", "settings": {"termlist-path": "http://example.com/test/vocabulary.tsv"}}' | POST -c application/json https://pub.cl.uzh.ch/projects/ontogene/oger/dict
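The URL and parameter rules above can be sketched in Python using only the standard library. The helper name build_url is hypothetical and not part of OGER; the sketch only builds the address and does not send a request.

```python
import json
from urllib.parse import urlencode, urljoin

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/"


def build_url(path, params=None):
    """Append a path to the base URL and add URL-encoded query parameters."""
    url = urljoin(BASE_URL, path.lstrip("/"))
    if params:
        encoded = {
            # JSON-valued parameters are serialised first, then URL-encoded.
            key: json.dumps(value) if isinstance(value, (dict, list)) else value
            for key, value in params.items()
        }
        url += "?" + urlencode(encoded)
    return url


# The fetch example from the text.
fetch_url = build_url(
    "/fetch/pubmed/tsv/21436587",
    {"single-section": "true", "include-header": "true"},
)

# A parameter whose value is a JSON object.
json_url = build_url(
    "/fetch/pubmed/bioc/21436587",
    {"field-names": {"original_id": "CID"}},
)
```

Building the query with urlencode avoids hand-escaping braces, quotes, and ampersands; shell quoting is then only a concern when the resulting address is pasted into a terminal.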

Root

Serve an HTML page providing a web user interface.

Name root
URL /
Method GET
Params dict=[hextoken] (optional)
Success 200 OK
Response text/html; charset=UTF-8

The optional URL parameter dict preselects the given dictionary in the dictionary list, as returned by the dict command.

Status

Check if the whole service is running.

Name status
URL /status
Method GET
Success 200 OK
Response application/json

The response body is a JSON snippet with the following structure:

{
	"status": "running",
	"active annotation dictionaries": <count (int)>,
	"default dictionary": <hextoken (string)>
}

The status value is always "running". If the service is not running, the connection is refused. The returned error code is outside the control of the API and depends on the status of the hosting infrastructure.
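As a rough illustration, a client could read the status response as follows. The function names and the sample values (dictionary count and hextoken) are made up for this sketch; only the three JSON keys come from the structure shown above.

```python
import json
from urllib.request import urlopen

STATUS_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/status"


def parse_status(body):
    """Turn the JSON body of a status response into a small summary dict."""
    info = json.loads(body)
    return {
        "running": info["status"] == "running",
        "active_dicts": int(info["active annotation dictionaries"]),
        "default_dict": info["default dictionary"],
    }


def service_is_up(url=STATUS_URL, timeout=10):
    """Return True if the status endpoint answers, False if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return parse_status(response.read())["running"]
    except OSError:
        # Covers refused connections, timeouts, and HTTP errors
        # raised by the hosting infrastructure.
        return False


# Illustrative response body; the count and token are invented values.
sample = (
    '{"status": "running", '
    '"active annotation dictionaries": 2, '
    '"default dictionary": "0123abcd"}'
)
summary = parse_status(sample)
```

Since a stopped service never answers with a status other than "running", the useful signal is whether the request succeeds at all, which is why service_is_up treats any connection failure as "not running".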

Dict

Create a dictionary for annotation.

Name dict
URL /dict
Method POST
Payload application/json
Success 202 Accepted
Response application/json
Errors 400 Bad Request

The creation/loading of a new dictionary may take some time. This request completes immediately and returns a hextoken, which can be used to check when the new dictionary is available.

The request payload is a JSON snippet with the following structure:

{
	"description": <short description (string)>,
	"settings": <termlist parameters (object)>
}

Available termlist parameters are listed here. Please note that some of the parameters require understanding how OGER performs entity matching. If a dictionary with the same parameters already exists, it is used instead of a new one (the description value has no effect in this case).

The response body is a JSON snippet with the following structure:

{
	"dict_id": <hextoken (string)>
}

The returned token can be used to check the status of the new dictionary and to specify it in annotation requests (fetch/upload).

Note: Currently, the number of non-default dictionaries is limited to 3. Adding new dictionaries beyond this limit results in older dictionaries being removed.
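A dictionary-creation request might be prepared as in the following sketch, which mirrors the lwp-request example given earlier. The helper names are hypothetical, and the token in the last line is an invented value; the request object is built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/"


def make_dict_request(description, settings):
    """Build a POST request for /dict with a JSON payload."""
    payload = json.dumps({"description": description, "settings": settings})
    return Request(
        BASE_URL + "dict",
        data=payload.encode("utf-8"),
        # The MIME type of the JSON payload has to be stated explicitly.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_dict_id(body):
    """Read the hextoken from the JSON body of a 202 Accepted response."""
    return json.loads(body)["dict_id"]


request = make_dict_request(
    "Test vocabulary",
    {"termlist-path": "http://example.com/test/vocabulary.tsv"},
)

# Illustrative response body; "abc123" is an invented token.
token = extract_dict_id('{"dict_id": "abc123"}')
```

Passing the request to urllib.request.urlopen would send it; the returned token is then the dict_id to use in the status, delete, and fetch/upload calls.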

Check the status of an annotation dictionary.

Name dict-status
URL /dict/:dict_id/status
Method GET
Success 200 OK
Response application/json
Errors 404 Not Found

The response body is a JSON snippet with the following structure:

{
	"description": <short description (string)>,
	"status": <status (string)>
}

The returned status is one of "ready", "loading", or "crashed".

If the specified dictionary does not exist (anymore), HTTP status code 404 (Not Found) is returned.
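Because dictionary creation is asynchronous, a client typically polls dict-status until the dictionary is usable. The following is a minimal sketch of such a loop; wait_until_ready, the number of attempts, and the delay are assumptions of this example, and the HTTP call is left to a caller-supplied function so that the logic can be tried with canned responses.

```python
import json
import time


def wait_until_ready(get_status_body, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll a dictionary's status until it is "ready" or "crashed".

    get_status_body is any callable returning the JSON body of a
    GET /dict/:dict_id/status response. Returns True for "ready",
    False for "crashed", and raises TimeoutError if the dictionary
    is still "loading" after the given number of attempts.
    """
    for _ in range(attempts):
        status = json.loads(get_status_body())["status"]
        if status == "ready":
            return True
        if status == "crashed":
            return False
        sleep(delay)
    raise TimeoutError("dictionary still loading")


# Simulated sequence of response bodies instead of real HTTP calls.
canned = iter([
    '{"description": "Test vocabulary", "status": "loading"}',
    '{"description": "Test vocabulary", "status": "ready"}',
])
result = wait_until_ready(lambda: next(canned), delay=0)
```

In a real client, get_status_body would wrap an HTTP GET; a 404 response there means the dictionary no longer exists and polling should stop.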

Remove an annotation dictionary.

Name dict-delete
URL /dict/:dict_id
Method DELETE
Success 200 OK
Response [empty]
Errors 403 Forbidden, 404 Not Found

If the specified dictionary does not exist (anymore), HTTP status code 404 (Not Found) is returned.
If dict_id refers to the default dictionary, HTTP status code 403 (Forbidden) is returned.
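The delete call and its three documented outcomes could be handled along these lines. The helper names and the token "0123abcd" are hypothetical; the request is constructed but not sent.

```python
from urllib.request import Request

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/"

# Status codes documented for dict-delete.
OUTCOMES = {
    200: "removed",
    403: "refused: default dictionary",
    404: "not found",
}


def make_delete_request(dict_id):
    """Build a DELETE request for /dict/:dict_id."""
    return Request(f"{BASE_URL}dict/{dict_id}", method="DELETE")


def describe_delete(status_code):
    """Map an HTTP status code of a delete call to a short message."""
    return OUTCOMES.get(status_code, f"unexpected status {status_code}")


request = make_delete_request("0123abcd")  # invented token
```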

Fetch/Upload

Annotate an article obtained from a remote repository (fetch) or sent by the client (upload).

Name fetch
URL /fetch/:source/:out_fmt/:doc_id
Method GET or POST
Params dict=[hextoken] (optional), postfilter=(true|false|frequentFP|longest_match|disambiguate) (optional), [format params...]
Success 200 OK
Response text/xml or application/json or text/tab-separated-values
Errors 400 Bad Request; 404 Not Found

Name upload
URL /upload/:in_fmt/:out_fmt[/:doc_id]
Method POST
Payload text/plain or text/xml or application/gzip
Params dict=[hextoken] (optional), postfilter=(true|false|frequentFP|longest_match|disambiguate) (optional), [format params...]
Success 200 OK
Response text/xml or application/json or text/tab-separated-values
Errors 400 Bad Request; 404 Not Found

Valid values for source, in_fmt, and out_fmt are listed below.

Specifying doc_id is mandatory for fetch calls, but optional for upload calls. If present, its value must be numeric. It is interpreted according to the specified source.

The optional URL parameter dict determines which dictionary to use for annotation. If unspecified, the default dictionary is used.

The URL parameter postfilter controls postprocessing of the document annotations. Multiple postfilters can be specified:

  • The "frequentFP" postfilter removes common cases of false positives (spurious annotations).
  • The "longest_match" postfilter removes entity mentions that are found inside the span of another entity of the same type.
  • The "disambiguate" postfilter attempts to remove unlikely entities based on a neural-network model trained on the CRAFT corpus.

The boolean values "true"/"false" enable/disable all postfilters at once. By default, all postfilters are enabled.

Further URL parameters may be given, which are interpreted as options to the source/in_fmt/out_fmt converters. The accepted parameters are listed below.

The format of the response body depends on the specified out_fmt. In an upload call, the format of the request body (payload) must match the specified in_fmt.

If an unknown source, in_fmt, out_fmt, or doc_id value is used, or if the specified dictionary does not exist (anymore), HTTP status code 404 (Not Found) is returned.
If the input cannot be processed (e.g. because the input document has an invalid format), HTTP status code 400 (Bad Request) is returned.
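The path rules for fetch and upload can be checked on the client side before a request is sent, which avoids some 404 responses. The sketch below is one possible way to do this; the function names are hypothetical, and the sets of valid values are copied from the format tables on this page.

```python
from urllib.parse import urlencode

SOURCES = {"pubmed", "pmc"}
IN_FMTS = {"txt", "bioc", "bioc_json", "pxml", "nxml", "pxml.gz"}
OUT_FMTS = {
    "tsv", "xml", "text_tsv", "bioc", "bioc_json", "pubanno_json",
    "pubtator", "pubtator_fbk", "odin", "odin_custom",
}


def _check_doc_id(doc_id):
    """Return doc_id as a string, or raise ValueError if it is not numeric."""
    text = str(doc_id)
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"doc_id must be numeric, got {doc_id!r}")
    return text


def _finish(path, params):
    """Append URL-encoded query parameters, if any."""
    return path + "?" + urlencode(params) if params else path


def fetch_path(source, out_fmt, doc_id, params=None):
    """Path and query for a fetch call; doc_id is mandatory."""
    if source not in SOURCES or out_fmt not in OUT_FMTS:
        raise ValueError("unknown source or out_fmt")
    return _finish(f"fetch/{source}/{out_fmt}/{_check_doc_id(doc_id)}", params)


def upload_path(in_fmt, out_fmt, doc_id=None, params=None):
    """Path and query for an upload call; doc_id is optional."""
    if in_fmt not in IN_FMTS or out_fmt not in OUT_FMTS:
        raise ValueError("unknown in_fmt or out_fmt")
    path = f"upload/{in_fmt}/{out_fmt}"
    if doc_id is not None:
        path += "/" + _check_doc_id(doc_id)
    return _finish(path, params)


fetch = fetch_path("pubmed", "tsv", 21436587, {"postfilter": "longest_match"})
upload = upload_path("txt", "bioc_json")
```

The resulting strings are relative to the base URL. The example passes a single postfilter value only, since that is the case the parameter list spells out.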

Supported Formats

| source value | description |
| --- | --- |
| pubmed | PubMed abstract obtained directly from NCBI. |
| pmc | PubMed Central full-text article obtained directly from NCBI. |

| in_fmt value | content-type | description |
| --- | --- | --- |
| txt | text/plain | unstructured plain-text document |
| bioc | text/xml | document or collection in BioC XML |
| bioc_json | application/json | document or collection in BioC JSON |
| pxml | text/xml | abstract in PubMed's citation XML |
| nxml | text/xml | article in PubMed Central's full-text XML |
| pxml.gz | application/gzip | compressed collection of abstracts in Medline's citation XML |

| out_fmt value | content-type | description |
| --- | --- | --- |
| tsv | text/tab-separated-values | entities in a tab-separated table |
| xml | text/xml | entities in a simple, self-explanatory XML format |
| text_tsv | text/tab-separated-values | text and entities in a tab-separated table |
| bioc | text/xml | text and entities in BioC XML |
| bioc_json | application/json | text and entities in BioC JSON |
| pubanno_json | application/json | text and entities in PubAnnotator JSON |
| pubtator | text/plain | text and entities in PubTator format (mixture of pipe- and tab-separated text) |
| pubtator_fbk | text/plain | a variant of the above, with slightly different entity attributes |
| odin | text/xml | text and entities in ODIN XML |
| odin_custom | text/xml | text and entities in ODIN XML, with customisable CSS |
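For upload calls, the in_fmt table suggests a simple lookup from input format to content type. The sketch below assumes that the server expects the Content-Type header to match that table; the helper name and the sample sentence are made up, and the request is built but not sent.

```python
from urllib.request import Request

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/"

# Content types per in_fmt, as listed in the table above.
CONTENT_TYPES = {
    "txt": "text/plain",
    "bioc": "text/xml",
    "bioc_json": "application/json",
    "pxml": "text/xml",
    "nxml": "text/xml",
    "pxml.gz": "application/gzip",
}


def make_upload_request(in_fmt, out_fmt, document):
    """Build a POST request for /upload/:in_fmt/:out_fmt.

    document is the request payload as bytes; its format has to
    match in_fmt.
    """
    return Request(
        f"{BASE_URL}upload/{in_fmt}/{out_fmt}",
        data=document,
        headers={"Content-Type": CONTENT_TYPES[in_fmt]},
        method="POST",
    )


# Invented sample text, encoded as UTF-8 for this sketch.
request = make_upload_request("txt", "tsv", "Aspirin inhibits COX-1.".encode("utf-8"))
```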

Accepted options (given as URL parameters):
The options only affect the listed input/output formats.

| option | values | source | in_fmt | out_fmt | description |
| --- | --- | --- | --- | --- | --- |
| include-mesh | true, false | pubmed | pxml, pxml.gz | | include the MeSH descriptor names as a separate section |
| mesh-as-entities | true, false | pubmed | pxml, pxml.gz | | load the MeSH descriptors as annotations |
| single-section | true, false | pubmed | txt, pxml, pxml.gz | | conflate all sections into one section |
| sentence-split | true, false | | txt | | rely on given sentence splitting, one sentence per line |
| field-names | JSON object | | bioc | xml, bioc, bioc_json | a mapping of field names, e.g. {"original_id": "CID"}, used for renaming fields in the output and when importing annotations from BioC input |
| include-header | true, false | | | tsv | print a header with column titles |
| sentence-level | true, false | | | bioc, bioc_json | anchor text at sentence level rather than passage level (default) |
| bioc-meta | JSON object | | | bioc, bioc_json | collection-level metadata (keys different from "source", "date", and "key" are put into <infon> elements) |
| byte-offsets-in | true, false | | bioc, bioc_json | | interpret given offsets in bytes, not codepoints |
| byte-offsets-out | true, false | | | bioc, bioc_json | measure offsets in bytes, not codepoints |
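The difference between byte and codepoint offsets only shows up with non-ASCII text. The following sketch illustrates it under the assumption that bytes are counted in UTF-8, which this page does not state; the function names and the sample string are invented for the example.

```python
def codepoint_to_byte_offset(text, offset, encoding="utf-8"):
    """Convert a codepoint offset into a byte offset."""
    return len(text[:offset].encode(encoding))


def byte_to_codepoint_offset(text, offset, encoding="utf-8"):
    """Convert a byte offset into a codepoint offset.

    Raises UnicodeDecodeError if the byte offset falls inside a
    multi-byte character.
    """
    return len(text.encode(encoding)[:offset].decode(encoding))


# "naïve β-catenin", written with escapes to keep the codepoints explicit.
text = "na\u00efve \u03b2-catenin"

start_cp = text.index("\u03b2")                       # codepoint offset of β
start_bytes = codepoint_to_byte_offset(text, start_cp)  # byte offset of β
```

Here the codepoint offset of β is 6, while its UTF-8 byte offset is 7, because the preceding ï takes two bytes. For pure ASCII text both offsets coincide, so the two options have no visible effect.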