REST API

OGER is run as a REST service on our institute server, using the following base URL:

https://pub.cl.uzh.ch/projects/ontogene/oger/

Command Overview

- root: serve an HTML page for the web UI
- status: status information of the whole service
- dict: create/check/remove an annotation dictionary
- fetch/upload: annotate a document

How to read the API

URL: Append the given path to the base URL. Variable portions are marked with a leading colon (e.g. `:dict_id` must be replaced with an actual ID in a request).
Params: All parameters must be sent as query parameters. For example, a fetch request could have the following path and query:
```
  /fetch/pubmed/tsv/21436587?single-section=true&include-header=true
```
If a parameter value is a JSON object, make sure it is properly URL-encoded. If you make calls from a terminal, you may need to shell-escape certain characters (e.g. enclose the complete address in single quotes to prevent the shell from interpreting `&`).
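For scripted access, the query string can be assembled programmatically, which takes care of URL-encoding and avoids shell quoting altogether. The following is a minimal Python sketch using only the standard library; `BASE_URL` and `build_url` are illustrative names, not part of OGER.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL from the top of this page (without the trailing slash,
# since every API path starts with "/").
BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger"


def build_url(path: str, params: Optional[dict] = None) -> str:
    """Append an API path to the base URL and add URL-encoded query parameters."""
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    return url


# The fetch example from above, with :source, :out_fmt and :doc_id filled in.
url = build_url(
    "/fetch/pubmed/tsv/21436587",
    {"single-section": "true", "include-header": "true"},
)
```

The resulting string can be passed directly to an HTTP client, so no shell escaping is involved.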

Payload: The payload of a POST request. If the payload is a JSON snippet, make sure to specify its MIME type (`application/json`). For example, a POST request with the Unix utility lwp-request could look as follows:

echo '{"description": "Test vocabulary", "settings": {"termlist-path": "http://example.com/test/vocabulary.tsv"}}' | POST -c application/json https://pub.cl.uzh.ch/projects/ontogene/oger/dict
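The same request can be built in Python with the standard library. This is a sketch, not an official client: `json_post` is a hypothetical helper, and the request is only constructed here (passing it to `urllib.request.urlopen` would actually send it).

```python
import json
from urllib.request import Request

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger"


def json_post(path: str, payload: dict) -> Request:
    """Build (but do not send) a POST request whose body is a JSON payload."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},  # declare the MIME type explicitly
        method="POST",
    )


# Equivalent of the lwp-request example above.
req = json_post("/dict", {
    "description": "Test vocabulary",
    "settings": {"termlist-path": "http://example.com/test/vocabulary.tsv"},
})
```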

Root

Serve an HTML page providing a web user interface.

| Name | root |
|---|---|
| URL | `/` |
| Method | GET |
| Params | `dict=[hextoken]` (optional) |
| Success | 200 OK |
| Response | text/html; charset=UTF-8 |

The optional URL parameter dict preselects the dictionary with the given ID (as returned by the dict command) in the dictionary list.

Status

Check if the whole service is running.

| Name | status |
|---|---|
| URL | `/status` |
| Method | GET |
| Success | 200 OK |
| Response | application/json |

The response body is a JSON snippet with the following structure:

{
	"status": "running",
	"active annotation dictionaries": <count (int)>,
	"default dictionary": <hextoken (string)>
}

The status value is always "running". If the service is not running, the request fails: typically, the connection is refused, or the hosting infrastructure returns an error code. The exact error is outside the control of the API.
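A client might check the service before sending annotation requests. The sketch below uses hypothetical helper names (`parse_status`, `service_is_up`) and treats any network or HTTP error as "not running", since the exact error depends on the hosting infrastructure.

```python
import json
from urllib.request import urlopen

STATUS_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/status"


def parse_status(body: str) -> dict:
    """Decode the /status JSON and confirm the documented "running" value."""
    info = json.loads(body)
    if info.get("status") != "running":
        raise ValueError(f"unexpected status: {info.get('status')!r}")
    return info


def service_is_up(timeout: float = 10.0) -> bool:
    """Return True if /status answers with "running"; any failure counts as down."""
    try:
        with urlopen(STATUS_URL, timeout=timeout) as resp:
            parse_status(resp.read().decode("utf-8"))
        return True
    # URLError/HTTPError and timeouts are OSError subclasses;
    # malformed JSON or a wrong status value raise ValueError.
    except (OSError, ValueError):
        return False
```

Usage: `if not service_is_up(): ...` before submitting documents.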

Dict

Create a dictionary for annotation.

| Name | dict |
|---|---|
| URL | `/dict` |
| Method | POST |
| Payload | application/json |
| Success | 202 Accepted |
| Response | application/json |
| Errors | 400 Bad Request |

The creation/loading of a new dictionary may take some time. This request completes immediately and returns a hextoken, which can be used to check when the new dictionary is available.

The request payload is a JSON snippet with the following structure:

{
	"description": <short description (string)>,
	"settings": <termlist parameters (object)>
}

Available termlist parameters are listed here. Please note that some of the parameters require an understanding of how OGER performs entity matching. If a dictionary with the same parameters already exists, it is reused instead of creating a new one (the description value has no effect in this case).

The response body is a JSON snippet with the following structure:

{
	"dict_id": <hextoken (string)>
}

The returned token can be used to check the status of the new dictionary and to specify it in annotation requests (fetch/upload).

Note: Currently, the number of non-default dictionaries is limited to 3. Adding new dictionaries beyond this limit results in older dictionaries being removed.
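A sketch of creating a dictionary and retrieving its ID in Python. The helper names and the choice to raise on anything other than 202 Accepted are assumptions; a 400 Bad Request surfaces as `urllib.error.HTTPError` from `urlopen`.

```python
import json
from urllib.request import Request, urlopen

DICT_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/dict"


def create_dict_request(description: str, settings: dict) -> Request:
    """Build the POST /dict request with the documented payload structure."""
    payload = {"description": description, "settings": settings}
    return Request(
        DICT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_dict(description: str, settings: dict) -> str:
    """Submit the request and return the dict_id hextoken."""
    with urlopen(create_dict_request(description, settings)) as resp:
        if resp.status != 202:  # the API documents 202 Accepted on success
            raise RuntimeError(f"unexpected HTTP status {resp.status}")
        return json.load(resp)["dict_id"]
```

Since older dictionaries may be evicted once the limit is reached, it seems prudent to check the dictionary status again before relying on a stored ID.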

Check the status of an annotation dictionary.

| Name | dict-status |
|---|---|
| URL | `/dict/:dict_id/status` |
| Method | GET |
| Success | 200 OK |
| Response | application/json |
| Errors | 404 Not Found |

The response body is a JSON snippet with the following structure:

{
	"description": <short description (string)>,
	"status": <status (string)>
}

The returned status is one of "ready", "loading", or "crashed".

If the specified dictionary does not exist (anymore), HTTP status code 404 (Not Found) is returned.
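Because dictionaries load asynchronously, a client typically polls this endpoint until the dictionary is ready. In this sketch, the polling interval and retry limit are arbitrary choices, and `get_status` is injectable so the loop can be exercised without network access; all names are hypothetical.

```python
import json
import time
from typing import Callable
from urllib.error import HTTPError
from urllib.request import urlopen

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger"


def fetch_dict_status(dict_id: str) -> str:
    """GET /dict/:dict_id/status and return the "status" field."""
    with urlopen(f"{BASE_URL}/dict/{dict_id}/status") as resp:
        return json.load(resp)["status"]


def wait_until_ready(dict_id: str,
                     get_status: Callable[[str], str] = fetch_dict_status,
                     interval: float = 5.0, max_tries: int = 60) -> None:
    """Poll until the dictionary is "ready"; fail on "crashed", 404, or too many polls."""
    for _ in range(max_tries):
        try:
            status = get_status(dict_id)
        except HTTPError as err:
            if err.code == 404:
                raise LookupError(f"dictionary {dict_id} does not exist (anymore)") from err
            raise
        if status == "ready":
            return
        if status == "crashed":
            raise RuntimeError(f"dictionary {dict_id} crashed while loading")
        time.sleep(interval)  # still "loading"
    raise TimeoutError(f"dictionary {dict_id} not ready after {max_tries} polls")
```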

Remove an annotation dictionary.

| Name | dict-delete |
|---|---|
| URL | `/dict/:dict_id` |
| Method | DELETE |
| Success | 200 OK |
| Response | [empty] |
| Errors | 403 Forbidden, 404 Not Found |

If the specified dictionary does not exist (anymore), HTTP status code 404 (Not Found) is returned.
If dict_id refers to the default dictionary, HTTP status code 403 (Forbidden) is returned.
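A sketch of removing a dictionary that maps the documented error codes to Python return values or exceptions. Treating 404 as "already gone" is a design choice of this hypothetical helper, not something the API prescribes.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger"


def delete_dict_request(dict_id: str) -> Request:
    """Build the DELETE /dict/:dict_id request."""
    return Request(f"{BASE_URL}/dict/{dict_id}", method="DELETE")


def delete_dict(dict_id: str) -> bool:
    """Delete a dictionary; return False if it did not exist (anymore)."""
    try:
        with urlopen(delete_dict_request(dict_id)):
            return True  # 200 OK with an empty body
    except HTTPError as err:
        if err.code == 404:
            return False
        if err.code == 403:
            raise PermissionError("the default dictionary cannot be removed") from err
        raise
```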

Fetch/Upload

Annotate an article obtained from a remote repository (fetch) or sent by the client (upload).

| Name | fetch |
|---|---|
| URL | `/fetch/:source/:out_fmt/:doc_id` |
| Method | GET or POST |
| Params | `dict=[hextoken]` (optional), `postfilter=(true\|false\|frequentFP\|longest_match\|disambiguate)` (optional), [format params...] |
| Success | 200 OK |
| Response | text/xml or application/json or text/tab-separated-values |
| Errors | 400 Bad Request; 404 Not Found |

| Name | upload |
|---|---|
| URL | `/upload/:in_fmt/:out_fmt[/:doc_id]` |
| Method | POST |
| Payload | text/plain or text/xml or application/gzip |
| Params | `dict=[hextoken]` (optional), `postfilter=(true\|false\|frequentFP\|longest_match\|disambiguate)` (optional), [format params...] |
| Success | 200 OK |
| Response | text/xml or application/json or text/tab-separated-values |
| Errors | 400 Bad Request; 404 Not Found |

Valid values for source, in_fmt, and out_fmt are listed below.

Specifying doc_id is mandatory for fetch calls, but optional for upload calls. If present, its value must be numeric. It is interpreted according to the specified source.

The optional URL parameter dict determines which dictionary to use for annotation. If unspecified, the default dictionary is used.

The URL parameter postfilter controls postprocessing of the document annotations. Multiple postfilters can be specified:

- The "frequentFP" postfilter removes common cases of false positives (spurious annotations).
- The "longest_match" postfilter removes entity mentions that are found inside the span of another entity of the same type.
- The "disambiguate" postfilter attempts to remove unlikely entities based on a neural-network model trained on the CRAFT corpus.

The boolean values "true"/"false" enable/disable all postfilters at once. By default, all postfilters are enabled.

Further URL parameters may be given, which are interpreted as options to the source/in_fmt/out_fmt converters. The accepted parameters are listed below.

The format of the response body depends on the specified out_fmt. In an upload call, the format of the request body (payload) must match the specified in_fmt.

If an unknown source, in_fmt, out_fmt, or doc_id value is used, or if the specified dictionary does not exist (anymore), HTTP status code 404 (Not Found) is returned.
If the input cannot be processed (e.g. because the input document has an invalid format), HTTP status code 400 (Bad Request) is returned.
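A sketch of building a fetch URL in Python. One assumption needs stating: this page does not say how several postfilters are combined in one request, so the code sends them as repeated `postfilter` parameters, which may need adjusting. `fetch_url` is a hypothetical helper.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger"


def fetch_url(source: str, out_fmt: str, doc_id: str,
              dict_id: Optional[str] = None,
              postfilters: Iterable[str] = (),
              options: Optional[dict] = None) -> str:
    """Build a GET URL for /fetch/:source/:out_fmt/:doc_id."""
    if not doc_id.isdigit():
        raise ValueError("doc_id must be numeric")
    params = []
    if dict_id is not None:
        params.append(("dict", dict_id))  # omit to use the default dictionary
    # ASSUMPTION: multiple postfilters are sent as repeated `postfilter` parameters.
    params.extend(("postfilter", pf) for pf in postfilters)
    # Converter options (see the table of accepted options below).
    params.extend((options or {}).items())
    url = f"{BASE_URL}/fetch/{source}/{out_fmt}/{doc_id}"
    return url + ("?" + urlencode(params) if params else "")
```

For example, `fetch_url("pubmed", "bioc_json", "21436587", postfilters=["false"])` would request the document with all postfilters disabled.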

Supported Formats

| `source` value | description |
|---|---|
| pubmed | PubMed abstract obtained directly from NCBI. |
| pmc | PubMed Central full-text article obtained directly from NCBI. |

| `in_fmt` value | content-type | description |
|---|---|---|
| txt | text/plain | unstructured plain-text document |
| bioc | text/xml | document or collection in BioC XML |
| bioc_json | application/json | document or collection in BioC JSON |
| pxml | text/xml | abstract in PubMed's citation XML |
| nxml | text/xml | article in PubMed Central's full-text XML |
| pxml.gz | application/gzip | compressed collection of abstracts in Medline's citation XML |
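For upload calls, the request's `Content-Type` should match the chosen `in_fmt`. The sketch below derives it from the table above; the helper name and the lookup dictionary are illustrative. Query parameters such as `dict` or `postfilter` could be appended to the URL in the same way as for fetch.

```python
from typing import Optional
from urllib.request import Request

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger"

# Payload content type for each in_fmt, copied from the in_fmt table.
IN_FMT_CONTENT_TYPE = {
    "txt": "text/plain",
    "bioc": "text/xml",
    "bioc_json": "application/json",
    "pxml": "text/xml",
    "nxml": "text/xml",
    "pxml.gz": "application/gzip",
}


def upload_request(in_fmt: str, out_fmt: str, payload: bytes,
                   doc_id: Optional[str] = None) -> Request:
    """Build (but do not send) a POST /upload/:in_fmt/:out_fmt[/:doc_id] request."""
    try:
        content_type = IN_FMT_CONTENT_TYPE[in_fmt]
    except KeyError:
        raise ValueError(f"unsupported in_fmt: {in_fmt!r}") from None
    path = f"/upload/{in_fmt}/{out_fmt}"
    if doc_id is not None:
        path += f"/{doc_id}"  # optional for upload calls
    return Request(BASE_URL + path, data=payload,
                   headers={"Content-Type": content_type}, method="POST")
```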

| `out_fmt` value | content-type | description |
|---|---|---|
| tsv | text/tab-separated-values | entities in a tab-separated table |
| xml | text/xml | entities in a simple, self-explanatory XML format |
| text_tsv | text/tab-separated-values | text and entities in a tab-separated table |
| bioc | text/xml | text and entities in BioC XML |
| bioc_json | application/json | text and entities in BioC JSON |
| pubanno_json | application/json | text and entities in PubAnnotation JSON |
| pubtator | text/plain | text and entities in PubTator format (mixture of pipe- and tab-separated text) |
| pubtator_fbk | text/plain | a variant of the above, with slightly different entity attributes |
| odin | text/xml | text and entities in ODIN XML |
| odin_custom | text/xml | text and entities in ODIN XML, with customisable CSS |

Accepted options (given as URL parameters). Each option only affects the listed sources and input/output formats.

| option | values | `source` | `in_fmt` | `out_fmt` | description |
|---|---|---|---|---|---|
| include-mesh | true, false | pubmed | pxml, pxml.gz | | include the MeSH descriptor names as a separate section |
| mesh-as-entities | true, false | pubmed | pxml, pxml.gz | | load the MeSH descriptors as annotations |
| single-section | true, false | pubmed | txt, pxml, pxml.gz | | conflate all sections into one section |
| sentence-split | true, false | | txt | | rely on the given sentence splitting (one sentence per line) |
| field-names | JSON object | | bioc | xml, bioc, bioc_json | a mapping of field names, e.g. `{"original_id": "CID"}`, used for renaming fields in the output and when importing annotations from BioC input |
| include-header | true, false | | | tsv | print a header with column titles |
| sentence-level | true, false | | | bioc, bioc_json | anchor text at passage level (default) or at sentence level |
| bioc-meta | JSON object | | | bioc, bioc_json | collection-level metadata (keys other than "source", "date", and "key" are put into `<infon>` elements) |
| byte-offsets-in | true, false | | bioc, bioc_json | | interpret given offsets in bytes, not codepoints |
| byte-offsets-out | true, false | | | bioc, bioc_json | measure offsets in bytes, not codepoints |
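Option values are sent as text, so Python booleans and dictionaries have to be converted before encoding. A sketch, assuming the server expects lowercase `true`/`false` as listed above and any valid JSON for object-valued options (the compact separators are just a choice); `encode_options` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode


def encode_options(options: dict) -> str:
    """URL-encode converter options: bools as true/false, dicts as JSON."""
    pairs = []
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # str(True) would give "True"
        elif isinstance(value, dict):
            value = json.dumps(value, separators=(",", ":"))
        pairs.append((name, value))
    return urlencode(pairs)


query = encode_options({"include-header": True, "field-names": {"original_id": "CID"}})
```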
