REST API
OGER is run as a REST service on our institute server, using the following base URL:
https://pub.cl.uzh.ch/projects/ontogene/oger/
- root: serve an HTML page for the web UI
- status: status information of the whole service
- dict: create/check/remove an annotation dictionary
- fetch/upload: annotate a document
- URL: Append the given path to the base URL. Variable portions are marked with a leading colon (e.g. `:dict_id` must be replaced with an actual ID in a request).
- Params: All parameters must be sent as query parameters. For example, a fetch request could have the following path and query:
  `/fetch/pubmed/tsv/21436587?single-section=true&include-header=true`
  If a parameter value is a JSON object, make sure it is properly URL-encoded. If you make calls from a terminal, you may need to shell-escape certain characters (e.g. put the complete address in single quotes to prevent the shell from interpreting `&`).
- Payload: The payload of a POST request. If the payload is a JSON snippet, make sure you specify its MIME type. For example, a POST request with the Unix utility `lwp-request` (here invoked through its `POST` alias) could look as follows:
  `echo '{"description": "Test vocabulary", "settings": {"termlist-path": "http://example.com/test/vocabulary.tsv"}}' | POST -c application/json https://pub.cl.uzh.ch/projects/ontogene/oger/dict`
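To make the URL and parameter conventions concrete, here is a minimal Python sketch using only the standard library. The helper `build_url` is hypothetical (not part of OGER); it assumes that JSON-valued parameters are serialized with `json.dumps` before URL-encoding, and that boolean options are spelled "true"/"false" as in the example path above.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/"


def build_url(path, params=None):
    """Append an API path and URL-encoded query parameters to the base URL."""
    url = BASE_URL.rstrip("/") + "/" + path.lstrip("/")
    if not params:
        return url
    encoded = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # Assumption: boolean options are spelled "true"/"false"
            encoded[key] = "true" if value else "false"
        elif isinstance(value, dict):
            # JSON objects (e.g. field-names, bioc-meta) are serialized first;
            # urlencode then percent-encodes the braces, quotes, and colons
            encoded[key] = json.dumps(value, separators=(",", ":"))
        else:
            encoded[key] = str(value)
    return url + "?" + urlencode(encoded)


print(build_url("/fetch/pubmed/tsv/21436587",
                {"single-section": True, "include-header": True}))
# https://pub.cl.uzh.ch/projects/ontogene/oger/fetch/pubmed/tsv/21436587?single-section=true&include-header=true
```

The resulting URL can also be passed to a command-line tool, as long as it is enclosed in single quotes.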
Name | root |
---|---|
URL | / |
Method | GET |
Params | dict=[hextoken] (optional) |
Success | 200 OK |
Response | text/html; charset=UTF-8 |
The optional URL parameter `dict` preselects the given dictionary in the web UI's dictionary list. Its value is a hextoken as returned by the dict command.
Name | status |
---|---|
URL | /status |
Method | GET |
Success | 200 OK |
Response | application/json |
The response body is a JSON snippet with the following structure:
{
"status": "running",
"active annotation dictionaries": <count (int)>,
"default dictionary": <hextoken (string)>
}
The status value is always "running". If the service is not running, no status response is returned: either the connection is refused, or an error is returned whose code is outside the control of the API and depends on the state of the hosting infrastructure.
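A client might query /status before sending annotation requests. The following sketch (hypothetical helpers `parse_status` and `get_status`) treats any connection or HTTP failure as "service unavailable", since the exact error depends on the hosting infrastructure rather than on the API.

```python
import json
import urllib.request

STATUS_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/status"


def parse_status(body):
    """Extract the documented fields from a /status JSON response body."""
    data = json.loads(body)
    return {
        "running": data.get("status") == "running",
        "dict_count": data.get("active annotation dictionaries"),
        "default_dict": data.get("default dictionary"),
    }


def get_status(timeout=10.0):
    """Return the parsed status, or None if the service cannot be reached."""
    try:
        with urllib.request.urlopen(STATUS_URL, timeout=timeout) as resp:
            return parse_status(resp.read().decode("utf-8"))
    except OSError:
        # Connection refused, timeouts, and HTTP errors (urllib.error.URLError
        # and HTTPError are subclasses of OSError) all count as "unavailable"
        return None
```

A caller would then check `status is None or not status["running"]` before proceeding.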
Name | dict |
---|---|
URL | /dict |
Method | POST |
Payload | application/json |
Success | 202 Accepted |
Response | application/json |
Errors | 400 Bad Request |
The creation/loading of a new dictionary may take some time. This request completes immediately and returns a hextoken, which can be used to check when the new dictionary is available.
The request payload is a JSON snippet with the following structure:
{
"description": <short description (string)>,
"settings": <termlist parameters (object)>
}
Available termlist parameters are listed here. Please note that some of the parameters require understanding how OGER performs entity matching. If a dictionary with the same parameters already exists, it is used instead of a new one (the description value has no effect in this case).
The response body is a JSON snippet with the following structure:
{
"dict_id": <hextoken (string)>
}
The returned token can be used to check the status of the new dictionary and to specify it in annotation requests (fetch/upload).
Note: Currently, the number of non-default dictionaries is limited to 3. Adding new dictionaries beyond this limit results in older dictionaries being removed.
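Creating a dictionary from Python could look like the sketch below, which mirrors the `lwp-request` example at the top of this page. The function names are hypothetical; the payload structure and the `termlist-path` setting come from this page, and other termlist parameters are not shown.

```python
import json
import urllib.request

DICT_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/dict"


def make_dict_request(description, settings):
    """Build (but do not send) a POST /dict request with a JSON payload."""
    payload = {"description": description, "settings": settings}
    return urllib.request.Request(
        DICT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_dict_id(body):
    """Return the hextoken from the 202 Accepted response body."""
    return json.loads(body)["dict_id"]


def create_dict(description, settings, timeout=30.0):
    """Send the request; the server answers before the dictionary is loaded."""
    req = make_dict_request(description, settings)
    # urlopen raises urllib.error.HTTPError for 400 Bad Request
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_dict_id(resp.read().decode("utf-8"))
```

Because the request returns before loading finishes, the token should be checked with dict-status before use. Given the limit of 3 non-default dictionaries, a token may also become invalid once newer dictionaries are added.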
Name | dict-status |
---|---|
URL | /dict/:dict_id/status |
Method | GET |
Success | 200 OK |
Response | application/json |
Errors | 404 Not Found |
The response body is a JSON snippet with the following structure:
{
"description": <short description (string)>,
"status": <status (string)>
}
The returned status is one of "ready", "loading", or "crashed".
If the specified dictionary does not exist (anymore), HTTP status code 404 (Not Found) is returned.
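Since dictionary creation is asynchronous, a client typically polls dict-status until the dictionary is ready. This is a sketch under assumptions: the polling interval and timeout are arbitrary, and the status check is passed in as a callable so it can be replaced in tests. The hypothetical `http_status_getter` shows one way to implement it with `urllib`, mapping 404 to `None`.

```python
import json
import time
import urllib.error
import urllib.request

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/"


class DictionaryError(Exception):
    """The dictionary crashed, no longer exists, or did not load in time."""


def http_status_getter(dict_id, timeout=10.0):
    """Return a callable that fetches the dictionary's status (None on 404)."""
    url = f"{BASE_URL}dict/{dict_id}/status"

    def get_status():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return json.loads(resp.read().decode("utf-8"))["status"]
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return None  # does not exist (anymore)
            raise

    return get_status


def wait_until_ready(get_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the status is "ready"; raise on "crashed", 404, or timeout."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "ready":
            return status
        if status == "crashed":
            raise DictionaryError("dictionary failed to load")
        if status is None:
            raise DictionaryError("dictionary does not exist (anymore)")
        # Otherwise the status is "loading": wait unless the budget is used up
        if waited >= timeout:
            raise DictionaryError("dictionary still loading after timeout")
        sleep(interval)
        waited += interval
```

Usage: `wait_until_ready(http_status_getter(dict_id))` blocks until the dictionary can be used in fetch/upload calls.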
Name | dict-delete |
---|---|
URL | /dict/:dict_id |
Method | DELETE |
Success | 200 OK |
Response | [empty] |
Errors | 403 Forbidden, 404 Not Found |
If the specified dictionary does not exist (anymore), HTTP status code 404 (Not Found) is returned.
If `dict_id` refers to the default dictionary, HTTP status code 403 (Forbidden) is returned.
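A deletion helper can map the documented status codes to client-side outcomes. In this hypothetical sketch, 404 is treated as "already gone" rather than as an error (so deletion is idempotent from the caller's point of view), and 403 is raised as a `PermissionError`, since it only occurs for the default dictionary.

```python
import urllib.error
import urllib.request

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/"


def make_delete_request(dict_id):
    """Build (but do not send) a DELETE /dict/:dict_id request."""
    return urllib.request.Request(BASE_URL + "dict/" + dict_id, method="DELETE")


def delete_dict(dict_id, timeout=10.0):
    """Return True if deleted, False if the dictionary was already gone."""
    try:
        with urllib.request.urlopen(make_delete_request(dict_id), timeout=timeout):
            return True  # 200 OK with an empty body
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # does not exist (anymore): nothing to delete
        if err.code == 403:
            raise PermissionError("the default dictionary cannot be deleted") from err
        raise
```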
Name | fetch |
---|---|
URL | /fetch/:source/:out_fmt/:doc_id |
Method | GET or POST |
Params | dict=[hextoken] (optional), postfilter=(true\|false\|frequentFP\|longest_match\|disambiguate) (optional), [format params...] |
Success | 200 OK |
Response | text/xml or application/json or text/tab-separated-values |
Errors | 400 Bad Request; 404 Not Found |
Name | upload |
---|---|
URL | /upload/:in_fmt/:out_fmt[/:doc_id] |
Method | POST |
Payload | text/plain or text/xml or application/gzip |
Params | dict=[hextoken] (optional), postfilter=(true\|false\|frequentFP\|longest_match\|disambiguate) (optional), [format params...] |
Success | 200 OK |
Response | text/xml or application/json or text/tab-separated-values |
Errors | 400 Bad Request; 404 Not Found |
Valid values for `source`, `in_fmt`, and `out_fmt` are listed below.
Specifying `doc_id` is mandatory for `fetch` calls, but optional for `upload` calls.
If present, its value must be numeric.
In `fetch` calls, it is interpreted according to the specified `source`.
The optional URL parameter `dict` determines which dictionary to use for annotation.
If unspecified, the default dictionary is used.
The URL parameter `postfilter` controls postprocessing of the document annotations.
Multiple postfilters can be specified:
- The "frequentFP" postfilter removes common cases of false positives (spurious annotations).
- The "longest_match" postfilter removes entity mentions that are found inside the span of another entity of the same type.
- The "disambiguate" postfilter attempts to remove unlikely entities based on a neural-network model trained on the CRAFT corpus.
The boolean values "true"/"false" enable/disable all postfilters at once. By default, all postfilters are enabled.
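The rules for fetch URLs can be checked on the client before a request is sent. The sketch below (hypothetical `fetch_url`) validates `source`, `out_fmt`, and `doc_id` against the values documented on this page and adds the optional `dict` and `postfilter` parameters. This page does not say how multiple postfilters are combined in one request, so the sketch accepts only a single `postfilter` value.

```python
import re
from urllib.parse import urlencode

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/"
SOURCES = {"pubmed", "pmc"}
OUT_FMTS = {"tsv", "xml", "text_tsv", "bioc", "bioc_json", "pubanno_json",
            "pubtator", "pubtator_fbk", "odin", "odin_custom"}
POSTFILTERS = {"true", "false", "frequentFP", "longest_match", "disambiguate"}


def fetch_url(source, out_fmt, doc_id, dict_id=None, postfilter=None, options=None):
    """Build a /fetch/:source/:out_fmt/:doc_id URL after client-side checks."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    if out_fmt not in OUT_FMTS:
        raise ValueError(f"unknown out_fmt: {out_fmt!r}")
    if not re.fullmatch(r"[0-9]+", str(doc_id)):
        raise ValueError("doc_id is mandatory for fetch and must be numeric")
    params = {}
    if dict_id is not None:
        params["dict"] = dict_id  # if omitted, the default dictionary is used
    if postfilter is not None:
        if postfilter not in POSTFILTERS:
            raise ValueError(f"unknown postfilter: {postfilter!r}")
        # A single value only: the syntax for combining several is undocumented
        params["postfilter"] = postfilter
    params.update(options or {})  # converter options, e.g. include-header
    url = f"{BASE_URL}fetch/{source}/{out_fmt}/{doc_id}"
    return (url + "?" + urlencode(params)) if params else url
```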
Further URL parameters may be given, which are interpreted as options to the `source`/`in_fmt`/`out_fmt` converters.
The accepted parameters are listed below.
The format of the response body depends on the specified `out_fmt`.
In an `upload` call, the format of the request body (payload) must match the specified `in_fmt`.
If an unknown `source`, `in_fmt`, `out_fmt`, or `doc_id` value is used, or if the specified dictionary does not exist (anymore), HTTP status code 404 (Not Found) is returned.
If the input cannot be processed (e.g. because the input document has an invalid format), HTTP status code 400 (Bad Request) is returned.
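Errors 400 and 404 can be surfaced to the caller together with the explanation given in the two paragraphs above. This is a hypothetical wrapper (`annotate`, `OgerRequestError`); it assumes UTF-8 response bodies, and the opener is injectable so that it can be tested without network access.

```python
import urllib.error
import urllib.request


class OgerRequestError(Exception):
    """Hypothetical exception carrying the documented meaning of an HTTP error."""


ERROR_MEANINGS = {
    400: "the input could not be processed (e.g. invalid document format)",
    404: "unknown source, in_fmt, out_fmt, or doc_id, or the dictionary no longer exists",
}


def annotate(request, opener=urllib.request.urlopen, timeout=60.0):
    """Send a fetch/upload request (URL string or Request) and return the body."""
    try:
        with opener(request, timeout=timeout) as resp:
            return resp.read().decode("utf-8")  # assumption: UTF-8 output
    except urllib.error.HTTPError as err:
        meaning = ERROR_MEANINGS.get(err.code, "unexpected HTTP error")
        raise OgerRequestError(f"HTTP {err.code}: {meaning}") from err
```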
source value | description |
---|---|
pubmed | PubMed abstract obtained directly from NCBI. |
pmc | PubMed Central full-text article obtained directly from NCBI. |
in_fmt value | content-type | description |
---|---|---|
txt | text/plain | unstructured plain-text document |
bioc | text/xml | document or collection in BioC XML |
bioc_json | application/json | document or collection in BioC JSON |
pxml | text/xml | abstract in PubMed's citation XML |
nxml | text/xml | article in PubMed Central's full-text XML |
pxml.gz | application/gzip | compressed collection of abstracts in Medline's citation XML |
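An upload request needs a payload in the chosen `in_fmt` and a matching content type. The sketch below builds such a request using the content types from the table; the helper name is hypothetical, and it assumes the payload is already encoded as bytes (e.g. UTF-8 text for `txt`, or `gzip.compress(...)` output for `pxml.gz`).

```python
import re
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://pub.cl.uzh.ch/projects/ontogene/oger/"

# Content types transcribed from the in_fmt table above
IN_FMT_TYPES = {
    "txt": "text/plain",
    "bioc": "text/xml",
    "bioc_json": "application/json",
    "pxml": "text/xml",
    "nxml": "text/xml",
    "pxml.gz": "application/gzip",
}


def make_upload_request(payload, in_fmt, out_fmt, doc_id=None, params=None):
    """Build (but do not send) a POST /upload/:in_fmt/:out_fmt[/:doc_id] request.

    payload: bytes already in the in_fmt format.
    """
    if in_fmt not in IN_FMT_TYPES:
        raise ValueError(f"unknown in_fmt: {in_fmt!r}")
    url = f"{BASE_URL}upload/{in_fmt}/{out_fmt}"
    if doc_id is not None:
        if not re.fullmatch(r"[0-9]+", str(doc_id)):
            raise ValueError("doc_id must be numeric")
        url += f"/{doc_id}"
    if params:
        url += "?" + urlencode(params)
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": IN_FMT_TYPES[in_fmt]},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(req)`.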
out_fmt value | content-type | description |
---|---|---|
tsv | text/tab-separated-values | entities in a tab-separated table |
xml | text/xml | entities in a simple, self-explanatory XML format |
text_tsv | text/tab-separated-values | text and entities in a tab-separated table |
bioc | text/xml | text and entities in BioC XML |
bioc_json | application/json | text and entities in BioC JSON |
pubanno_json | application/json | text and entities in PubAnnotation JSON |
pubtator | text/plain | text and entities in PubTator format (mixture of pipe- and tab-separated text) |
pubtator_fbk | text/plain | a variant of the above, with slightly different entity attributes |
odin | text/xml | text and entities in ODIN XML |
odin_custom | text/xml | text and entities in ODIN XML, with customisable CSS |
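A client can pick a generic parser from the content type of each output format. This sketch only unpacks the container format (JSON, XML, or TSV rows); it makes no assumptions about OGER's column or element names, which are not documented here. The PubTator variants are returned as plain text.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Content types transcribed from the out_fmt table above
OUT_FMT_TYPES = {
    "tsv": "text/tab-separated-values",
    "xml": "text/xml",
    "text_tsv": "text/tab-separated-values",
    "bioc": "text/xml",
    "bioc_json": "application/json",
    "pubanno_json": "application/json",
    "pubtator": "text/plain",
    "pubtator_fbk": "text/plain",
    "odin": "text/xml",
    "odin_custom": "text/xml",
}


def parse_output(body, out_fmt):
    """Parse a response body generically, based on the out_fmt content type."""
    ctype = OUT_FMT_TYPES[out_fmt]
    if ctype == "application/json":
        return json.loads(body)
    if ctype == "text/xml":
        return ET.fromstring(body)  # root Element
    if ctype == "text/tab-separated-values":
        # QUOTE_NONE: quote characters in the text are kept literally
        return list(csv.reader(io.StringIO(body), delimiter="\t",
                               quoting=csv.QUOTE_NONE))
    return body  # text/plain (PubTator variants)
```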
Accepted options (given as URL parameters):
The options only affect the listed input/output formats.
option | values | source | in_fmt | out_fmt | description |
---|---|---|---|---|---|
include-mesh | true, false | pubmed | pxml, pxml.gz | | include the MeSH descriptor names as a separate section |
mesh-as-entities | true, false | pubmed | pxml, pxml.gz | | load the MeSH descriptors as annotations |
single-section | true, false | pubmed | txt, pxml, pxml.gz | | conflate all sections into one section |
sentence-split | true, false | | txt | | rely on given sentence splitting, one sentence per line |
field-names | JSON object | | bioc | xml, bioc, bioc_json | a mapping of field names, e.g. `{"original_id": "CID"}`, used for renaming fields in the output and when importing annotations from BioC input |
include-header | true, false | | | tsv | print a header with column titles |
sentence-level | true, false | | | bioc, bioc_json | anchor text at passage level (default) or at sentence level |
bioc-meta | JSON object | | | bioc, bioc_json | collection-level metadata (keys other than "source", "date", and "key" are put into `<infon>` elements) |
byte-offsets-in | true, false | | bioc, bioc_json | | interpret given offsets in bytes, not codepoints |
byte-offsets-out | true, false | | | bioc, bioc_json | measure offsets in bytes, not codepoints |
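Because each option only affects the listed formats, a client could check which options are relevant to a given request. The mapping below is a transcription of the options table, assuming the source/in_fmt/out_fmt column assignments shown there; `applicable_options` is a hypothetical helper.

```python
# option -> (sources, in_fmts, out_fmts), transcribed from the options table
OPTIONS = {
    "include-mesh": ({"pubmed"}, {"pxml", "pxml.gz"}, set()),
    "mesh-as-entities": ({"pubmed"}, {"pxml", "pxml.gz"}, set()),
    "single-section": ({"pubmed"}, {"txt", "pxml", "pxml.gz"}, set()),
    "sentence-split": (set(), {"txt"}, set()),
    "field-names": (set(), {"bioc"}, {"xml", "bioc", "bioc_json"}),
    "include-header": (set(), set(), {"tsv"}),
    "sentence-level": (set(), set(), {"bioc", "bioc_json"}),
    "bioc-meta": (set(), set(), {"bioc", "bioc_json"}),
    "byte-offsets-in": (set(), {"bioc", "bioc_json"}, set()),
    "byte-offsets-out": (set(), set(), {"bioc", "bioc_json"}),
}


def applicable_options(source=None, in_fmt=None, out_fmt=None):
    """Return the sorted names of options that affect the given formats."""
    names = []
    for name, (sources, in_fmts, out_fmts) in OPTIONS.items():
        if source in sources or in_fmt in in_fmts or out_fmt in out_fmts:
            names.append(name)
    return sorted(names)
```

For example, `applicable_options(source="pubmed", out_fmt="tsv")` lists the three PubMed options plus `include-header`; options outside this list would simply have no effect on such a request.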