Inconsistent behaviour of VEP POST endpoint #675

DSuveges · 2025-01-16T15:09:43Z

Hi VEP Team!

I found the behaviour of the VEP POST endpoint somewhat inconsistent. Please take a look at this snippet:

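import requests

def get_vep(rsids):
    """POST a list of rsIDs to the VEP endpoint and print the 'input' of each result."""
    url = "https://rest.ensembl.org/vep/human/id"
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    r = requests.post(url, headers=headers, json={"ids": rsids})
    data = r.json()
    try:
        print([x['input'] for x in data])
    except (TypeError, KeyError):
        # An error response is a single dictionary, not a list of results
        print(repr(data))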

# Example 1:
get_vep(['rs2345', 'rs123']) 
# ['rs2345', 'rs123'] <- Looks as expected

# Example 2:
get_vep(['rs1057517146', 'rs123'])
# ['rs123'] <- What about "rs1057517146"?

# Example 3:
get_vep(['rs1057517146', 'rs75660264'])
# {'error': "No variant found with ID 'rs1057517146'"} <- What about "rs75660264"?

In Example 1, two rsIDs are submitted and the returned data contains VEP data for both IDs (a list of dictionaries). This is the canonical behaviour, and I would expect the API to behave the same way every time, unless there is some fundamental problem with executing the query, e.g. a malformed payload or a server error.

However, in Example 2 the returned data (which is still a list of dictionaries) contains only one of the two submitted rsIDs. What happened to the other one? For data provenance it is essential to know that this rsID was not found.

Example 3 is even more confusing: there are two input rsIDs, yet the returned data is a single dictionary reporting that one of the variants was not found. What about the second variant? Why is the returned data not a list of dictionaries? (Also, I don't see why the status code of this query is 400. Status code 400 means "Bad Request", but in my opinion the request was fine: the lookup ran without problems and simply returned no data. But this is really a matter of taste.)
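To illustrate what a client currently has to do to cope with the two response shapes, here is a minimal sketch. The function name `summarise_response` is hypothetical, and it assumes that each element of a list response carries an "input" key equal to the submitted rsID, as seen in Examples 1 and 2. Note that for the error-dictionary case the client cannot tell which IDs would have resolved, so all of them are reported as unresolved.

```python
def summarise_response(submitted, payload):
    """Return (found, unresolved, error) for one VEP POST response body.

    submitted: list of rsIDs that were sent
    payload:   decoded JSON body, either a list of dicts or an error dict
    """
    if isinstance(payload, dict):
        # Example 3 shape: one error dictionary for the whole request,
        # so nothing can be attributed to individual IDs.
        return [], list(submitted), payload.get("error")

    # Example 1/2 shape: a list of dictionaries, one per resolved ID.
    found = [entry["input"] for entry in payload if "input" in entry]
    found_set = set(found)
    unresolved = [rsid for rsid in submitted if rsid not in found_set]
    return found, unresolved, None


# Example 2: one ID silently missing from the list
print(summarise_response(["rs1057517146", "rs123"], [{"input": "rs123"}]))
# (['rs123'], ['rs1057517146'], None)
```

This works, but every client has to reimplement the same bookkeeping, which is what the proposal below tries to avoid.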

Proposed behaviour:

To me it would make more sense if the response were always consistent: the returned data would always be a list of dictionaries with the same number of elements as the input.

For Example 2: for the sake of traceable data provenance, it would be highly valuable to know which of the provided variants were not found in the database:

[
  {
    "input": "rs123"
  },
  {
    "error": "No variant found with ID 'rs1057517146'"
  }
]

For Example 3: in a similar fashion, the API response would inform the user that none of the submitted rsIDs were found:

[
  {
    "error": "No variant found with ID 'rs1057517146'"
  },
  {
    "error": "No variant found with ID 'rs75660264'"
  }
]

Think about a downstream application submitting POST requests and processing the response. It submits 200 rsIDs and gets back 200 dictionaries. It checks each dictionary for the key "error"; if the key is present, it either logs the issue with the variant or drops it, depending on the use case, and that's it.
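With the proposed format, that downstream handling could be as small as the sketch below. This assumes the proposed response shape (one dictionary per submitted ID, with an "error" key on failures), which is not what the endpoint returns today; `split_results` is a hypothetical helper name.

```python
def split_results(results):
    """Separate annotated variants from per-variant errors.

    results: list of dictionaries in the PROPOSED format, where a failed
             lookup is represented as {"error": "..."}.
    """
    annotated, errors = [], []
    for entry in results:
        if "error" in entry:
            errors.append(entry["error"])
        else:
            annotated.append(entry)
    return annotated, errors


# Proposed response for Example 2
proposed = [
    {"input": "rs123"},
    {"error": "No variant found with ID 'rs1057517146'"},
]
annotated, errors = split_results(proposed)
print(len(annotated), len(errors))
# 1 1
```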

I know this is the worst possible kind of user input a team might get, but I would be very happy if you would consider addressing it. Having a consistent API is super helpful from the user's point of view. Thanks.
