Inconsistent behaviour of VEP POST endpoint #675

Open
DSuveges opened this issue Jan 16, 2025 · 0 comments

Hi VEP Team!

I found the behaviour of the VEP POST endpoint somewhat inconsistent. Please take a look at this snippet:

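import requests

def get_vep(rsids):
    """POST a list of rsIDs to the VEP endpoint and print what comes back."""
    url = "https://rest.ensembl.org/vep/human/id"
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    r = requests.post(url, headers=headers, json={"ids": rsids})
    data = r.json()
    if isinstance(data, list):
        # A list of dictionaries, one per variant that was found.
        print([x["input"] for x in data])
    else:
        # A single dictionary, e.g. an error message.
        print(repr(data))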

# Example 1:
get_vep(['rs2345', 'rs123']) 
# ['rs2345', 'rs123'] <- Looks as expected

# Example 2:
get_vep(['rs1057517146', 'rs123'])
# ['rs123'] <- What about "rs1057517146"?

# Example 3:
get_vep(['rs1057517146', 'rs75660264'])
# {'error': "No variant found with ID 'rs1057517146'"} <- What about "rs75660264"?

In Example 1, two rsIDs are submitted and the returned data contains VEP data for both IDs (a list of dictionaries). This is the canonical behaviour, and I would expect the API to behave the same way every time, unless there is some fundamental problem with executing the query, e.g. a malformed payload or a server error.

However, in Example 2, the returned data (which is still a list of dictionaries) contains only one of the two submitted rsIDs. What has happened to the other one? For data provenance it is essential to know that this rsID was not found.

Example 3 is even more confusing: there are two input rsIDs, yet the returned data is a single dictionary reporting that one of the variants was not found. What about the second variant? Why is the returned data not a list of dictionaries? (Also, I don't see why the status code of this query is 400. Error code 400 indicates "Bad Request", but in my opinion the request is good: there was no problem running the lookup, it just did not return data. But this is really a question of taste.)
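To make the provenance gap concrete, here is a minimal client-side sketch for the two response shapes seen above. The helper name `missing_ids` is hypothetical, and it assumes the "input" field echoes the submitted rsID exactly, as it does in the examples. It works on the already-decoded JSON, so it makes no request itself.

```python
def missing_ids(submitted, payload):
    """Return the submitted IDs that have no entry in the decoded response.

    Returns None when the payload is not a list (the Example 3 shape),
    because a single error dictionary says nothing about the other IDs.
    """
    if not isinstance(payload, list):
        return None
    # Assumption: each result carries the submitted ID under "input".
    returned = {item.get("input") for item in payload}
    return [rsid for rsid in submitted if rsid not in returned]


# Example 2 shape: one ID was silently dropped.
print(missing_ids(["rs1057517146", "rs123"], [{"input": "rs123"}]))
# Example 3 shape: nothing can be concluded about "rs75660264".
print(missing_ids(["rs1057517146", "rs75660264"],
                  {"error": "No variant found with ID 'rs1057517146'"}))
```

This recovers the dropped ID in the Example 2 case, but in the Example 3 case the client still cannot tell what happened to the remaining IDs, which is the core of the problem.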

Proposed behaviour:

To me it would make more sense if the response were always consistent: the returned data would always be a list of dictionaries, with the same number of elements as the input.

  • For Example 2: for the sake of traceable data provenance, it would be highly valuable to know which of the provided variants were not found in the database:
[
  {
    "input": "rs123"
  },
  {
    "error": "No variant found with ID 'rs1057517146'"
  }
]
  • For Example 3: in similar fashion, the API response would inform the user that none of the submitted rsIDs were found:
[
  {
    "error": "No variant found with ID 'rs1057517146'"
  },
  {
    "error": "No variant found with ID 'rs75660264'"
  }
]

Think about a downstream application submitting POST requests and processing the response. It submits 200 rsIDs and gets back 200 dictionaries. It checks each dictionary for the key "error"; if there is an error, it either logs the issue with the variant or drops it, depending on the use case, and that's it.
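A rough sketch of what that downstream processing could look like, assuming the proposed response shape (one dictionary per submitted ID, with an "error" key for variants that were not found). This is not what the endpoint returns today, and the function name `split_results` is hypothetical.

```python
def split_results(results):
    """Split a proposed-format response into found variants and error messages."""
    found, errors = [], []
    for item in results:
        if "error" in item:
            # Variant not found: keep the message so it can be logged or dropped.
            errors.append(item["error"])
        else:
            found.append(item)
    return found, errors


# The proposed response for Example 2.
proposed = [
    {"input": "rs123"},
    {"error": "No variant found with ID 'rs1057517146'"},
]
found, errors = split_results(proposed)
print(len(found), "found;", len(errors), "not found")
```

With a response like this, the number of returned dictionaries always matches the number of submitted IDs, so the client needs no special case for partial or total failures.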

I know this is the worst possible kind of user input a team might get, but I would be very happy if you would consider addressing it. Having a consistent API is super helpful from the user's point of view. Thanks.
