Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overly complex response meta parameters should be optional #55

Open
mbaudis opened this issue Dec 22, 2022 · 5 comments
Open

Overly complex response meta parameters should be optional #55

mbaudis opened this issue Dec 22, 2022 · 5 comments
Labels
Dev scout enhancement New feature or request help wanted Extra attention is needed question Further information is requested

Comments

@mbaudis
Copy link
Member

mbaudis commented Dec 22, 2022

Beacon v2 responses are very chatty & require many parameters in their meta elements which only serve verification purposes (e.g. that the beacon could interpret the query ...). This, however, makes it rather hard to demonstrate the simplicity of especially v2 Boolean and count responses. Below is the minimal response example for a variant query which I've used in the documentation - all parameters are required:

{
  "meta": {
    "apiVersion": "v2.0.0",
    "beaconId": "org.progenetix.beacon",
    "receivedRequestSummary": {
      "apiVersion": "v2.0.0",
      "pagination": {
        "limit": 2000,
        "skip": 0
      },
      "requestedGranularity": "boolean",
      "requestedSchemas": [
        {
          "entityType": "genomicVariant",
          "schema": "https://progenetix.org/services/schemas/genomicVariant/"
        }
      ],
      "requestParameters": {
        "alternateBases": "A",
        "referenceBases": "G",
        "referenceName": "refseq:NC_000017.11",
        "start": [
          7577120
        ]
      }
    },
    "returnedGranularity": "boolean",
    "returnedSchemas": [
      {
        "entityType": "genomicVariant",
        "schema": "https://progenetix.org/services/schemas/genomicVariant/"
      }
    ]
  },
  "responseSummary": {
    "exists": true
  }
}

Not necessary:

  • receivedRequestSummary.apiVersion, receivedRequestSummary.requestedSchemas, requestedGranularity are known by the client and the beacon's returned values are stated separately
  • receivedRequestSummary.pagination doesn't make sense for Boolean and count responses since it is for record pagination

Maybe, but:

receivedRequestSummary.requestedSchemas and returnedSchemas basically point to the requirement of having a schema for the entity. However, when e.g. doing the basic "beacon from a VCF" - which at least for SNVs is well understood - there is no inherent variant schema to point to, based on the source data.

Helpful for debugging:

  • receivedRequestSummary.requestParameters

So while it is obviously beneficial to have a full framework + default/alternative model implementation, the current use of required parameters makes it overly complex & scary for resource owners wanting to add a "v1-style" beacon (and thereby enriching the landscape for aggregators etc.). Minimal response IMO (and receivedRequestSummary is an edge case here since it won't be interpreted, mostly):

{
  "meta": {
    "apiVersion": "v2.0.0",
    "beaconId": "org.progenetix.beacon",
    "entityType": "genomicVariant",
    "receivedRequestSummary": {
      "requestParameters": {
        "alternateBases": "A",
        "referenceBases": "G",
        "referenceName": "refseq:NC_000017.11",
        "start": [
          7577120
        ]
      }
    },
    "returnedGranularity": "boolean"
  },
  "responseSummary": {
    "exists": true
  }
}

Discuss!

@mbaudis mbaudis added enhancement New feature or request help wanted Extra attention is needed question Further information is requested labels Dec 22, 2022
@gsfk
Copy link
Collaborator

gsfk commented Jan 9, 2023

I agree that meta is overly verbose, I'd be happy for all of it to be optional.

Much of the bulk of the v2 spec seems aimed at full-response beacons, so those of us working with counts run up against the occasional odd requirement:

  • returnedSchemas isn't particularly meaningful, since no actual schema is returned,
  • it's not clear why a count beacon would bother with requestedGranularity, the only meaningful option would be to downgrade to boolean, which users can do themselves by reading the required field responseSummary.exists.

... and so on.

@jrambla
Copy link
Contributor

jrambla commented Apr 3, 2024

The rationale behind the verbosity:

  • The design is meant for a network scenario, where requests are dispatched to several beacons and responses could come back at different paces. Indeed some responses could come back after new requests have been issued.
  • Also, I've tried to make the design as stateless as possible, hence no query ids or queries in multiple steps
  • Additionally, I've tried to avoid having to parse the response body if, by inspecting the meta, the client could take decissions in advance (see below for more details)
  • In addition, in case no parameter is included in the request and the default value is used, the "requested" in the meta should show what the server has understood as "default value", allowing for additional safety checks on both sides.
  • I agree that the client must have some control on which responses belong to which requests, but I've tried to make the responses idempotent (like), hence the response is totally informative and it could travel "independently" of implementations or any other circumstance.
  • The attributes are mandatory to avoid many beacons not including them and, thus, destroying why they are there (see above)
  • I believe that adding the received attributes is just copying back the parameters in the HTTPRequest, or its parsing, back into the JSON response. Shouldn't be a big issue, IMHO, to tamper the goals above. The price to pay seemed very cheap to me.
  • Last but not least, these verbosity is specially useful when you are trying to spot an issue in a heavy load or heavily distributed scenario
  • Finally, the design was done with "expansion w/o breaking compatibility" in mind, e.g. if we add the "aggregated" response type, the framework basics must me resilient to it. (I hope now it is)

On the returnedGranularity, the rationale is similar;

  • In an scenario (network or not), the client could request a given granularity and the server return a different one.
  • We should not assume that the change in granularity is always down, it could be up (although unusual).
  • I agree that the response payload could tell the client what it got back.
  • However, as pointed above, the goal is for the client to be able to take decisions just looking at the meta w/o having to parse and interpret the body content. E.g. if the body has "count" and "no records" the response must be "count". In a response per dataset, then you will need to pick into every dataset to confirm that. My fear was having to document all the combinations, everyone having to understand such logics or, worse, everyone applying different logics.
  • Being verbose solve these issues at once
  • Additionally, In my mind, having the information in the meta would facilitate the developers leveraging that (using the less resistance path), as it makes their life easier. E,g. "If I really need counts and I got boolean, then I discard such beacon response"

In summary, the goals are:

  1. Response self contained and idempotent
  2. Stateless-y
  3. Using the meta instead parsing the body
  4. Making it mandatory to avoid beacon lazyness that tampers the points above
  5. Helping in debugging when in trouble

Having said so, some of the points need improvement, and the returnedSchema in boolean or count responses could be one of them.

@jrambla
Copy link
Contributor

jrambla commented Oct 10, 2024

At the Dev Scout meeting finished a few minutes ago we agreed in two main principles regarding this issue:

  1. Keep the properties mandatory, but...
  2. Make them mandatory only in cases where it makes sense to have it (e.g. "don't make a returnedSchema mandatory if no schema applies, but if it apply make it mandatory").
    We will work in a proposal for this.

@alaric-rd
Copy link

Perhaps related: in the implementation I'm working on, we internally map a geneId genomic query into a range query for passing to the database (by consulting a reference database of genes, and adding a certain number of base pairs "padding" on either side). It would be nice to return, in the response, what coordinates we translated the gene name into and actually searched for variants in! Obviously we should return the original info the client sent in the receivedRequestSummary, so for now I'm tempted to put the referenceName/start/end we map the gene into in an info somewhere; but it might be useful to have a standard place to put this stuff, in the same vein as requested/returned granularity.

@mbaudis
Copy link
Member Author

mbaudis commented Feb 10, 2025

@alaric-rd Yes, we do the same for geneId in the bycon stack. A problem with the reporting back is that this is all "info" level - other implementations wouldn't match on the gene's CDR but on the annotated name etc. So while it can be very informative to indicate what query you've done this is then also very linked to tyour overall stack and internal data model, and it isn't trivial to find a common format. I'm not against it - but start with meta.info I guess... (We had the expanded queries for a while in the header but now only activate for debugging since I don't want to confuse people ...)..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dev scout enhancement New feature or request help wanted Extra attention is needed question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants