Where does query complexity live? #68

cbizon · 2022-01-19T14:03:45Z

cbizon
Jan 19, 2022
Maintainer

Should we continue to expand the expressiveness of the query graph or rely on the programmatic actions available in workflows?
We want more control; the question is how to achieve it and where it should live.

As a specific example, consider a NOT parameter. Should that be handled by extending TRAPI to include nots, or should it be handled by implementing filter operations and counting on workflows to implement joins?

remontoire-pac · 2022-01-19T14:10:49Z

remontoire-pac
Jan 19, 2022

Which one will make it easier to:
- expand to additional Booleans like OR, XOR, AND
- process lists as "complete units" in workflows (e.g. for similarity/enrichment analyses)
- preserve maximum flexibility on how individual translator components are used (e.g., sub-KP-level services)


0 replies

edeutsch · 2022-01-19T16:52:21Z

edeutsch
Jan 19, 2022
Collaborator

I think it would be good to expand the expressiveness of the QueryGraph to include the ability to encode more complex questions, including aspects of negation, exceptions/exclusions, qualifiers, and context. I think a 1-1 translation of complex English questions into a complex QueryGraph would be a beneficial thing, rather than having to translate a complex English question into a partial QueryGraph plus a workflow that handles the more complex components of the question.

I think it would be reasonable for KPs not to implement these complexities; rather, the ARAs would implement them and selectively pass simplified subset QueryGraphs to the KPs for their parts.

It seems like it would be relatively straightforward to implement a mechanism for KPs and ARAs to describe which complexities they support and which they do not.
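A minimal sketch of what such a capability mechanism could look like. Everything here is an assumption for illustration: the feature names, the `Service` class, and the `exclude`/`negated`/`qualifiers` keys on query edges are hypothetical and not taken from the TRAPI specification.

```python
from dataclasses import dataclass, field

# Hypothetical feature vocabulary; TRAPI does not define these names.
EXCLUDE = "exclude"
NEGATION = "negation"
QUALIFIERS = "qualifiers"


@dataclass
class Service:
    """A KP or ARA together with the optional features it declares."""
    name: str
    supported: frozenset = field(default_factory=frozenset)


def required_features(query_graph: dict) -> set:
    """Collect the optional features a query graph relies on."""
    needed = set()
    for edge in query_graph.get("edges", {}).values():
        if edge.get("exclude"):      # assumed qedge flag
            needed.add(EXCLUDE)
        if edge.get("negated"):      # assumed qedge flag
            needed.add(NEGATION)
        if edge.get("qualifiers"):   # assumed qedge property
            needed.add(QUALIFIERS)
    return needed


def can_answer(service: Service, query_graph: dict) -> bool:
    """True if the service declares every feature the query needs."""
    return required_features(query_graph) <= service.supported
```

An ARA could use a check like `can_answer` to decide whether to forward a query graph unchanged or to simplify it before passing it to a given KP.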

0 replies

saramsey · 2022-01-19T22:40:06Z

saramsey
Jan 19, 2022

I favor including "exclude" in the TRAPI query edge. It better incorporates the user's overall intent in the query graph and preserves the advantages of a more declarative approach, leaving it up to the ARA to determine how best to satisfy the query rather than having to process a workflow, discern the intent from the workflow, and then modify the workflow to best satisfy the user's query intent given the functional capabilities provided by the particular ARA system.

0 replies

saramsey · 2022-01-19T22:48:50Z

saramsey
Jan 19, 2022

FWIW, RTX-KG2 does not provide edges with "NOT" negation, as far as I know.

0 replies

saramsey · 2022-01-19T23:06:18Z

saramsey
Jan 19, 2022

Responding to the proposal to support logical OR: if the idea is to have a query expressing that a user wants triples that satisfy (this is kind of TRAPI pseudocode below):

Gene1 -[predicate_x]-> Phenotype2

or

Gene1 -[predicate_y]-> Phenotype2

we can already do that, using a single query edge with two predicates,

Gene1 -[predicate_x,predicate_y]-> Phenotype2

Similarly, regarding the idea to support logical AND: if the idea is to have a query expressing that we want triples that satisfy (this is kind of TRAPI pseudocode below):

Gene1 -[predicate_x]-> Phenotype2

and

Gene1 -[predicate_y]-> Phenotype2

I believe we can do that as well, already. Just have two query edges in the query graph. For "AND NOT", that's basically a two-query-edge graph with one of the edges having the "exclude" attribute, which ARAX already supports and which I support making into a standard.
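To make the three cases concrete, here is a sketch of the corresponding query graphs as Python dicts. The `nodes`/`edges`/`subject`/`object`/`predicates` layout follows the usual TRAPI query graph shape; the CURIEs and predicate names are placeholders, and the `exclude` flag is the non-standard attribute discussed above, shown here as an assumption about how it might be spelled.

```python
# Placeholder nodes: Gene1 is pinned, Phenotype2 is left open.
NODES = {
    "n0": {"ids": ["EXAMPLE:Gene1"], "categories": ["biolink:Gene"]},
    "n1": {"categories": ["biolink:PhenotypicFeature"]},
}


def edge(predicates, exclude=False):
    """Build a query edge from n0 to n1."""
    qedge = {"subject": "n0", "object": "n1", "predicates": list(predicates)}
    if exclude:
        qedge["exclude"] = True  # assumed flag, not part of the standard discussed here
    return qedge


# OR: one query edge carrying both predicates.
or_query = {
    "nodes": NODES,
    "edges": {"e0": edge(["biolink:predicate_x", "biolink:predicate_y"])},
}

# AND: two query edges between the same pair of nodes.
and_query = {
    "nodes": NODES,
    "edges": {
        "e0": edge(["biolink:predicate_x"]),
        "e1": edge(["biolink:predicate_y"]),
    },
}

# AND NOT: same as AND, with the second edge marked as excluded.
and_not_query = {
    "nodes": NODES,
    "edges": {
        "e0": edge(["biolink:predicate_x"]),
        "e1": edge(["biolink:predicate_y"], exclude=True),
    },
}
```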

I feel that more unusual relationships like XOR should be supported in the Workflow, using a filter operation. I would need some convincing that "predicate XOR" is a mainstream use-case for Translator.

As for "NOT" meaning that we are looking for a negated edge,

Gene1 -[NOT predicate_x]-> Phenotype2

I think triples asserting things like "Gene X does not regulate Pathway Y" are generally pretty context-specific and I would argue they are not so helpful for reasoning. Anyhow, RTX-KG2 doesn't include such edges as far as I recall.

0 replies

vdancik · 2022-01-19T23:15:16Z

vdancik
Jan 19, 2022
Collaborator

I believe that any functionality, like "exclude", that can be achieved via a suitable operation/workflow should not be part of the core TRAPI specification. Otherwise we will have a dozen groups implementing the same functionality, which is both inefficient and error-prone.

2 replies

saramsey Jan 19, 2022

> I believe that any functionality, like "exclude", that can be achieved via a suitable operation/workflow should not be part of the core TRAPI specification. Otherwise we will have a dozen groups implementing the same functionality, which is both inefficient and error-prone.

Perhaps it can be an optional part of the TRAPI spec that is only implemented by ARAs (KPs can give a 422 status code and a polite JSON response if the query graph requests such a feature and the KP doesn't implement it). I note that the ARAs would ultimately have to implement this capability, in some form or other, regardless of whether it is decided that it should be encoded in the workflow or in the query graph. But making it part of the workflow may make it harder for groups that don't opt to handle it by explicitly post-filtering on the excluded predicate. And as I wrote above, I favor the user's intent for this important class of queries to be made clear in the query graph.

In any event, I think what we don't want is to have a situation where to find chemical entities that target amyloid and that are not listed anywhere as contraindicated-for Alzheimer's, one would have to POST a three-qnode, two-qedge query graph like this:

drug --[physically_interacts_with]--> amyloid
drug --[contraindicated_for]--> Alzheimer's

with only the workflow indicating that we actually don't want drugs that satisfy the second triple. That would be sub-optimal, right(?), because the query graph would by itself be misleading as to the user's intent.

Bottom line, whatever we do, I propose we not have the query graph be misleading as to the user's intent, such that the workflow must be consulted in order to discern that actually, one of those query edges is excluded. Not sure if that is what you were proposing, but I just want to put out there that I think that would not be optimal.
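A sketch of the optional-feature idea above, using the amyloid example with the exclusion stated on the query edge itself, and a KP-side pre-check that answers 422 when it does not implement the feature. The `exclude` key, the status strings, the CURIEs, and the function name `kp_precheck` are all assumptions made for illustration, not an existing API.

```python
# Query graph for the amyloid example; the intent is visible without a workflow.
# "exclude" is an assumed, non-standard qedge property; CURIEs are placeholders.
QUERY_GRAPH = {
    "nodes": {
        "drug": {"categories": ["biolink:ChemicalEntity"]},
        "amyloid": {"ids": ["EXAMPLE:amyloid"]},
        "alzheimers": {"ids": ["EXAMPLE:alzheimers"]},
    },
    "edges": {
        "e0": {"subject": "drug", "object": "amyloid",
               "predicates": ["biolink:physically_interacts_with"]},
        "e1": {"subject": "drug", "object": "alzheimers",
               "predicates": ["biolink:contraindicated_for"], "exclude": True},
    },
}


def kp_precheck(query_graph, supports_exclude=False):
    """Return (http_status, json_body) before any lookup is attempted."""
    excluded = sorted(
        edge_id
        for edge_id, edge in query_graph.get("edges", {}).items()
        if edge.get("exclude")
    )
    if excluded and not supports_exclude:
        body = {
            "status": "UnsupportedFeature",  # hypothetical status string
            "description": "This KP does not implement 'exclude' on query edges: "
            + ", ".join(excluded),
        }
        return 422, body
    return 200, {"status": "Accepted"}  # hypothetical status string
```

With this shape, an ARA that receives a 422 knows it must handle the excluded edge itself and can resend only the non-excluded part of the graph to that KP.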

dkoslicki Jan 25, 2022

> I believe that any functionality, like "exclude", that can be achieved via a suitable operation/workflow should not be part of the core TRAPI specification. Otherwise we will have a dozen groups implementing the same functionality, which is both inefficient and error-prone.

I don't think the "not" (or "exclude") part would be a duplication of effort: different ARAs or KPs might have different lines of evidence for "not's" or "excludes". For example, "find me all drugs that can treat X but which are not cytotoxic". KP/ARA 1 might have cytotoxicity encoded as a binary yes/no, KP/ARA 2 might have this encoded via LDH assay, and KP/ARA 3 might have this encoded via SRB assay.

Similarly, one KP/ARA might have A-[related_to]->B1-[related_to]->C and A->B2-[not_related_to]->C (where not_related_to means that edge does not exist in that KP/ARA); however, a different KP/ARA might have some additional knowledge source that has A->B2-[related_to]->C. We would want to communicate this nuance to the user: KP foo says B2 is not related to C, but KP bar says it is.

dkoslicki · 2022-01-25T22:02:45Z

dkoslicki
Jan 25, 2022

From the 1/25 workflow working group meeting:

- Some exclude queries would not be possible with a workflow. Example query: find me all drugs that are not cytotoxic. You would need to find all drugs and then filter out the ones identified as cytotoxic.
- I.e., doing this with a workflow is equivalent to “collect the world, throw out what you don’t want”.
- A->B but not B->C would require: look up all A->B, look up all B->C, then filter out the B’s that connect to C’s. This is not possible in the workflow runner, as it would require changing the query graph.
- This is much more suitable and natural in TRAPI.
- “Not” needs to exist in the query language, and TRAPI is the query language (hence it needs “not”).

1 reply

saramsey Jan 26, 2022

I agree with @dkoslicki

webyrd · 2022-01-25T23:47:43Z

webyrd
Jan 25, 2022
Collaborator

I am close to @dkoslicki's thinking. We should avoid 'generate & filter' of giant result sets, and multiple workflow queries, if possible. Better to have the operations at the TRAPI level.

0 replies

cbizon · 2022-01-26T14:21:04Z

cbizon
Jan 26, 2022
Maintainer Author

I am on board with putting NOT in some form into TRAPI. That doesn't exclude creating operations for filtering as well.

I do wonder whether we should require KPs to implement it, though. Maybe we can handle it at the operation level: there might be a core one-hop lookup, and a more generic lookup that handles everything in TRAPI.

Fundamentally, I think that NOT functionality makes clearer the cost of a fully federated system. If the query is eg. A-[p1]->B excluding A-[p2]->B, I have no guarantee that p1 and p2 will both occur in the same KP. Furthermore, there is no way, I think, to expose enough information in endpoint metadata to allow that to be known for an arbitrary pair of A,B.

That means that no matter what, an ARA will have to broadcast queries for A-[p1]->B, collect all the B's, then broadcast a query for A-[p2]->B and do the filtering at its own level (unless the ARA is only using its local integrated db). Therefore, for most KPs, there's not much benefit in implementing NOT functionality.

Note that this same thinking also applies to any acquisition of node properties, for filtering or other purposes. Unless we can guarantee that every KP will provide the same properties, then extra calls will be required to fill those properties in.
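The federated flow described above can be sketched as follows. This is a toy model under stated assumptions: each KP is a plain Python callable over an in-memory edge list, and `make_kp` and `federated_exclude` are hypothetical names, not part of any Translator component.

```python
def make_kp(edges):
    """Build a toy KP: a lookup over (subject, predicate, object) triples."""
    def lookup(subject, predicate):
        return {o for s, p, o in edges if s == subject and p == predicate}
    return lookup


def federated_exclude(kps, subject, include_pred, exclude_pred):
    """ARA-level EXCLUDE: broadcast both lookups, then filter centrally."""
    candidates = set()
    for kp in kps:                       # broadcast A-[p1]->B
        candidates |= kp(subject, include_pred)
    excluded = set()
    for kp in kps:                       # broadcast A-[p2]->B
        excluded |= kp(subject, exclude_pred)
    return candidates - excluded         # filtering happens at the ARA


# p1 and p2 live in different KPs, so neither KP could apply the
# exclusion correctly on its own.
kp1 = make_kp([("A", "p1", "B1"), ("A", "p1", "B2")])
kp2 = make_kp([("A", "p2", "B2")])

result = federated_exclude([kp1, kp2], "A", "p1", "p2")
```

Here `kp1` alone would keep B2, because it has no p2 edge for it; only the ARA, which sees both KPs, can drop it.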

1 reply

cbizon Jan 26, 2022
Maintainer Author

Note that per @edeutsch this comment is really about EXCLUDE functionality rather than NOT.

edeutsch · 2022-01-26T16:45:52Z

edeutsch
Jan 26, 2022
Collaborator

I think there is a big difference between NOT and EXCLUDE and I think they are being conflated here perhaps?

In your example above, "eg. A-[p1]->B excluding A-[p2]->B", this is the EXCLUDE aspect.

The NOT aspect refers to things like A-[NOT p1]->B, where that could be a stored assertion.

2 replies

dkoslicki Jan 26, 2022

> I think there is a big difference between NOT and EXCLUDE and I think they are being conflated here perhaps?
>
> In your example above, "eg. A-[p1]->B excluding A-[p2]->B", this is the EXCLUDE aspect.
>
> The NOT aspect refers to things like A-[NOT p1]->B, where that could be a stored assertion.

Ah yes: I was using NOT as "not connected to", or "does not have" which would be EXCLUDE.

cbizon Jan 26, 2022
Maintainer Author

OK, agreed. I meant EXCLUDE in my comments above

edeutsch · 2022-01-26T17:25:58Z

edeutsch
Jan 26, 2022
Collaborator

This is how I am thinking the two are different:

Which small molecules are related to genes that are related to Disease X, excluding the small molecules that are contraindicated for Disease X?
(A: Disease X) -> [related_to] -> [B: Gene] -> [related_to] -> [C:SmallMolecule] -> EXCLUDE-[contraindicated_for] -> (A: Disease X)

Which genes are related to Disease X, excluding the genes that are related to Disease Y?
(A: Disease X) -> [related_to] -> [B: Gene] -> EXCLUDE-[related_to] -> [D: Disease Y]

Which environmental exposures do not cause Type 1 diabetes?
(E: EnvironmentalExposure) -> [NOT causes] -> [F: Disease T1D]
(i.e. a study has probed the possibility that E causes F and asserts that it is NOT a factor)
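The distinction can be illustrated on a toy in-memory knowledge graph. This is only a sketch: the `Assertion` type, the `negated` field, and the entity names E1, E2, E3 and F are made up for the example and do not reflect any KP's data model.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    subject: str
    predicate: str
    obj: str
    negated: bool = False  # True means a stored "NOT predicate" assertion


# Toy data: all three exposures are related to disease F.
KG = [
    Assertion("E1", "related_to", "F"),
    Assertion("E2", "related_to", "F"),
    Assertion("E3", "related_to", "F"),
    Assertion("E1", "causes", "F"),                 # positive assertion
    Assertion("E2", "causes", "F", negated=True),   # a study asserts E2 does NOT cause F
    # Nothing is stored about whether E3 causes F.
]


def match(kg, predicate, obj, negated=False):
    """Subjects with a stored assertion of the given polarity."""
    return {a.subject for a in kg
            if a.predicate == predicate and a.obj == obj and a.negated == negated}


def exclude_query(kg, include_pred, exclude_pred, obj):
    """EXCLUDE: keep matches of one edge, drop those that also match another."""
    return match(kg, include_pred, obj) - match(kg, exclude_pred, obj)


def not_query(kg, predicate, obj):
    """NOT: return only subjects with an explicit negated assertion."""
    return match(kg, predicate, obj, negated=True)


excluded_view = exclude_query(KG, "related_to", "causes", "F")
negated_view = not_query(KG, "causes", "F")
```

In this toy data, EXCLUDE keeps both E2 and E3, since neither has a positive "causes" edge, while NOT returns only E2, the one exposure with a stored negated assertion.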

0 replies
