Subclass diseases do not make sense #142

Closed
karafecho opened this issue Apr 7, 2023 · 8 comments

Comments

@karafecho

The following query is returning results that include diseases other than cardiovascular disorders: polybrominated diphenyl ether - related_to - Gene - related_to - cardiovascular disorder. For example, the top answer is for complete gonadal dysgenesis.

image
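For readers who want to reproduce the query outside the UI, the chemical - related_to - Gene - related_to - disease pattern above can be sketched as a TRAPI-style query graph. This is only an illustration: the function name `build_two_hop_query` and the two placeholder CURIEs are hypothetical and would need to be replaced with the real identifiers for the chemical and for cardiovascular disorder, and the exact payload the ROBOKOP UI sends is not shown in this thread.

```python
import json


def build_two_hop_query(chemical_curie, disease_curie):
    """Build a TRAPI-style query graph for: chemical - related_to - Gene - related_to - disease.

    Both CURIEs are supplied by the caller; nothing here looks them up.
    """
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # Pinned chemical node (e.g. a polybrominated diphenyl ether)
                    "n0": {"ids": [chemical_curie], "categories": ["biolink:ChemicalEntity"]},
                    # Unpinned intermediate node: any gene
                    "n1": {"categories": ["biolink:Gene"]},
                    # Pinned disease node (e.g. cardiovascular disorder)
                    "n2": {"ids": [disease_curie], "categories": ["biolink:Disease"]},
                },
                "edges": {
                    "e0": {"subject": "n0", "object": "n1", "predicates": ["biolink:related_to"]},
                    "e1": {"subject": "n1", "object": "n2", "predicates": ["biolink:related_to"]},
                },
            }
        }
    }


# Placeholder identifiers (hypothetical, not real CURIEs)
query = build_two_hop_query("EXAMPLE:pbde", "EXAMPLE:cardiovascular_disorder")
print(json.dumps(query, indent=2))
```

Because n2 is pinned to a high-level disease term, any engine that expands that term to its subclasses can return answers bound to much more specific diseases.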

@karafecho
Author

Also see #139.

@cbizon
Contributor

cbizon commented Apr 11, 2023

Complete Gonadal Dysgenesis is a subclass of cardiovascular disease in MONDO:
image

@cbizon
Contributor

cbizon commented Apr 11, 2023

So I'm not sure that this is a problem?

@karafecho
Author

karafecho commented Apr 11, 2023

Interesting. I had looked up complete gonadal dysgenesis before posting this ticket and did not see an obvious relationship to cardiovascular disorder. I just checked Orphanet's hierarchy and it, too, lists complete gonadal dysgenesis as a rare cardiac disease: https://www.orpha.net/consor/cgi-bin/Disease_Classif.php?lng=EN&data_id=146&PatId=1044&search=Disease_Classif_Simple&new=1. So, I guess this result is accurate.

There are three publications supporting the decabromodiphenyl oxide - NR5A1 edge. There isn't any publication support for the NR5A1 - complete gonadal dysgenesis edge, but Pharos is making the assertion and OmniCorp is supporting it with 45 co-occurrences, so I suppose the edge is sufficiently supported.

I'd be curious to: (1) compare these results (automat.renci.org/#/robokopkg) with those from ExEmPLAR (robokopkg.renci.org) and (2) get a reaction from the expert who submitted the question. She was interested in cardiotoxicity (which I was unable to include as a node, even after trying a bunch of entities/CURIEs), and I'm guessing this answer, while valid, wouldn't align with her expectations/interests. Then again, I'm not sure it's worth sharing these results, as one might argue that the query didn't accurately capture the question.

In some sense, this issue relates to the 'subclass' issues noted in #139 and elsewhere.

@karafecho
Author

ExEmPLAR results (for comparison):

image

image

@karafecho
Author

ROBOKOP results for decabromodiphenyl oxide only:

image

@cbizon
Contributor

cbizon commented Apr 12, 2023

ExEmPLAR (IIRC) is doing a text-based search for nodes. It is not reasoning over subclass-of edges, nor is it using identifiers. So there will be differences based on those things.

Is the issue here that we just don't like this subclass result? To some degree we're at the mercy of what the data says. The only other way I can think to affect this is by downweighting edges that are supported by subclass-of inferences.
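One way to picture the downweighting idea is a path score in which any edge that only matches through a subclass inference is multiplied by a penalty. The sketch below is not ROBOKOP's ranking code: the `Edge` class, the `via_subclass` flag, the per-edge weights, and the penalty value of 0.5 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    subject: str
    predicate: str
    object: str
    weight: float               # hypothetical support weight in (0, 1]
    via_subclass: bool = False  # True if the edge matched only through subclass inference


SUBCLASS_PENALTY = 0.5  # assumed value; a real system would need to tune this


def path_score(edges, penalty=SUBCLASS_PENALTY):
    """Multiply edge weights, shrinking any edge that relies on a subclass inference."""
    score = 1.0
    for edge in edges:
        factor = penalty if edge.via_subclass else 1.0
        score *= edge.weight * factor
    return score


# Same two-hop path twice: once matching the queried disease directly,
# once matching it only because the answer disease is a subclass of it.
direct = [
    Edge("chemical", "related_to", "gene", 0.8),
    Edge("gene", "related_to", "disease", 0.6),
]
inferred = [
    Edge("chemical", "related_to", "gene", 0.8),
    Edge("gene", "related_to", "disease", 0.6, via_subclass=True),
]

print(path_score(direct), path_score(inferred))
```

With a multiplicative penalty, subclass-inferred answers are not removed; they simply rank below otherwise equivalent answers that match the queried node directly.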

@karafecho
Author

Yes, I completely expect differences between ROBOKOP and ExEmPLAR (they aren't even querying the same KG); I was just comparing the two result sets, as noted.

I think we need to provide users with control over subclass inferences, especially when high-level nodes such as "cardiovascular disorder" are selected. I will close this ticket and comment on #139.
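As a rough sketch of what user control over subclass inference could look like, the expansion of a high-level node can be capped at a user-chosen depth. The function name `expand_subclasses`, the `max_depth` parameter, and the toy hierarchy are hypothetical; the intermediate class names are placeholders, since the actual MONDO path between these two terms is not given in this thread.

```python
from collections import deque


def expand_subclasses(children, root, max_depth=None):
    """Return {term: depth} for root and its descendants, stopping at max_depth.

    children maps a term to its direct subclasses.
    max_depth=0 disables subclass inference; None means unlimited.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        if max_depth is not None and depths[term] >= max_depth:
            continue  # do not descend past the user's limit
        for child in children.get(term, ()):
            if child not in depths:  # breadth-first, so first visit is the shortest path
                depths[child] = depths[term] + 1
                queue.append(child)
    return depths


# Toy hierarchy with placeholder intermediate classes (not the real MONDO path)
toy_hierarchy = {
    "cardiovascular disorder": ["intermediate class A"],
    "intermediate class A": ["intermediate class B"],
    "intermediate class B": ["complete gonadal dysgenesis"],
}

unlimited = expand_subclasses(toy_hierarchy, "cardiovascular disorder")
shallow = expand_subclasses(toy_hierarchy, "cardiovascular disorder", max_depth=1)
print(sorted(unlimited), sorted(shallow))
```

With a limit in place, a distant descendant of "cardiovascular disorder" would be returned only if the user opted into deep expansion.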
