map NCIT biological process terms to GO #68

Open
nicolevasilevsky opened this issue Apr 14, 2021 · 7 comments

Comments

@nicolevasilevsky
Contributor

@gaurav, could you create an output of the available mappings from UMLS to GO?

Background:
I started working on mapping NCIT processes to GO terms in this spreadsheet.

I started with the disease terms in the 'DiseaseMapping' tab and checked which NCIT terms in their logical definitions could be mapped to GO terms.

I have reviewed ~50 terms so far and noticed that few of the terms in the NCIT equivalence axioms can be mapped to GO terms.

NCIT uses the relation 'Disease_Has_Finding', which seems to point to all sorts of terms, including 'morphologic findings'. Some of these map to GO terms, but so far the majority do not.

@fragosog

Hi @nicolevasilevsky, this was actually by design: way back, we were thinking of one day reusing GO biological processes. So NCIt processes covered only the minimum wild-type processes needed for specific DL modeling, and paid more attention to pathologic processes (this has changed somewhat more recently). Our last foray into the reuse question was an analysis of the various things we would need to do to take this to production (e.g. how to deal with deprecation, mapping, remapping on deprecation, and so on). But we were stretched thin, and it's now on the back burner. I don't know whether what we did would help you, but I'd be happy to go over it with you if you wish.

@nicolevasilevsky
Contributor Author

@gaurav @balhoff 👀

@nicolevasilevsky
Contributor Author

Thanks @fragosog! Let me check with Jim and Gaurav. :)

@gaurav

gaurav commented Apr 19, 2021

Hi everybody! Sorry, it took me a while to get my UMLS mapping tool running again, but I've loaded up the UMLS 2020AB (Fall) release and it should be good to go now.

Before proceeding any further, let me verify that I understand what we're trying to do:

  1. Look up the ~80 diseases in the DiseaseMapping tab in the NCIT OBO Edition (e.g. Acinar Cell Neoplasm, obo:NCIT_C4197).
    • It looks like this is already done!
  2. Look up the logical expression for this term (e.g. for Acinar Cell Neoplasm, this is Glandular Cell Neoplasm and (Disease_Has_Normal_Tissue_Origin some Glandular Epithelium) and (Disease_Has_Normal_Cell_Origin some Acinar Cell) and (Disease_Has_Abnormal_Cell some Neoplastic Acinar Cell) and (Disease_Has_Finding some Acinar Cell Differentiation)). We're only interested in certain properties here that point to disease findings, such as Disease_Has_Finding (obo:NCIT_R108), whose value must be some Acinar Cell Differentiation (obo:NCIT_C54714).
    • Is there an easier way to do this than reading the OWL logical expressions?
  3. For all of those disease findings, we want to map the NCIT term (i.e. obo:NCIT_C54714) to the GO term (in this case, acinar cell differentiation (GO:0090425)) via the UMLS, which has this mapping.
    • My UMLS mapping tool should be able to do this! If I understand Gilberto correctly, these links may not be present for all of these terms -- in that case, would it be helpful to search for them by name, or look them up in BioPortal or elsewhere, to try to find a GO term for them?
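A rough sketch of step 2, assuming we only have a text rendering of each equivalence axiom like the Acinar Cell Neoplasm one above (not the OWL file itself): pull out the filler of every `(Disease_Has_Finding some X)` restriction with a regex. `fillers_for` is a hypothetical helper; it returns labels rather than NCIT IRIs and skips nested fillers such as `(A or B)`, so a real pipeline would more likely walk the axioms with an OWL library.

```python
import re


def fillers_for(expression: str, prop: str = "Disease_Has_Finding") -> list[str]:
    """Hypothetical helper: labels of the fillers of `(prop some Filler)`.

    Assumes Manchester-syntax-style text where each restriction is wrapped
    in parentheses and the filler itself contains no parentheses.
    """
    pattern = re.compile(r"\(" + re.escape(prop) + r" some ([^()]+)\)")
    return [match.strip() for match in pattern.findall(expression)]


# The logical expression for Acinar Cell Neoplasm (obo:NCIT_C4197) from step 2.
expr = (
    "Glandular Cell Neoplasm and "
    "(Disease_Has_Normal_Tissue_Origin some Glandular Epithelium) and "
    "(Disease_Has_Normal_Cell_Origin some Acinar Cell) and "
    "(Disease_Has_Abnormal_Cell some Neoplastic Acinar Cell) and "
    "(Disease_Has_Finding some Acinar Cell Differentiation)"
)

print(fillers_for(expr))                               # ['Acinar Cell Differentiation']
print(fillers_for(expr, "Disease_Has_Abnormal_Cell"))  # ['Neoplastic Acinar Cell']
```

The resulting labels would still need to be resolved to NCIT IDs (e.g. obo:NCIT_C54714) before they can be fed into the UMLS lookup in step 3.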

Does that sound right? If so, I think we should start by completing step 2, so that we get a list of all the NCIt terms from the OWL logical expressions that we want to try to map to GO, which I can then use as input to my program.

@balhoff: could you please point me to the code that you used to generate the logical expressions in NCIt-OBO in the first place? I think it might be easier to extract the disease findings from that than to read them from the OWL logical expressions. If you want to take a stab at solving step 2 yourself, that'd be great -- I don't have any programs that do that yet, so I'd be starting from scratch, whereas you probably have more experience than me in working with NCIt-OBO!

@balhoff
Member

balhoff commented Apr 19, 2021

@gaurav I was thinking of something a bit simpler. Just extract mappings from NCIt terms to GO terms. It doesn't matter whether they are used in axioms or not.

@gaurav

gaurav commented Apr 19, 2021

Ah, okay! That is a lot simpler :). Do you mean something like this: https://docs.google.com/spreadsheets/d/1_ERXJJjQsHNza5MihO0EU2eegmJUOANUqbiR1nQ_ju8/edit?usp=sharing?

There are 180,003 Gene Ontology concepts in the UMLS, but only 800 of them appear to be mapped to NCI concepts. That seems like quite a discrepancy -- if you find any GO concepts that should be in this list but aren't, please let me know!
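For anyone who wants to reproduce this kind of list, here is a minimal sketch of one way to do it (not necessarily how my tool works): scan UMLS `MRCONSO.RRF` and pair every NCIt code with every GO ID whose atoms share a CUI. It assumes the standard pipe-delimited 18-column layout (CUI in column 0, SAB in 11, CODE in 13, STR in 14) and the source abbreviations `NCI` and `GO`; language and suppression filtering are left out. The CUIs and the `GO:0000000` row below are made up; the NCIt and GO codes are the acinar cell differentiation pair from step 3.

```python
from collections import defaultdict
from typing import Iterable

# Assumed column positions in MRCONSO.RRF (pipe-delimited, one row per atom).
CUI, SAB, CODE, STR = 0, 11, 13, 14


def nci_to_go(rows: Iterable[str]) -> set[tuple[str, str]]:
    """Pair NCIt codes with GO IDs whose atoms share a UMLS concept (CUI)."""
    nci = defaultdict(set)  # CUI -> NCIt codes
    go = defaultdict(set)   # CUI -> GO IDs
    for line in rows:
        fields = line.rstrip("\n").split("|")
        if fields[SAB] == "NCI":
            nci[fields[CUI]].add(fields[CODE])
        elif fields[SAB] == "GO":
            go[fields[CUI]].add(fields[CODE])
    # Only CUIs that carry atoms from both sources yield a mapping.
    return {(n, g) for cui in nci.keys() & go.keys()
            for n in nci[cui] for g in go[cui]}


def row(cui: str, sab: str, code: str, label: str) -> str:
    """Build a synthetic MRCONSO row; all other columns are left empty."""
    fields = [""] * 18
    fields[CUI], fields[SAB], fields[CODE], fields[STR] = cui, sab, code, label
    return "|".join(fields) + "|\n"


rows = [
    row("C9999001", "NCI", "C54714", "Acinar Cell Differentiation"),
    row("C9999001", "GO", "GO:0090425", "acinar cell differentiation"),
    row("C9999002", "GO", "GO:0000000", "placeholder GO term with no NCIt atom"),
]
print(nci_to_go(rows))  # {('C54714', 'GO:0090425')}
```

On a real release this would be called on an open file handle, e.g. `nci_to_go(open("MRCONSO.RRF", encoding="utf-8"))`, with the path adjusted to wherever the release was unpacked.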

@balhoff
Member

balhoff commented Apr 19, 2021

@gaurav yep that's perfect, thanks! There are only about 45,000 GO concepts. There must be something off about the 180,000 number.
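One thing that might be worth checking on the 180,000 figure (a guess, not something verified in this thread) is what exactly was counted. `MRCONSO.RRF` has one row per atom, so a GO ID with several synonyms contributes several rows, and counting rows, distinct CUIs, and distinct GO IDs can give three different numbers. A quick sketch that reports all three, assuming the standard column layout (CUI in column 0, SAB in 11, CODE in 13) and using made-up rows:

```python
from typing import Iterable

CUI, SAB, CODE, STR = 0, 11, 13, 14  # assumed MRCONSO.RRF column positions


def go_counts(rows: Iterable[str]) -> tuple[int, int, int]:
    """Return (GO rows/atoms, distinct GO IDs, distinct CUIs with a GO atom)."""
    atoms, ids, cuis = 0, set(), set()
    for line in rows:
        fields = line.rstrip("\n").split("|")
        if fields[SAB] == "GO":
            atoms += 1
            ids.add(fields[CODE])
            cuis.add(fields[CUI])
    return atoms, len(ids), len(cuis)


def row(cui: str, sab: str, code: str, label: str) -> str:
    """Build a synthetic MRCONSO row; all other columns are left empty."""
    fields = [""] * 18
    fields[CUI], fields[SAB], fields[CODE], fields[STR] = cui, sab, code, label
    return "|".join(fields) + "|\n"


# Made-up rows: one GO ID with two atoms (name + hypothetical synonym).
rows = [
    row("C9999001", "GO", "GO:0090425", "acinar cell differentiation"),
    row("C9999001", "GO", "GO:0090425", "hypothetical synonym"),
    row("C9999002", "GO", "GO:0000000", "placeholder GO term"),
    row("C9999001", "NCI", "C54714", "Acinar Cell Differentiation"),
]
print(go_counts(rows))  # (3, 2, 2)
```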

4 participants