Explanation by example
The source OMIM:614102 was removed because the OMIM gene-disease pipeline did not find evidence for an association between OMIM:614102 and HGNC:5716 (IGKC). This surprised us, because the label for HGNC:5716 is IGKC, and IGKC is listed in the "Gene/Locus" field of the "Phenotype-Gene Relationships" table on https://omim.org/entry/614102.
It also appears in morbidmap.txt, the data file that represents all of these "Phenotype-Gene Relationships" tables:
| Phenotype | Gene/Locus And Other Related Symbols | MIM Number | Cyto Location |
| --- | --- | --- | --- |
| Kappa light chain deficiency, 614102 (3) | IGKC, IGKCD | 147200 | 2p11.2 |
So why the removal? Because the pipeline does not look for HGNC symbols or IDs in morbidmap.txt. It looks for them in mim2gene.txt. The two files disagree: even when an HGNC symbol shows up for an association in morbidmap.txt, it does not always appear for the same association in mim2gene.txt.
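To make the discrepancy concrete, here is a minimal Python sketch of how one might list the phenotype-gene pairs that appear in morbidmap.txt but are absent from whatever the pipeline derives from mim2gene.txt. It assumes morbidmap.txt rows are tab-delimited with the four columns shown in the table above and that comment lines start with `#`. The function names and the `mim2gene_pairs` input are hypothetical and not taken from the actual pipeline.

```python
import re

# Matches a phenotype field that ends in a MIM number and a mapping key,
# e.g. "Kappa light chain deficiency, 614102 (3)".
PHENOTYPE_RE = re.compile(r"^(?P<label>.*?),\s*(?P<mim>\d{6})\s*\((?P<key>\d)\)\s*$")


def parse_morbidmap_row(line):
    """Return (phenotype OMIM ID, [gene symbols]) for one row, or None.

    Assumption: columns are Phenotype, Gene/Locus symbols, MIM Number,
    Cyto Location, separated by tabs.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        return None
    match = PHENOTYPE_RE.match(fields[0])
    if match is None:
        return None  # the phenotype field carries no MIM number of its own
    symbols = [s.strip() for s in fields[1].split(",") if s.strip()]
    return "OMIM:" + match["mim"], symbols


def missing_from_mim2gene(morbidmap_lines, mim2gene_pairs):
    """Collect (OMIM ID, symbol) pairs seen in morbidmap but not in mim2gene_pairs.

    mim2gene_pairs is a hypothetical set of (OMIM ID, symbol) tuples built
    elsewhere from mim2gene.txt.
    """
    missing = set()
    for line in morbidmap_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        parsed = parse_morbidmap_row(line)
        if parsed is None:
            continue
        omim_id, symbols = parsed
        for symbol in symbols:
            if (omim_id, symbol) not in mim2gene_pairs:
                missing.add((omim_id, symbol))
    return missing


# The morbidmap row from this issue, checked against an empty mim2gene set.
row = "Kappa light chain deficiency, 614102 (3)\tIGKC, IGKCD\t147200\t2p11.2"
missing = missing_from_mim2gene([row], mim2gene_pairs=set())
```

Note that the result includes IGKCD as well as IGKC. The "Gene/Locus" field mixes several symbols, so they still need to be mapped to HGNC IDs before being used as evidence.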
Possible solutions
We will contact OMIM to see if there is some reason why morbidmap.txt and mim2gene.txt appear to be out of sync in this way. The best solution may involve a fix on their end.
Otherwise, we can fix this on our end in a number of ways:
a. At the end of the pipeline, run an additional SPARQL query to find associations that are missing HGNC evidence, and add that evidence where it is missing.
b. Do the same check in Python instead of SPARQL, right before adding each association.
c. Earlier in the Python pipeline, combine the HGNC associations from both morbidmap.txt and mim2gene.txt.
For any of solutions a-c, we can use hgnc_complete_set.txt, which we already download and use elsewhere in the pipeline. Its first two columns, hgnc_id (e.g. HGNC:5716) and symbol (e.g. IGKC), let us map any symbols we see in morbidmap.txt to their HGNC IDs.
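The symbol-to-ID mapping described above could look roughly like the following Python sketch. It assumes hgnc_complete_set.txt is tab-delimited with a header row containing the `hgnc_id` and `symbol` columns. The function names are hypothetical, and the in-memory sample stands in for the real file.

```python
import csv
import io


def load_symbol_to_hgnc(handle):
    """Build a {symbol: hgnc_id} lookup from an open hgnc_complete_set.txt handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["symbol"]: row["hgnc_id"] for row in reader}


def resolve_symbols(symbols, symbol_to_hgnc):
    """Split symbols into resolved HGNC IDs and symbols with no exact match."""
    resolved, unresolved = [], []
    for symbol in symbols:
        hgnc_id = symbol_to_hgnc.get(symbol)
        if hgnc_id is not None:
            resolved.append(hgnc_id)
        else:
            unresolved.append(symbol)
    return resolved, unresolved


# Trimmed, illustrative stand-in for hgnc_complete_set.txt (first two columns only).
sample = io.StringIO("hgnc_id\tsymbol\nHGNC:5716\tIGKC\n")
symbol_to_hgnc = load_symbol_to_hgnc(sample)

# Symbols taken from the "Gene/Locus" field of the morbidmap.txt row in this issue.
resolved, unresolved = resolve_symbols(["IGKC", "IGKCD"], symbol_to_hgnc)
```

With this sample, `resolved` holds HGNC:5716 and `unresolved` holds IGKCD. Keeping the unmatched symbols separate, rather than silently dropping them, makes it easier to review which morbidmap.txt entries the lookup did not resolve.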
Overview
Recently when running the disease-gene pipeline in mondo, we noticed that a lot of disease-gene source annotations were getting unexpectedly removed.