Update OMIM gene references #8624
src/ontology/mondo-edit.obo
Outdated
@@ -143773,8 +143773,8 @@ intersection_of: MONDO:0015281 ! atrial standstill
intersection_of: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/10593 ! SCN5A
intersection_of: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/4279 ! GJA5
relationship: excluded_from_qc_check http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/10593 {source="OMIM:108770"} ! SCN5A
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/4279 {source="MONDO:mim2gene_medgen", source="OMIM:108770"} ! GJA5
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/10593 ! SCN5A
I thought we decided not to preserve gene references if there is no provenance at all? This goes for various examples in this PR.
This gene association is for MONDO:0007171 'atrial standstill 1'.
OMIM:108770 has "digenic" in the disease description. @joeflack4 I thought those were being filtered out and added into the "review.tsv" file in the OMIM repo, is that not the case in general?
I am also not finding this association in the data file from the 2025-01-19 OMIM ingest release https://github.com/monarch-initiative/omim/releases/tag/2025-01-19.
We did note that any updates where there is >1 subClassOf gene association will need to be reviewed manually.
In this case it seems like both the gene associations added as a logical definition and a subclassOf relation should be removed.
@joeflack4 after thinking a bit more about this, can you remind me:
(1) if any disease description that includes "digenic" is added into the review.tsv file
(2) if it is also added into the ROBOT file in order to create the gene association
(3) how confirmed disease defining "digenic" like https://omim.org/entry/601067 are handled
If this is in the docs, feel free to point me to that first and I can then add any follow-up questions.
I can start looking into this in a couple hours. For right now, just wanted to let you know that the documentation for the review.tsv is on the omim repo readme towards the bottom.
If something is not in the January 19th data files, and the Mondo ingest build is newer than that, then that is problematic.
For digenic, we did for a short time have a special rule where we were adding associations when digenic was in the label, even if there was more than one association. So it was an exception rule. But we removed that exception. However, we do not have any explicit filtering. So if something is, perhaps erroneously, labeled as digenic, but only has one association, and meets the other logical conditions for a disease-defining gene, then the association will be created. That's my guess as to why this one is appearing here, but I will look into it further.
RE: OMIM:108770
So, we had to exclude this case specifically because, even though it is marked "digenic", it meets all of the conditions for a disease-defining gene (including 1 entry in `morbidmap.txt`, which is unusual for one marked 'digenic').
This is why it is not appearing in the data release, as Trish mentioned.
And, if you look at the diff highlighted in this thread, you can see that the OMIM source evidence indeed is removed. This is correct. However, `relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/10593` still remains, even without any source evidence. So this does not appear to be a bug with `omim` or `mondo-ingest`, but with the `mondo` pipeline. Perhaps it needs to be adjusted such that if there is no evidence for the gene association, it is removed?
I didn't work on this pipeline previously but if you want I can tweak the pipeline.
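To make the suggestion above concrete, here is a minimal Python sketch of how gene associations left without any source could be detected in `mondo-edit.obo` lines. It is not the actual `mondo` pipeline code; the helper names `sources_of` and `drop_unsourced` are hypothetical, and the regular expression only assumes the line shape seen in the diffs in this thread.

```python
import re

# Assumed line shape, taken from the diffs in this thread:
# relationship: has_material_basis_in_germline_mutation_in <gene IRI> {source="..."} ! LABEL
GENE_REL = re.compile(
    r'^relationship: has_material_basis_in_germline_mutation_in '
    r'(?P<gene>\S+)'
    r'(?: \{(?P<qualifiers>[^}]*)\})?'
    r'(?: ! (?P<label>.*))?$'
)
SOURCE = re.compile(r'source="([^"]*)"')


def sources_of(line):
    """Return the source values on a gene association line, or None for any other line."""
    match = GENE_REL.match(line.strip())
    if match is None:
        return None
    return SOURCE.findall(match.group("qualifiers") or "")


def drop_unsourced(lines):
    """Keep every line except gene associations that carry no source at all."""
    return [line for line in lines if sources_of(line) != []]


stanza = [
    'relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/10593 ! SCN5A',
    'relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/8800 {source="MONDO:mim2gene_medgen"} ! PDGFB',
]
kept = drop_unsourced(stanza)
```

Note that a rule like this would keep the PDGFB line, because `MONDO:mim2gene_medgen` still counts as a source, and it does not account for curated terms that should be left untouched.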
(1) if any disease description that includes "digenic" is added into the review.tsv file
Answer: Yes
Just to clarify, by "description", what you mean is the Phenotype label in the "Phenotype-Gene" table on the OMIM entry page, or the title in the "Phenotype" column of `morbidmap.txt`.
So, what's added to the `review.tsv` for 'digenic' is described here.
Basically, if it's marked digenic, but has >1 association, we filter it out. It won't appear in the `review.tsv`. If it is marked digenic but has only 1 association and meets the other requirements for disease-defining (no `[`, `{`, or `?`, not in our explicit exclusions, and mapping key = 3), then we do add it as a disease-defining association AND we add an entry for it in `review.tsv`.
(2) if it is also added into the ROBOT file in order to create the gene association
Answer: Yes (described more above)
(3) how confirmed disease defining "digenic" like https://omim.org/entry/601067 are handled
There is no logic for "confirmed" cases of disease-defining associations (digenic or otherwise). We only have exclusions. How the exclusions work is that even if something otherwise meets the conditions for disease-defining, it will be excluded if it appears in that TSV.
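The rules described in these answers can be summarized in a small Python sketch. This is an illustration of the logic as stated in this comment, not the code from the `omim` repo; the names `Decision` and `decide` are hypothetical, the labels in the usage lines are made up, and treating "no `[`, `{`, or `?`" as "none of these characters appear in the label" is an assumption.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    add_association: bool  # create a disease-defining gene association
    add_to_review: bool    # also write a row to review.tsv


FLAG_CHARS = "[{?"  # assumption: any occurrence in the label disqualifies the row


def decide(label: str, mapping_key: int, n_associations: int, in_exclusions: bool) -> Decision:
    """Apply the disease-defining conditions, then the digenic review rule."""
    digenic = "digenic" in label.lower()
    disease_defining = (
        n_associations == 1
        and mapping_key == 3
        and not any(ch in label for ch in FLAG_CHARS)
        and not in_exclusions
    )
    # Digenic rows with >1 association fail the conditions above and are
    # filtered out entirely; digenic rows that pass are added AND flagged.
    return Decision(disease_defining, disease_defining and digenic)


# Hypothetical labels, for illustration only.
filtered = decide("Example disease, digenic", 3, 2, in_exclusions=False)
flagged = decide("Example disease, digenic", 3, 1, in_exclusions=False)
excluded = decide("Example disease, digenic", 3, 1, in_exclusions=True)
```

The exclusion check comes last in spirit: a row that otherwise qualifies is still dropped when it appears in the exclusions TSV, which matches the "we only have exclusions" answer to question (3).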
Since MONDO:0007171 is in the exclusion file, a gene association was not intended to be added. It has now been manually removed.
src/ontology/mondo-edit.obo
Outdated
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/29090 {source="MONDO:mim2gene_medgen", source="OMIM:158901"} ! SMCHD1
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/50800 {source="OMIM:158901"} ! DUX4
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/29090 {source="MONDO:mim2gene_medgen"} ! SMCHD1
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/50800 ! DUX4
These gene associations are for MONDO:0008031 'facioscapulohumeral muscular dystrophy 2'.
OMIM:158901 also has "digenic" in the disease description.
It seems these gene associations should also be removed.
See my tentative explanation above #8624 (comment)
If it is the case that these are, perhaps erroneously, labeled as digenic, but otherwise meet all of the logical conditions for which we would normally add a disease-defining association, do you want me to add an explicit filter to remove anything that happens to have digenic in the label?
Since MONDO:0008031 is in the exclusion file, a gene association was not intended to be added. It has now been manually removed.
@@ -270747,7 +270751,7 @@ xref: UMLS:C3693482 {source="MEDGEN:811326", source="MONDO:equivalentTo", source
is_a: MONDO:0000653 {source="MONDO:Redundant", source="MONDO:indirect"} ! integumentary system cancer
is_a: MONDO:0005164 {source="DOID:3507"} ! fibrosarcoma
relationship: excluded_subClassOf MONDO:0019300 {source="Orphanet:31112", source="https://orcid.org/0000-0001-5208-3432"} ! obsolete rare skin tumor or hamartoma
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/8800 {source="MONDO:mim2gene_medgen", source="OMIM:607907"} ! PDGFB
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/8800 {source="MONDO:mim2gene_medgen"} ! PDGFB
This is for MONDO:0011934 'dermatofibrosarcoma protuberans'. Based on OMIM:607907 and that the mapping key is 4, this gene association should also be removed.
Evidence was removed, but the `mondo` pipeline likely needs to be updated now (see: thoughts).
Ok, so the OMIM evidence was removed because this entry has a mapping key of 4 and does not meet the guidelines for disease-defining. However, the gene association also has `source="MONDO:mim2gene_medgen"`, and the OMIM pipeline would not have removed the association even if no source provenance were left, so the gene association currently remains as a result of the pipeline.
This gene association was manually removed.
src/ontology/mondo-edit.obo
Outdated
@@ -326240,7 +326244,7 @@ xref: Orphanet:397959 {source="MONDO:equivalentTo", source="OMIM:615387"}
xref: UMLS:C3809332 {source="MONDO:equivalentTo", source="MONDO:MEDGEN", source="MEDGEN:815662"}
is_a: MONDO:0018814 {source="Orphanet:397959", source="https://orcid.org/0000-0001-5208-3432"} ! non-SCID combined immunodeficiency
relationship: curated_content_resource https://search.clinicalgenome.org/kb/conditions/MONDO:0014160 {source="MONDO:CLINGEN"}
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/12029 {source="MONDO:mim2gene_medgen", source="OMIM:615387"} ! TRAC
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/12029 {source="MONDO:mim2gene_medgen"} ! TRAC
Based on https://omim.org/entry/614102, I don't see why the OMIM gene was not updated to IGKC. @joeflack4 do you see why the gene association was not changed to IGKC with the source OMIM:614102?
Well, firstly, the gene-disease association evidence from OMIM was removed in this case.
If you were asking why the label TRAC wasn't changed to IGKC
I don't think this is what you were asking, but:
When you say "changed to IGKC", do you mean that you would expect an association to the HGNC class with the label "IGKC" to be added (to this Mondo term, or another one)?
Or that you should see the label `! TRAC` changed to `! IGKC` in `mondo-edit.obo` on this highlighted line, `has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/12029`? If the latter, then I think that this is not a function of the `mondo` omim-genes pipeline, which uses `omim-gene-equivalence.ru`. It doesn't look like the pipeline has any functionality to update the labels for these associations.
Note that the evidence you have highlighted here was for OMIM:615387, not OMIM:614102.
`MONDO:0013576` has `relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/5716 {source="MONDO:mim2gene_medgen"} ! IGKC`.
`MONDO:0014160` has `relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/12029 {source="MONDO:mim2gene_medgen"} ! TRAC`.
It took me a bit of time to see why there is no OMIM source evidence being added for IGKC.
The reason is that `mim2gene.txt` doesn't show a relationship to HGNC:5716 (or any HGNC symbol or ID) for the gene (OMIM:147200) mapped to OMIM:614102. It only has an NCBI gene entry:
MIM Number | MIM Entry Type (see FAQ 1.3 at https://omim.org/help/faq) | Entrez Gene ID (NCBI) | Approved Gene Symbol (HGNC) | Ensembl Gene ID (Ensembl) |
---|---|---|---|---|
147200 | gene | 3514 | | ENSG00000211592 |
For reference, the `morbidmap.txt` mapping:
Kappa light chain deficiency, 614102 (3) IGKC, IGKCD 147200 2p11.2
Note that you can see "IGKC" in that row, but it is under the "Gene/Locus And Other Related Symbols" column, which we do not parse.
This does result in the following entry added to `omim.owl`: AnnotationAssertion(<http://www.w3.org/2004/02/skos/core#exactMatch> <https://omim.org/entry/147200> <https://www.ncbi.nlm.nih.gov/gene/3514>)
However, we don't have any mappings between NCBI genes and HGNC IDs in OMIM. So this doesn't get captured by the omim-genes pipeline. See `mondo-omim-genes.sparql`:
SELECT DISTINCT ?mondo_id ?hgnc_id ?omim_disease_xref ?omim_gene
WHERE
{
  ?omim_disease a owl:Class .
  ?omim_disease skos:exactMatch ?mondo_id .
  ?omim_disease rdfs:subClassOf [
    owl:onProperty RO:0004003 ;
    owl:someValuesFrom ?omim_gene
  ] .
  ?omim_gene skos:exactMatch ?hgnc_id .
  FILTER(STRSTARTS(STR(?hgnc_id), "http://identifiers.org/hgnc/"))
}
All of the conditions are met here, except that `?hgnc_id` doesn't start with "http://identifiers.org/hgnc/". So it does not get added to `mondo-omim-genes.robot.tsv`. And therefore it doesn't get added to `mondo-edit.obo`.
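To illustrate why the IGKC row drops out, here is a minimal Python sketch that mimics the `STRSTARTS` filter on the `skos:exactMatch` values. It is only a model of the filter step, not the pipeline itself; `hgnc_ids` is a hypothetical helper, and the second dictionary entry is a made-up gene included purely for contrast.

```python
HGNC_PREFIX = "http://identifiers.org/hgnc/"


def hgnc_ids(exact_matches):
    """Mimic the STRSTARTS filter: keep only exactMatch IRIs in the HGNC namespace."""
    return [iri for iri in exact_matches if iri.startswith(HGNC_PREFIX)]


# exactMatch values per OMIM gene. The first entry is the one quoted in this
# thread; the second is hypothetical and exists only to show a passing case.
gene_matches = {
    "https://omim.org/entry/147200": ["https://www.ncbi.nlm.nih.gov/gene/3514"],
    "https://omim.org/entry/EXAMPLE": [
        "https://www.ncbi.nlm.nih.gov/gene/0",
        "http://identifiers.org/hgnc/0",
    ],
}

# A gene with an empty list contributes no row to mondo-omim-genes.robot.tsv.
rows = {gene: hgnc_ids(matches) for gene, matches in gene_matches.items()}
```

Because OMIM:147200 only has an NCBI gene match, its list comes back empty, so nothing downstream can attach OMIM source evidence to the IGKC association.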
NOTE: The comment about IGKC refers to this line for MONDO:0013576 and there is an HGNC identifier for this gene: https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:5716
NOTE: The gene association on this line should be maintained, along with the OMIM source evidence. The OMIM source provenance was removed because of the missing data mapping in the `mim2gene.txt` file. Joe will email OMIM to let them know they are missing the HGNC gene identifier in the `mim2gene.txt` file.
NOTE: The gene association for MONDO:0014160 on this line should have also been maintained. The OMIM source provenance was not added back here because of missing data for the HGNC identifier in the `mim2gene.txt` file.
The issue w/ the data that Trish and I are discussing is that we see a label for the HGNC term in `morbidmap.txt`, but the ID for that term does not appear in `mim2gene.txt`. We will contact OMIM about this apparent discrepancy.
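A check for this discrepancy could look roughly like the Python sketch below. It is an illustration, not one of the programmatic solutions discussed in this thread: `missing_hgnc_symbols` is a hypothetical helper, the sample rows are the ones quoted in this thread, and the tab-separated layout with the column order shown in the `mim2gene.txt` table above is an assumption about the files.

```python
import csv
import io

# Rows quoted in this thread, assumed to be tab-separated in the real files.
# mim2gene.txt: MIM Number, Entry Type, Entrez Gene ID, HGNC Symbol, Ensembl ID
MIM2GENE = (
    "147200\tgene\t3514\t\tENSG00000211592\n"
    "147020\tgene\t3507\t\tENSG00000211899\n"
)
# morbidmap.txt: Phenotype, Gene/Locus And Other Related Symbols, MIM Number, Cyto Location
MORBIDMAP = (
    "Kappa light chain deficiency, 614102 (3)\tIGKC, IGKCD\t147200\t2p11.2\n"
    "Agammaglobulinemia 1, 601495 (3)\tIGHM, MU, AGM1\t147020\t14q32.33\n"
)


def missing_hgnc_symbols(mim2gene_text, morbidmap_text):
    """List (gene MIM, first morbidmap symbol) pairs whose mim2gene HGNC symbol is empty."""
    symbol_by_mim = {}
    for row in csv.reader(io.StringIO(mim2gene_text), delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue
        row += [""] * (5 - len(row))  # pad short rows so column 3 always exists
        symbol_by_mim[row[0].strip()] = row[3].strip()

    missing = []
    for row in csv.reader(io.StringIO(morbidmap_text), delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 3 or row[0].startswith("#"):
            continue
        gene_mim = row[2].strip()
        first_symbol = row[1].split(",")[0].strip()
        if symbol_by_mim.get(gene_mim) == "":
            missing.append((gene_mim, first_symbol))
    return missing


report = missing_hgnc_symbols(MIM2GENE, MORBIDMAP)
```

Taking the first symbol in the morbidmap column as the candidate HGNC symbol is a heuristic, so any hit from a check like this would still need manual confirmation.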
We also discussed a few programmatic solutions to this issue. Given the status of the project, I would like to get feedback on how to proceed.
The OMIM source provenance is now added back for these gene associations.
For these instances, I also sent an email to OMIM:
We're having an issue where we are processing mim2gene.txt and it appears there are missing HGNC symbols in the Approved Gene Symbol (HGNC) column. We are expecting to see them there because the gene has an entry in morbidmap.txt where the symbol appears.
So we're just thinking that this is a consistency problem. We are wondering if you agree; that if symbols appear in one of the files for an association, they should also appear in the other file.
I created a GitHub issue for Monarch purposes, but wanted to share it with you because it goes over an example in detail: OMIM:147200 <-> HGNC:5716 (IGKC).
There are other examples we found which also fit the same discrepancy / pattern:
- OMIM:615387 <-> HGNC:12029 (TRAC)
- OMIM:601495 <-> HGNC:5541 (IGHM)
And their response:
TRAC, IGKC, and IGHM are immunoglobin gene/regions. As such,
they are unusual and do not carry a standard a "gene" locus type
annotation in the browsers. We do not always have a 1-to-1 correlation
with HGNC for the immunoglobin entities. I will review these cases
and see if we can match a few more.
Of note, MIM:615387 and MIM:601495 shouldn't map to any HGNC ID.
They are phenotypes, not genes.
src/ontology/mondo-edit.obo
Outdated
@@ -465430,7 +465434,7 @@ is_a: MONDO:0015977 {source="MESH:C538056", source="MONDO:Redundant", source="OM
intersection_of: MONDO:0011096 ! autosomal agammaglobulinemia
intersection_of: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/5541 ! IGHM
relationship: curated_content_resource https://search.clinicalgenome.org/kb/conditions/MONDO:0020729 {source="MONDO:CLINGEN"}
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/5541 {source="MONDO:mim2gene_medgen", source="OMIM:601495"} ! IGHM
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/5541 {source="MONDO:mim2gene_medgen"} ! IGHM
It seems this OMIM source should have been maintained. @joeflack4 can you look into this?
So, the phenotype OMIM:601495 is mapped to the gene OMIM:147020 as shown in `morbidmap.txt`:
Agammaglobulinemia 1, 601495 (3) IGHM, MU, AGM1 147020 14q32.33
But as with my long explanation above, the reason we don't have any HGNC source evidence being added is because it only shows an NCBI gene entry in `mim2gene.txt`, not an HGNC one:
`mim2gene.txt`:
147020 gene 3507 ENSG00000211899
Which gets added like this in `omim.owl`:
AnnotationAssertion(<http://www.w3.org/2004/02/skos/core#exactMatch> <https://omim.org/entry/147020> <https://www.ncbi.nlm.nih.gov/gene/3507>)
Based on the issue with the data, the OMIM source provenance for MONDO:0020729 on this line should have also been kept.
The issue w/ the data that Trish and I are discussing is that we see a label for the HGNC term in `morbidmap.txt`, but the ID for that term does not appear in `mim2gene.txt`. We will contact OMIM about this apparent discrepancy.
The OMIM source provenance was added back to this gene association.
src/ontology/mondo-edit.obo
Outdated
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/2979 {source="OMIM:619478"} ! DNMT3B
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/50800 {source="OMIM:619478"} ! DUX4
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/2979 ! DNMT3B
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/50800 ! DUX4
This disease contains "digenic" in the description and the gene associations should be removed (not just the source provenance).
Per explanation here, just because it has digenic in the label doesn't necessarily mean it will be removed. Though in most cases it will. However it'll be removed because there are >1 associations, not because 'digenic' is in the label.
Regarding removing the entire association, this is part of the `mondo` pipeline that Nico created. I assume you or I will edit this to remove the association when no evidence remains? I can do that if you want.
MONDO:0030355 is in the exclusion file so these gene associations should not have existed in mondo-edit.obo. These gene associations should be removed.
The gene association was removed.
src/ontology/mondo-edit.obo
Outdated
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/253 {source="OMIM:619151"} ! ADH5
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/404 {source="OMIM:619151"} ! ALDH2
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/253 ! ADH5
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/404 ! ALDH2
The disease description contains "digenic" for OMIM:619151 and these gene associations should be removed.
See: response
This (MONDO:0030894) is in the exclusion file and the gene association should be removed.
src/ontology/mondo-edit.obo
Outdated
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/12441 {source="OMIM:620040"} ! TYMS
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/30365 {source="OMIM:620040"} ! ENOSF1
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/12441 ! TYMS
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/30365 ! ENOSF1
The disease description contains "digenic" for https://omim.org/entry/620040 and these gene associations should be removed.
See: response
This (MONDO:0031057) is in the exclusion file and the gene association should be removed.
This PR is ready for re-review. Additional tickets based on the issues found here will be added.
src/ontology/mondo-edit.obo
Outdated
relationship: excluded_from_qc_check http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/10593 {source="OMIM:108770"} ! SCN5A
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/4279 {source="MONDO:mim2gene_medgen", source="OMIM:108770"} ! GJA5
@matentzn since these gene associations do not exist in `master`, do you know how the OMIM pipeline added these in?
If they are not in master, it means this branch is behind master and needs to be rebased.
This term has the annotation
excluded_from_qc_check http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql
These gene annotations should not be removed.
@sabrinatoro can you remind me what your expectation is for entries that you said should be in the exclusion file? I thought these were Mondo classes where OMIM source provenance should not be added and therefore the gene association(s) should not be on the class.
@matentzn you mentioned in the first review "I thought we decided not to preserve gene references if there is no provenance at all? This goes for various examples in this PR."
@sabrinatoro the exclusion file is from the spreadsheet that you created and mentioned in the OMIM gene update in Nov.
That PR does not have any gene associations for MONDO:0015281 'atrial standstill'.
That PR does not have any gene associations for MONDO:0015281 'atrial standstill'.
@twhetzel I don't follow...
This PR shows that the gene annotations (2 genes) are removed from atrial standstill 1 - MONDO:0007171
This should not happen.
Based on the notes from November "I also started this spreadsheet to keep track of the record that should be excluded from the pipeline because they do not fit the rules."
To me, it means that nothing should happen to this term, ie no gene removed, no gene added.
I went back to look at all the commits from the previous PR. The last two, in which you added the excluded_from_qc_check annotations, were not originally showing up, which made this a very confusing picture. Now that I am aware of those later commits, I hope that clears things up 🙏
Ok, I will manually add these back and @joeflack4 will update the OMIM ingest code to "protect" these so that nothing happens to this term, ie no gene removed, no gene added.
Please see comments.
src/ontology/mondo-edit.obo
Outdated
relationship: excluded_from_qc_check http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/10593 {source="OMIM:108770"} ! SCN5A
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/4279 {source="MONDO:mim2gene_medgen", source="OMIM:108770"} ! GJA5
This term has the annotation
excluded_from_qc_check http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql
These gene annotations should not be removed.
src/ontology/mondo-edit.obo
Outdated
relationship: excluded_from_qc_check http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/29090 {source="MONDO:mim2gene_medgen", source="OMIM:158901"} ! SMCHD1
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/50800 {source="OMIM:158901"} ! DUX4
This term has the annotation
excluded_from_qc_check http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql
These gene annotations should not be removed.
src/ontology/mondo-edit.obo
Outdated
@@ -500399,8 +500392,6 @@ xref: OMIM:619478 {source="MONDO:equivalentTo"}
xref: UMLS:C5561960 {source="MEDGEN:1794170", source="MONDO:equivalentTo", source="MONDO:MEDGEN"}
is_a: MONDO:0001347 {source="DOID:0060918", source="OMIM:619478"} ! facioscapulohumeral muscular dystrophy
relationship: excluded_from_qc_check http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/2979 {source="OMIM:619478"} ! DNMT3B
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/50800 {source="OMIM:619478"} ! DUX4
This term has the annotation
excluded_from_qc_check http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql
These gene annotations should not be removed.
So the presence of the excluded_from_qc_check annotation pointing to http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql necessarily means that this is a disease that is allowed to have multiple causal genes. This will make our pipelines a little more complex.
Yes, the http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql exclusion means a curator reviewed this term and agreed that the disease has more than one disease-defining gene association.
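To make the rule discussed above concrete, here is a minimal Python sketch of how a sync step could protect curated multi-gene terms. It assumes the term stanza is available as a list of OBO lines; the function names `is_curated_multi_gene` and `removable_gene_lines` are hypothetical and are not taken from the actual pipeline.

```python
# URL of the QC check that curated multi-gene terms are excluded from (as quoted in this thread).
QC_MULTI_GENE = (
    "http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/"
    "qc-multiple-gene-associations.sparql"
)
GENE_REL = "has_material_basis_in_germline_mutation_in"


def is_curated_multi_gene(stanza_lines):
    """True if the stanza is excluded from the multiple-gene-associations QC check."""
    return any(
        line.startswith("relationship: excluded_from_qc_check ")
        and QC_MULTI_GENE in line
        for line in stanza_lines
    )


def removable_gene_lines(stanza_lines):
    """Gene relationship lines an automated update may drop (hypothetical helper).

    Returns an empty list for curated multi-gene terms, so their
    gene annotations are never removed automatically.
    """
    if is_curated_multi_gene(stanza_lines):
        return []
    prefix = f"relationship: {GENE_REL} "
    return [line for line in stanza_lines if line.startswith(prefix)]


# Example stanza fragment, copied from the diff hunk for OMIM:619151.
stanza = [
    'is_a: MONDO:0000159 {source="OMIM:619151"} ! bone marrow failure syndrome',
    "relationship: excluded_from_qc_check " + QC_MULTI_GENE,
    'relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/253 {source="OMIM:619151"} ! ADH5',
    'relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/404 {source="OMIM:619151"} ! ALDH2',
]
print(removable_gene_lines(stanza))
```

With a guard like this, the ADH5 and ALDH2 lines would stay in place because the term carries the exclusion.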
src/ontology/mondo-edit.obo
Outdated
@@ -503207,8 +503198,6 @@ xref: Orphanet:611216 {source="MONDO:equivalentTo"}
xref: UMLS:C5436906 {source="MONDO:equivalentTo", source="MONDO:MEDGEN", source="MEDGEN:1754257"}
is_a: MONDO:0000159 {source="OMIM:619151"} ! bone marrow failure syndrome
relationship: excluded_from_qc_check http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/253 {source="OMIM:619151"} ! ADH5
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/404 {source="OMIM:619151"} ! ALDH2
This term has the annotation
excluded_from_qc_check http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql
These gene annotations should not be removed.
src/ontology/mondo-edit.obo
Outdated
@@ -505014,8 +505003,6 @@ xref: OMIM:620040 {source="MONDO:equivalentTo"}
xref: UMLS:C5774217 {source="MEDGEN:1823990", source="MONDO:equivalentTo", source="MONDO:MEDGEN"}
is_a: MONDO:0015780 {source="OMIM:620040"} ! dyskeratosis congenita
relationship: excluded_from_qc_check http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/12441 {source="OMIM:620040"} ! TYMS
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/30365 {source="OMIM:620040"} ! ENOSF1
This term has the annotation
excluded_from_qc_check http://purl.obolibrary.org/obo/mondo/sparql/qc/mondo/qc-multiple-gene-associations.sparql
These gene annotations should not be removed.
src/ontology/mondo-edit.obo
Outdated
@@ -175530,6 +175521,7 @@ is_a: MONDO:0005138 {source="DOID:5409", source="EFO:0000702", source="MONDO:Red
is_a: MONDO:0005454 {source="MONDO:Redundant", source="NCIT:C4917/inferred", source="ONCOTREE:SCLC"} ! lung neuroendocrine neoplasm
intersection_of: MONDO:0000402 ! small cell carcinoma
intersection_of: disease_has_location UBERON:0002048 ! lung
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/9884 {source="OMIM:182280"} ! RB1
According to OMIM, this term has a % in front of its name, meaning "Phenotype description or locus, molecular basis unknown".
Please confirm that this annotation should be added according to our rules and because it is in the file we used.
If it is, then the workflow works correctly, and I will remove it manually (as it is an error).
Yes, under the logic we set up, we allow these associations to be added regardless of the MIM type (#, %, etc.).
If you want, I can easily add logic so that we only allow associations to be added when the disease MIM is of type # (Phenotype).
I think this would be easy, and better than handling these cases manually each time.
FYI, regarding how unexpected types are currently handled:
We do, however, "flag" cases where the MIM type is unexpected. We expect the MIM type to be # (phenotype) or % (Phenotype description or locus, molecular basis unknown). By "flag", I mean that if an association meets all of the conditions for disease-defining but isn't a # or %, an entry gets added to review.tsv in the release (documentation). Note that there have been no instances of a non-# or non-% type being added, though.
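As a rough illustration of the gating described above, the sketch below models the decision by MIM type prefix. The `Decision` enum, the `decide` function, and the `strict` flag are all hypothetical names. It also makes two assumptions the thread does not confirm: that in the proposed stricter mode a % entry is simply skipped, and that an unexpected type is routed to review rather than added.

```python
from enum import Enum


class Decision(Enum):
    ADD = "add"        # add the gene association to the ontology
    REVIEW = "review"  # write an entry to review.tsv for a curator
    SKIP = "skip"      # do nothing (assumed behaviour for % in strict mode)


def decide(mim_prefix: str, strict: bool = False) -> Decision:
    """Decide what to do with an association that is otherwise disease-defining.

    mim_prefix is the OMIM symbol in front of the entry, e.g. "#" or "%".
    strict=False models the current behaviour described in the thread;
    strict=True models the proposed "only # (Phenotype)" rule.
    """
    if mim_prefix == "#":
        return Decision.ADD
    if mim_prefix == "%":
        # Assumption: in strict mode a % entry is skipped rather than flagged.
        return Decision.SKIP if strict else Decision.ADD
    # Any other type is unexpected and is flagged for curator review.
    return Decision.REVIEW


for prefix in ("#", "%", "*"):
    print(prefix, decide(prefix).value, decide(prefix, strict=True).value)
```

Under the strict variant, the RB1 association on the %-typed entry OMIM:182280 would no longer be added automatically.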
src/ontology/mondo-edit.obo
Outdated
@@ -165662,6 +165652,7 @@ xref: SCTID:721307000 {source="MONDO:equivalentTo"}
xref: UMLS:C1834582 {source="MEDGEN:331782", source="MONDO:equivalentTo", source="MONDO:MEDGEN"}
is_a: MONDO:0020076 {source="DOID:0060888", source="Orphanet:420611"} ! myeloproliferative neoplasm
relationship: disease_arises_from_feature MONDO:0008608 ! Down syndrome
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/4170 {source="OMIM:159595"} ! GATA1
Same comment as below.
If this annotation is expected as per our rules and the file we use, then all is well.
It should however be removed manually as it is not correct according to the website.
OK, I see this is for MONDO:0008040 'transient myeloproliferative syndrome' and that the OMIM page shows this with a mapping key of 2. @joeflack4 mentioned that this has 2 rows in the morbidmap.txt file and is found to be self-referential and somatic, so for these reasons it was added to the review.tsv file and needed curator review to decide whether the gene association should be added.
It looks good! I am approving this PR.
@twhetzel, feel free to merge, but first, please see my comment regarding the SOP for annotations that should not be updated. Thank you.
@@ -225590,6 +225591,7 @@ xref: OMIM:278850 {source="MONDO:equivalentTo"}
xref: Orphanet:393 {source="OMIM:278850"}
xref: UMLS:C2749215 {source="MONDO:equivalentTo", source="MONDO:MEDGEN", source="MEDGEN:411414"}
is_a: MONDO:0100249 {source="DOID:0111763", source="Orphanet:393/btnt"} ! 46,XX testicular disorder of sex development
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/11204 {source="OMIM:278850"} ! SOX9
We should not have this relationship to the gene because
"(...) is caused by heterozygous duplication or triplication of a 68-kb regulatory region (XXSR) -584 to -516 kb upstream of the SOX9 gene on chromosome 17q24."
However, one would not be able to see this without reading the details (i.e., this has nothing to do with this workflow).
I will manually remove it in a follow-up PR (I have something else to update).
@twhetzel Could you please confirm: Should I add this Mondo/OMIM to the "excluded spreadsheet", so it will not be updated next time? What is the SOP? Thanks.
Generally, there will be another file that can be updated by curators for gene associations that should not be added even though they otherwise meet the rules. Joe is working on this now.
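Since the exclusion file is still being worked on, the following is only a sketch of how such a file might be applied. The TSV layout (columns `omim_id`, `hgnc_id`, `reason`) and the function names are assumptions for illustration, not the format Joe is implementing.

```python
import csv
import io


def load_exclusions(tsv_text):
    """Parse a curator-maintained exclusion table (hypothetical column names)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {(row["omim_id"], row["hgnc_id"]): row["reason"] for row in reader}


def filter_associations(proposed, exclusions):
    """Split proposed (omim_id, hgnc_id) pairs into kept and excluded lists."""
    kept, excluded = [], []
    for omim_id, hgnc_id in proposed:
        reason = exclusions.get((omim_id, hgnc_id))
        if reason is None:
            kept.append((omim_id, hgnc_id))
        else:
            # Carry the reason along so it can be reported back to curators.
            excluded.append((omim_id, hgnc_id, reason))
    return kept, excluded


# Illustrative content; the reason text paraphrases the SOX9 comment above.
EXCLUSIONS_TSV = (
    "omim_id\thgnc_id\treason\n"
    "OMIM:278850\tHGNC:11204\tcaused by duplication of a regulatory region upstream of SOX9\n"
)

proposed = [("OMIM:278850", "HGNC:11204"), ("OMIM:619151", "HGNC:253")]
kept, excluded = filter_associations(proposed, load_exclusions(EXCLUSIONS_TSV))
print(kept)
print(excluded)
```

Keeping a free-text reason per row is one possible starting point for the exclusion-reason standard that still needs to be defined.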
This update is based on [this PR](monarch-initiative/mondo#8624). NOTE: An SOP is needed for this exclusion document. (In addition, we need to create a standard for the exclusion reason.)
Update OMIM gene references.