Prevent leaf-specific IBA propagation if NOT qualifiers do not match #39

Open
dustine32 opened this issue Nov 5, 2019 · 5 comments

@dustine32
Collaborator

As explained in #30 (comment), I'll need to implement a check in the IBA generation script that blocks IBA propagation to a leaf if that specific leaf has an experimental annotation with a conflicting qualifier. For now, I'll only check "NOT" vs. "no qualifier" conflicts. I believe matching other qualifiers like "contributes_to" is still under discussion.

An example case is shown here:
image
The IBD on PTN000185192 is still valid and can be used to propagate to its other descendant leaf sequences, but the experimental NOT IGI annotation on PomBase:SPAC1B3.15c should block IBA propagation to this leaf.
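The check described above could be sketched roughly as follows. This is only an illustration, not the actual IBA generation script: the function name `filter_conflicting_ibas`, the two input files, and the set of evidence codes treated as experimental (EXP, IDA, IPI, IMP, IGI, IEP) are all assumptions. It relies on the standard GAF column layout (column 4 qualifier, column 5 GO ID, column 7 evidence code) and only compares "NOT" against "no NOT"; other qualifiers such as "contributes_to" are ignored.

```shell
# filter_conflicting_ibas EXP_GAF IBA_GAF   (hypothetical helper)
# Prints the IBA lines from IBA_GAF, skipping any line whose gene (cols 1+2)
# and GO term (col 5) have an experimental annotation in EXP_GAF with the
# opposite NOT status.
filter_conflicting_ibas() {
  awk -F '\t' '
    # 1 if the qualifier column contains NOT as a pipe-separated item, else 0
    function has_not(q) { return (q ~ /(^|\|)NOT(\||$)/) ? 1 : 0 }

    /^!/ { next }                       # skip GAF comment and header lines

    FILENAME == ARGV[1] {               # first file: experimental annotations
      # Assumption: these codes are the ones that count as experimental
      if ($7 ~ /^(EXP|IDA|IPI|IMP|IGI|IEP)$/)
        seen[$1 ":" $2 SUBSEP $5 SUBSEP has_not($4)] = 1
      next
    }

    {                                   # second file: candidate IBA lines
      # Look for an experimental annotation with the opposite NOT status
      key = $1 ":" $2 SUBSEP $5 SUBSEP (1 - has_not($4))
      if (!(key in seen)) print         # keep the IBA only when no conflict
    }
  ' "$1" "$2"
}
```

Because the block is applied per gene and per GO term, an IBD node like PTN000185192 would still propagate to every other descendant leaf; only the leaf carrying the conflicting experimental annotation loses its IBA.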

Related tickets:
geneontology/paint#54
geneontology/go-annotation#2378

@dustine32
Collaborator Author

@pgaudet I've implemented the IBA block for PAINT vs. experimental NOT qualifier conflicts but have not yet pushed any new IBA files. I did a test run and generated a before/after report tracking IBA count differences.

Would you be able to spot-check this report for any unintended effects? What works for me is plugging the PTHR family and GO term into AmiGO and then looking for the NOT. Otherwise, I'm working on getting the actual list of to-be-dropped IBA lines (there are 269).

@dustine32
Collaborator Author

(Taking notes for myself)

For testing, I generated two sets of IBA GAFs (before and after code change) and ran these commands to get all dropped IBAs:

$ cat 2019-11-20_fullgo_test/IBA_GAFs/* > 2019-11-20_fullgo_test/all_IBAs
$ cat 2019-11-20_fullgo_test/preupdate_data/IBA_GAFs/* > 2019-11-20_fullgo_test/preupdate_data/all_IBAs
$ diff -u 2019-11-20_fullgo_test/preupdate_data/all_IBAs 2019-11-20_fullgo_test/all_IBAs | grep -E "^\-" > 2019-11-20_fullgo_test/dropped_IBAs_raw
$ grep -v "Created on" 2019-11-20_fullgo_test/dropped_IBAs_raw | grep -v "2019-11-20_fullgo_test" | sed 's/^-//' > 2019-11-20_fullgo_test/dropped_IBAs
$ wc -l 2019-11-20_fullgo_test/dropped_IBAs
324

That means 324 IBAs were dropped due to this code change. However, this number doesn't line up with the report, which says 269 lines were dropped. Spot-checking some of the lines whose IBD PTNs are not in the report (e.g. PTN001998491), I notice that these lines appear in both the before and after IBA files with no difference as far as I can tell (I tried several diff options and looked for hidden characters). I'm guessing diff is playing tricks on me or something.
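One possible explanation, offered as a guess rather than a diagnosis: `diff` is order-sensitive, so a line that merely changed position between the two concatenated files is reported once with `-` and once with `+` even though its content is identical. An order-insensitive comparison avoids that. The sketch below uses `sort` and `comm`; the function name `dropped_lines` is hypothetical, and it assumes that header lines such as "Created on" start with `!`, as GAF comment lines normally do.

```shell
# dropped_lines BEFORE AFTER   (hypothetical helper)
# Prints lines present in BEFORE but absent from AFTER, ignoring line order.
dropped_lines() (
  before_sorted=$(mktemp) || exit 1
  after_sorted=$(mktemp) || exit 1
  # Drop GAF comment lines, then sort and de-duplicate each side.
  # LC_ALL=C keeps the sort order consistent with what comm expects.
  grep -v '^!' "$1" | LC_ALL=C sort -u > "$before_sorted"
  grep -v '^!' "$2" | LC_ALL=C sort -u > "$after_sorted"
  # comm -23 suppresses lines unique to AFTER and lines common to both,
  # leaving only the lines unique to BEFORE.
  LC_ALL=C comm -23 "$before_sorted" "$after_sorted"
  rm -f "$before_sorted" "$after_sorted"
)
```

With the files from the commands above, this would be called as `dropped_lines 2019-11-20_fullgo_test/preupdate_data/all_IBAs 2019-11-20_fullgo_test/all_IBAs | wc -l`. Note that `sort -u` collapses exact duplicate lines, so the count can differ from a plain `diff` if the same line occurs more than once.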

I can cross-reference the report's IBD nodes to filter out lines that shouldn't be there.

@pgaudet
Collaborator

pgaudet commented Nov 28, 2019

Hi @dustine32

Do you mean that this script gets rid of the inferred NOT IBA here (from PTHR13271)?

image

I also checked PTHR10024 - it also seems OK.

Probably the way to be sure would be to export the GAF for each of the impacted families. Is that 'easy'?
Thanks, Pascale

@dustine32
Collaborator Author

@pgaudet Yep, that inferred NOT IBA should be removed by the code change due to its conflict with that positive IDA.

That's a great idea about just getting the GAFs for the impacted families. That might also clear up the weirdness I'm seeing trying to get an accurate diff of dropped lines.

@dustine32
Collaborator Author

@pgaudet Finally, I've got an accurate list of dropped IBAs for you to look at, though I ended up combining your idea of only outputting the impacted families with my previous diffing and grepping attempts.

Basically, outputting all IBA GAFs for the IBD PTNs in the before/after report and then applying the diff/grep commands above gets me to the expected 269 count. This GAF file is uploaded to the Google Drive for your downloading convenience.
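The "only the IBD PTNs in the report" step could look something like the sketch below. It is an illustration under assumptions, not the command actually used: the function name `filter_by_ibd_ptn` and the one-PTN-per-line list file are hypothetical, and it assumes the IBD node appears in the with/from column (column 8) as a `PANTHER:PTN...` entry, as in the SETD3 line quoted in this comment.

```shell
# filter_by_ibd_ptn PTN_LIST GAF   (hypothetical helper)
# Prints the GAF lines whose with/from column (col 8) names one of the
# PTN IDs listed, one per line, in PTN_LIST.
filter_by_ibd_ptn() {
  awk -F '\t' '
    FILENAME == ARGV[1] {               # first file: PTN IDs from the report
      if ($1 != "") wanted["PANTHER:" $1] = 1
      next
    }
    /^!/ { next }                       # skip GAF comment and header lines
    {
      # with/from is pipe-separated, e.g. PANTHER:PTN...|ZFIN:...
      n = split($8, refs, "|")
      for (i = 1; i <= n; i++)
        if (refs[i] in wanted) { print; break }
    }
  ' "$1" "$2"
}
```

Running this over both the before and after concatenated files, and then comparing the two outputs, restricts the comparison to the families the report flags.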

For your PTHR13271 peptidyl-lysine trimethylation (GO:0018023) example, only one IBA was shown as dropped:

UniProtKB       Q86TU7  SETD3           GO:0018023      PMID:21873635   IBA     PANTHER:PTN000998435|ZFIN:ZDB-GENE-030131-9137  P       Histone-lysine N-methyltransferase setd3        UniProtKB:Q86TU7|PTN002491248   protein taxon:9606      20170228        GO_Central

But this one is positive (no NOT qualifier). I actually answered your question earlier without knowing the gene that the IBA in question was for, so... is this (UniProtKB:Q86TU7) your card (gene)?
