Handle qualifiers for new GAF 2.2 spec #45

Open
dustine32 opened this issue Apr 8, 2020 · 12 comments

@dustine32
Collaborator

From @thomaspd's email:

Hi all,

A change in the GAF format is coming, probably in July. The only change will be the qualifier column. For BP annotations, there will be additional qualifiers that describe the relationship between the gene product and the BP term. We will need to change our parsers to parse this out, and make it part of the GO load each month.

So all the proposed changes in this email are not urgent, but should be done by July.

We will also want to make use of this information in the enrichment tool. We will want to distinguish between two different types of BP: ones where the gene product has the “part_of” qualifier, versus ones that do not. Some organisms will distinguish between these, while others will not. So we’ll have to keep track of which organisms have BP annotations of these two different types. If they don’t distinguish them, there will be no change for the enrichment tool. If they do distinguish them, there will be an additional choice in the dropdown menu in addition to GO BP complete, something like “GO BP directly involved”, which will be the set that has only BP annotations that have the part_of qualifier. GO BP complete will continue to have all BP annotations.

Thanks,

Paul.

More info here: geneontology/go-annotation#2917. This has ramifications for both PANTHER and PAINT loads.
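As a rough illustration of the enrichment-tool split described in the email, the sketch below builds the "complete" and "directly involved" BP sets for one organism and decides whether the extra dropdown entry is warranted. The function names, the tuple layout of the input, and the rule that an organism "distinguishes" the two types only when the direct set is non-empty and differs from the complete set are all assumptions for illustration. The qualifier that marks a direct annotation is passed in as a parameter rather than hard-coded.

```python
def split_bp_sets(bp_annotations, direct_qualifier):
    """Return (complete, direct) sets of (gene, GO ID) pairs.

    bp_annotations: iterable of (gene_id, go_id, qualifiers) tuples, where
    qualifiers is a collection of qualifier strings (hypothetical layout).
    direct_qualifier: the qualifier that marks a "directly involved" annotation.
    """
    complete = set()
    direct = set()
    for gene_id, go_id, qualifiers in bp_annotations:
        complete.add((gene_id, go_id))
        if direct_qualifier in qualifiers:
            direct.add((gene_id, go_id))
    return complete, direct


def offers_direct_set(complete, direct):
    """Assumed rule: show the extra dropdown entry only if both types occur."""
    return bool(direct) and direct != complete
```

With this shape, an organism whose BP annotations all carry (or all lack) the direct qualifier would see no change in the dropdown.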

PANTHER
It looks like the genelist_agg table may need to be adjusted to retain qualifiers with the GO terms associated with a gene. Qualifiers are already somewhat factored into loading the genelist_agg table as NOT annotations are excluded.
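A minimal sketch of what retaining qualifiers could look like on the parsing side. It assumes only the standard GAF column order (column 2 is the object ID, column 4 the qualifier, column 5 the GO ID) and pipe-separated qualifiers; the function names and the in-memory result are hypothetical, and the actual genelist_agg schema is not shown in this thread.

```python
from collections import defaultdict


def parse_qualifiers(field):
    """Split a GAF qualifier field on '|' into a list of qualifiers."""
    return [q for q in field.strip().split("|") if q]


def collect_gene_terms(gaf_lines):
    """Map each gene product ID to a set of (GO ID, qualifiers) pairs.

    NOT annotations are skipped, mirroring the exclusion described above.
    """
    gene_terms = defaultdict(set)
    for line in gaf_lines:
        # Skip blank lines and '!' header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        gene_id, qualifier_field, go_id = cols[1], cols[3], cols[4]
        qualifiers = parse_qualifiers(qualifier_field)
        if any(q.upper() == "NOT" for q in qualifiers):
            continue
        gene_terms[gene_id].add((go_id, tuple(sorted(qualifiers))))
    return dict(gene_terms)
```

Keeping the qualifiers as a tuple next to the GO ID means the same gene/term pair can appear once per distinct relation.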

PAINT
The qualifier column is already parsed and loaded into the Curation DB for GO annotations. But with more GO annotations having qualifiers that likely won't (initially) match the PAINT annotation qualifiers, mismatches will cause many PAINT annotations to be obsoleted during a full GO update. Maybe we should have some sort of rule-based update followed by manual review for the initial load of the newly formatted GAFs?

It would be nice to get some sample, preview data. For example, an experimental GO annotation currently used as evidence in PAINT that also has qualifiers in whatever curation tool it is sourced from (e.g. Protein2GO).

The GAF creation script likely won't need much modification but I believe there are some regexes used that specifically look for CONTRIBUTES_TO and COLOCALIZES_WITH.
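To show why such regexes would need attention, here is a hedged sketch. The legacy pattern below is only a guess at the shape of the existing regexes, not the script's actual code; the general pattern simply pulls out every pipe-delimited token so that new relation qualifiers are not silently dropped.

```python
import re

# Assumed shape of a legacy pattern: only the two pre-2.2 qualifiers.
LEGACY_QUALIFIER_RE = re.compile(
    r"\b(CONTRIBUTES_TO|COLOCALIZES_WITH)\b", re.IGNORECASE
)

# More general pattern: any token made of letters and underscores.
ANY_QUALIFIER_RE = re.compile(r"[A-Za-z_]+")


def legacy_qualifiers(field):
    """Return only contributes_to / colocalizes_with, lowercased."""
    return [m.lower() for m in LEGACY_QUALIFIER_RE.findall(field)]


def all_qualifiers(field):
    """Return every qualifier token in the field, in order."""
    return ANY_QUALIFIER_RE.findall(field)
```

For a field such as `NOT|involved_in`, the legacy-style pattern finds nothing, while the general one returns both tokens.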

Also tagging @huaiyumi and @mugitty

@dustine32 dustine32 self-assigned this Apr 8, 2020
@dustine32
Collaborator Author

@pgaudet @thomaspd @huaiyumi For exporting the IBAs to GAF 2.2, what should the default qualifiers be by aspect?

  • GP -> BP: is it involved_in or part_of?
  • MF - enables
  • CC - part_of

Also, if an IBA has a NOT qualifier, it will be included as usual with the "relation" qualifier determined above (e.g. enables, part_of). If the IBA also has a contributes_to or colocalizes_with qualifier, should these be appended to the IBA's qualifier list or replace the "relation" qualifier?

@thomaspd

I just looked at: http://wiki.geneontology.org/index.php/Involved_in, and the gp -> bp for PAINT should be involved_in

@dustine32
Collaborator Author

@thomaspd Sweet! Thanks for straightening this out.

And for the existing qualifiers in PAINT (contributes_to, colocalizes_with), I'm now assuming these should replace the default qualifier if one exists, since, for example, an annotation with both contributes_to and involved_in seems redundant and/or confusing, right?

@pgaudet
Collaborator

pgaudet commented Oct 23, 2020

If there already is a qualifier, you keep that qualifier.

@dustine32
Collaborator Author

@thomaspd @pgaudet Doh! I just noticed we may be using the wrong default qualifier for CC after rereading this - geneontology/go-annotation#2917 (comment). Should the CC default be located_in instead of part_of?

dustine32 added a commit that referenced this issue Nov 2, 2020
@pgaudet
Collaborator

pgaudet commented Nov 3, 2020

No, I don't think PAINT should be using default qualifiers. Default qualifiers are used when we are not sure the protein is active in the specified location (or plays a part in a process). In PAINT, our annotation guidelines are to only propagate CC annotations that are consistent with the role of the protein.

Thanks, Pascale

@pgaudet
Collaborator

pgaudet commented Nov 3, 2020

Actually, it looks like 'is_active_in' is allowed; this is the best one for PAINT.

@dustine32
Collaborator Author

@pgaudet Thanks! I'll use is_active_in for CC then. I should note these default qualifiers only come into play when exporting the IBAs to GAF 2.2, since the qualifier column now requires a value. These default qualifiers won't be stored in PAINT and you won't see them in the tool.

@dustine32
Collaborator Author

@pgaudet @thomaspd Looks like we also have IBDs to protein-containing complex descendants. For complex terms, the default qualifier should be part_of, right?

@pgaudet
Collaborator

pgaudet commented Nov 4, 2020

Yes !
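Pulling together the answers in this thread, the export-time rule could be sketched as below: involved_in for BP, enables for MF, is_active_in for CC, part_of for protein-containing complex terms, an existing qualifier kept in place of the default, and NOT retained in front. This is an illustration only, not the code in the export script. The function name, the `is_complex_term` flag (how a term is recognized as a complex descendant is not shown here), and lowercasing stored qualifiers are assumptions; the aspect codes P, F and C are the standard GAF aspect column values.

```python
# Defaults agreed in the thread, keyed by GAF aspect code.
DEFAULT_QUALIFIER_BY_ASPECT = {
    "P": "involved_in",   # biological process
    "F": "enables",       # molecular function
    "C": "is_active_in",  # cellular component
}
COMPLEX_DEFAULT = "part_of"  # CC terms under protein-containing complex


def export_qualifier(aspect, stored_qualifiers, is_complex_term=False):
    """Build the GAF 2.2 qualifier field for one IBA at export time.

    stored_qualifiers: qualifiers held in PAINT, e.g. [], ["NOT"],
    ["CONTRIBUTES_TO"]. Defaults are applied only when no relation
    qualifier is stored; they are never written back.
    """
    negated = any(q.upper() == "NOT" for q in stored_qualifiers)
    relations = [q.lower() for q in stored_qualifiers if q.upper() != "NOT"]
    if not relations:
        if aspect == "C" and is_complex_term:
            relations = [COMPLEX_DEFAULT]
        else:
            relations = [DEFAULT_QUALIFIER_BY_ASPECT[aspect]]
    prefix = ["NOT"] if negated else []
    return "|".join(prefix + relations)
```

Because the default is computed only while building the output field, nothing changes in what PAINT stores or displays.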

@dustine32
Collaborator Author

Commit 732ffe8 prevents new gp2term relations from GAF 2.2 from getting into PAINT and PANTHER.

Noting that this is a temporary, short-term (and easily revertible) solution to get the GAF 2.2-sourced annotations into PAINT/PANTHER without mucking up the existing load process. We'll discuss/document the actual policy to implement on a PAINT call.

@pgaudet
Collaborator

pgaudet commented May 11, 2021
