KOBAS evaluation #285

Open
mpoelchau opened this issue Jul 31, 2024 · 3 comments

@mpoelchau

It is worth exploring whether better alternatives to KOBAS for insect KEGG pathway annotation have emerged since we developed the pipeline ~5 years ago.

General considerations:

  • Have other databases/entities started annotating RefSeq proteins with KEGG pathways (e.g. KEGG itself; UniProt)?
    • eggNOG is often wrong when it comes to finding orthologs.
  • What software programs exist that we could use instead? Create a list with pros and cons.
  • Should we explore using FlyBase pathway annotations (are they free and machine readable)? WormBase? Alliance?

Software considerations:

  • Is the software maintained?
  • Is it already used/adopted by the bioinformatics community?
  • Is the pathway annotation transferred directly from the entity that was curated to the new protein, or are there intermediate steps?

Other considerations:

  • InterProScan is no longer adding Reactome data via AgBase; the script that adds it is broken. (How does InterProScan actually assign the Reactome data to proteins?)

@amcooksey

InterProScan script fixed. It is now adding Reactome annotations again.

@amcooksey

FlyBase pathways:
I downloaded this file: pathway_group_data_fb_2024_04.tsv

There are 17 parent pathways.
There are 73 unique entries in column 2 (I think those are the roles in the pathways).
There are 681 annotated genes
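
Counts like the ones above can be reproduced with a short script. This is only a minimal sketch: it assumes the file is plain tab-separated text with `#`-prefixed comment lines, and it does not assume which column holds which field. `distinct_values_per_column` is a hypothetical helper name, not part of any FlyBase tooling.

```python
import csv


def distinct_values_per_column(lines, comment_prefix="#"):
    """Return one set per column holding the distinct values seen in it.

    Blank lines and lines starting with comment_prefix are skipped.
    """
    data_lines = (
        ln for ln in lines
        if ln.strip() and not ln.startswith(comment_prefix)
    )
    columns = []
    for row in csv.reader(data_lines, delimiter="\t"):
        # Grow the list of sets if this row is wider than earlier rows.
        while len(columns) < len(row):
            columns.append(set())
        for index, value in enumerate(row):
            columns[index].add(value)
    return columns


def column_counts(lines):
    """Number of distinct values in each column, in column order."""
    return [len(values) for values in distinct_values_per_column(lines)]
```

To use it on the downloaded file, open the file and pass the handle in, e.g. `column_counts(open("pathway_group_data_fb_2024_04.tsv"))`, then compare each entry with the numbers listed above to see which column is which.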

They don't cover many genes. Pathways seem pretty basic. I don't think there is anything here we can't get somewhere else.

We could probably annotate to them the same way we do with GOanna/KofamScan: BLAST/HMM match the genes to Drosophila homologs and pull those annotations across.
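
A rough sketch of that homolog-based transfer, for discussion only. It assumes BLAST was run with the default tabular output (`-outfmt 6`, 12 columns with the bit score last) and that the Drosophila annotations are already loaded into a dictionary; both function names are hypothetical, and picking the single best hit by bit score is a simplification of what GOanna actually does.

```python
def best_hits(blast_lines):
    """Map each query to its highest-bit-score subject.

    Expects default BLAST tabular columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def transfer_annotations(hits, subject_annotations):
    """Copy annotations from each best-hit subject onto its query.

    Queries whose best hit has no annotation are left out.
    """
    return {
        query: sorted(subject_annotations[subject])
        for query, subject in hits.items()
        if subject in subject_annotations
    }
```

In practice we would also want an e-value or coverage cutoff before trusting a hit; that filter is left out here to keep the sketch short.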

@amcooksey

Tools I found for KEGG annotation:
KofamScan/clusterProfiler combo:

  • KofamScan is the standalone version of KofamKOALA from KEGG. It assigns a KO number to each protein in a FASTA file.
  • clusterProfiler is a Bioconductor tool with a KEGG mapper option. It takes the KO numbers and pulls the corresponding pathway annotations.
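
To make the two-step flow concrete, here is a minimal Python stand-in for the KO-to-pathway step. It is not the clusterProfiler API (that is an R package); it only illustrates the join. It assumes KofamScan was run with its `mapper` output format (one gene per line, followed by a KO number when one was assigned) and that a KO-to-pathway table has been loaded separately. The function names are hypothetical.

```python
def read_mapper(lines):
    """Parse mapper-style output into {gene: [KO, ...]}.

    Genes without an assigned KO appear with an empty list.
    """
    gene_to_kos = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        gene, kos = fields[0], fields[1:]
        gene_to_kos.setdefault(gene, []).extend(kos)
    return gene_to_kos


def pathways_for_genes(gene_to_kos, ko_to_pathways):
    """Join genes to pathways through their KO numbers.

    Genes that end up with no pathway are omitted from the result.
    """
    result = {}
    for gene, kos in gene_to_kos.items():
        pathways = set()
        for ko in kos:
            pathways.update(ko_to_pathways.get(ko, ()))
        if pathways:
            result[gene] = sorted(pathways)
    return result
```

The point of the sketch is that the pathway annotation is reached through one intermediate step (protein to KO, then KO to pathway), which is relevant to the "intermediate steps" question in the software considerations.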

KEGG API:
It can return a variety of tables with the annotations for any species already in KEGG. For species not in KEGG, we would still need to find homologs in KEGG to pull annotations from.
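
As an illustration of pulling one of those tables, the sketch below uses the `link` operation of the KEGG REST API (`https://rest.kegg.jp/link/<target_db>/<source_db>`), which returns two tab-separated columns. The organism code `dme` (Drosophila melanogaster) is just an example, and the function names are hypothetical. Parsing is kept separate from the network call so it can be checked offline.

```python
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"


def parse_link_table(text):
    """Turn a two-column KEGG link table into {source_id: {target_id, ...}}."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        source, target = parts
        mapping.setdefault(source, set()).add(target)
    return mapping


def fetch_link(target_db, source_db):
    """Download and parse a link table, e.g. fetch_link("pathway", "dme")."""
    url = f"{KEGG_REST}/link/{target_db}/{source_db}"
    with urlopen(url) as response:
        return parse_link_table(response.read().decode("utf-8"))
```

`fetch_link("pathway", "dme")` would give gene-to-pathway links for a species already in KEGG, and `fetch_link("pathway", "ko")` would give the KO-to-pathway table needed after a KofamScan run.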
