DRAFT: Add notebook which integrates metag, metap, metab data #124

Draft
wants to merge 16 commits into
base: main

Contributor

@bmeluch bmeluch commented Feb 5, 2025

This PR adds a new notebook to the repo that connects metagenomic (metag), metaproteomic (metap), and metabolomic (metab) data from the same samples. It relies heavily on the KEGG Orthology annotations that the NMDC workflows provide for each of these data types.

Links

nbviewer https://nbviewer.org/github/microbiomedata/nmdc_notebooks/blob/91-create-notebook-integrating-metag-metap-metab-data-r/omics_types_integration/R/integration_notebook.ipynb

colab https://colab.research.google.com/github/microbiomedata/nmdc_notebooks/blob/91-create-notebook-integrating-metag-metap-metab-data-r/omics_types_integration/R/integration_notebook.ipynb


All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Does your PR link to an issue?
  • Have you described the changes this PR will make?

New Notebook Submissions:

  • Have you included a summary of the notebook in the README.md, including updated links to the notebook?
  • Does your PR include links to the new notebook (in the branch) for review using nbviewer, Colab, and reviewnb? These three are the preferred ways to review changes and additions to notebooks during review.
  • Does your PR include a test in a GitHub workflow that tests the render-ability of your notebook?

@bmeluch bmeluch linked an issue Feb 5, 2025 that may be closed by this pull request
7 tasks

Contributor Author

bmeluch commented Feb 5, 2025

@samobermiller I got the first part of the notebook rendered. Does it look OK in ReviewNB, and does it make sense so far? Thank you :)

@@ -0,0 +1,730 @@
{
Collaborator

@samobermiller samobermiller Feb 20, 2025

I'd suggest rewriting the title to something more action-based. Maybe 'How can we relate different types of omics data in the NMDC database?'

See my later note about looking at crossover in KEGG IDs before using the KEGG API. If you decide to follow that idea, maybe add to the beginning of your note: 'NOTE: After finding overlap in KO identifications across omics types, this notebook uses the KEGGREST R package to interface with the KEGG API and determine the biological relevance of these identifications. Use of............' My thinking is that this will underline that there is a use for your notebook even if you don't use the licensed packages.


@@ -0,0 +1,730 @@
{
Collaborator

@samobermiller samobermiller Feb 20, 2025

'(upset plot)'? typo?


@@ -0,0 +1,730 @@
{
Collaborator

@samobermiller samobermiller Feb 20, 2025

Since the KEGG API and package are restricted, would it make sense to add something beforehand that looks at the crossover in KEGG IDs identified between the metagenomic and metaproteomic data? I know it won't show biological relevance until you use the licensed API to pull info, but it could show the use of your notebook even without the licensed API. Something like: 'Here is what you can do to see overlap in multi-omic identifications, and if you'd like more information on their biological relevance, you can follow the tutorial below using the KEGG API, but it will require a license blah blah blah'
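
A minimal sketch of that kind of overlap check, written in Python purely for illustration (the notebook itself is in R). The variable names and the KO identifiers are placeholders, not values from the notebook; the real sets would come from the NMDC annotation files.

```python
# Placeholder KO identifiers per omics type for one biosample (assumed data).
metag_kos = {"K00001", "K00002", "K00003", "K00004"}
metap_kos = {"K00002", "K00003", "K00005"}

# KOs identified in both the metagenome and the metaproteome.
shared = metag_kos & metap_kos

# KOs identified only in the metaproteome.
metap_only = metap_kos - metag_kos

# Fraction of metaproteomic KOs that also appear in the metagenome.
frac_metap_in_metag = len(shared) / len(metap_kos)

print(sorted(shared))
print(sorted(metap_only))
print(round(frac_metap_in_metag, 2))
```

Nothing here touches the KEGG API, so this step could run without a license; only the later lookup of what each shared KO means would need it.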


@@ -0,0 +1,1100 @@
{
Collaborator

@samobermiller samobermiller Feb 21, 2025

Maybe add a clarifying sentence along the lines of:

'In this case, we want to look at the processed data results for our three omics types of interest. Specifically, we want the files containing KEGG Orthology and Enzyme Commission annotations. Note that these annotations are available in separate files for the metagenomics data (in 'Annotation Enzyme Commission' and 'Annotation KEGG Orthology') and in combined files for the metabolomic ('GC-Metabolomics Results') and proteomic ('Protein Report') data.'
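
To make the file selection concrete, here is a rough Python sketch (the notebook is in R, so this is only an illustration). It assumes each data object record is a dictionary with a `data_object_type` field; that field name, the record IDs, and the helper `select_annotation_files` are assumptions for the example, while the four type names are the ones quoted above.

```python
# File types named in the suggested sentence above.
WANTED_TYPES = {
    "Annotation Enzyme Commission",
    "Annotation KEGG Orthology",
    "GC-Metabolomics Results",
    "Protein Report",
}

def select_annotation_files(data_objects):
    """Keep only records whose type is one of the annotation-bearing files."""
    return [d for d in data_objects if d.get("data_object_type") in WANTED_TYPES]

# Made-up records standing in for data object metadata (assumed shape).
records = [
    {"id": "example:1", "data_object_type": "Annotation KEGG Orthology"},
    {"id": "example:2", "data_object_type": "Some Other File Type"},
    {"id": "example:3", "data_object_type": "Protein Report"},
]

selected = select_annotation_files(records)
print([d["id"] for d in selected])
```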


@@ -0,0 +1,1100 @@
{
Collaborator

@samobermiller samobermiller Feb 21, 2025

I'm confused by 'we can look up the corresponding annotations in other KEGG databases'... what are 'other KEGG databases'?

Are you saying there is more than one place to search for corresponding annotations using the KEGG IDs found in this data? Based on the rest of the notebook, I think what you're saying is that you're going to look up annotation information for the KEGG IDs in a sample's metabolomic, proteomic, and genomic data, then compare the overlap of that annotation information between the three omics types?

This confusion is probably due to my own lack of background knowledge, but I think it could use some clarification.


@@ -0,0 +1,1100 @@
{
Collaborator

@samobermiller samobermiller Feb 21, 2025

What's a circos plot, and what are sectors/arcs?


@@ -0,0 +1,1100 @@
{
Collaborator

@samobermiller samobermiller Feb 21, 2025

I would add a bit more context on how to interpret the chord diagram, and maybe an example observation. Also, could you clarify what you mean by 'connections'? Is it the overlap in annotations? To me it looks like the figure is showing that nearly all protein annotations and about half the metabolomic annotations are also seen in the gene annotations. Does overlap of all three arcs mean anything, like it does in a Venn diagram?
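
If 'connections' does mean shared annotations (an assumption on my part; the notebook may define them differently), an example observation could be backed by counts like the ones in this Python sketch. The helper `pairwise_shared` and the toy annotation sets are hypothetical and only illustrate the idea; the notebook itself is in R.

```python
from itertools import combinations

def pairwise_shared(annotations):
    """Count the annotations shared by each pair of omics types."""
    return {
        (a, b): len(annotations[a] & annotations[b])
        for a, b in combinations(sorted(annotations), 2)
    }

# Toy annotation sets per omics type (assumed data, not from the notebook).
annotations = {
    "metag": {"A", "B", "C", "D"},
    "metap": {"B", "C"},
    "metab": {"C", "E"},
}

# One count per pair of omics types, roughly what a chord between two arcs would encode.
pairs = pairwise_shared(annotations)

# Annotations present in all three omics types, like the center of a Venn diagram.
in_all_three = set.intersection(*annotations.values())

print(pairs)
print(sorted(in_all_three))
```

A sentence such as 'N of the M protein annotations are also found in the gene annotations' could then accompany the figure.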


@@ -0,0 +1,1100 @@
{
Collaborator

@samobermiller samobermiller Feb 21, 2025

great explanation


@@ -0,0 +1,1100 @@
{
Collaborator

@samobermiller samobermiller Feb 21, 2025

SO COOL! Are the biosamples being grouped or clustered by anything?


@@ -0,0 +1,1100 @@
{
Collaborator

@samobermiller samobermiller Feb 21, 2025

Is this figure the same as the one above, but with color and human-readable names? If so, I would maybe combine the code and just keep this figure.


@@ -0,0 +1,1100 @@
{
Collaborator

@samobermiller samobermiller Feb 21, 2025

This would be cool, but if you're feeling short on time I think you are already generating some cool figures and we could keep this as a stretch goal for the symposium


Development

Successfully merging this pull request may close these issues.

Create notebook integrating metag, metap, metab data (R)
2 participants