|
1 |
| -# Rank pathways analysis |
| 1 | +# Detecting generic gene expression signals |
2 | 2 |
|
3 |
| -**Alexandra J Lee, James C Costello and Casey S Greene** |
| 3 | +**Alexandra J. Lee, Rani K. Powers, Dallas L. Mould, Dongbo Hu, Georgia Doing, Jake Crawford, James C. Costello, Deborah A. Hogan, Casey S. Greene** |
4 | 4 |
|
5 |
| -**May 2020** |
6 |
| - |
7 |
| -**University of Pennsylvania, University of Colorado Anschutz Medical Campus** |
| 5 | +**University of Pennsylvania, University of Colorado Anschutz Medical Campus, Dartmouth College** |
8 | 6 |
|
9 | 7 | **Rationale**: People performing differential expression (DE) analysis have found that some genes and subsequent pathways are more likely to be differentially expressed even across a wide range of experimental designs ([Powers et al., Bioinformatics 2018](https://academic.oup.com/bioinformatics/article/34/13/i555/5045793); [Crow et al., PNAS 2019](https://www.pnas.org/content/116/13/6491)).
|
10 | 8 |
|
11 | 9 | Given that these commonly DE genes and subsequent pathways exist, it is important to be able to distinguish between genes that are generic and those that are experiment- or condition-specific. Specific genes may, for example, be disease-specific and may reveal new insights into pathogenic mechanisms that would be overlooked by examining all the DE genes in aggregate. These disease-specific genes can also help to prioritize DEGs for follow-up wet-lab or functional experiments. For example, [Swindell et al.](https://www.sciencedirect.com/science/article/pii/S0022202X16312465#fig3) identified IL-17A as an inducer of the DEGs most uniquely elevated in psoriasis lesions compared to other skin diseases. Furthermore, clinical data have demonstrated the efficacy of anti-IL-17A therapy for moderate-to-severe psoriasis. In general, being able to distinguish between generic and context-specific signals is important for learning gene function and revealing insights into mechanisms of disease.
|
12 | 10 |
|
13 | 11 |
|
14 | 12 | **Challenge**:
|
15 |
| -Current methods, including Powers et. al. and Crow et. al., to identify generic genes and pathways rely on manual curation. This curation effort included collecting a large set of samples with corresponding metadata, process data and perform DE analysis to get ranked list of genes. |
| 13 | +Current methods to identify generic genes and pathways, including those of Powers et al. and Crow et al., rely on manual curation. This curation effort included collecting a large set of samples with corresponding metadata, processing the data, and performing DE analysis to get a ranked list of genes. |
16 | 14 |
|
17 |
| -If you want to perform a new DE analysis in a different biological **context** (i.e. different organism, tissue, media) then you might not have the curated data available. Switching contexts will require a lot of manual effort. Similarly, using a different statistical method will require re-curation effort |
| 15 | +If you want to perform a new DE analysis in a different biological **context** (e.g. a different organism, tissue, or media), then you might not have the curated data available. Switching contexts will require re-curation. Similarly, using a different statistical method will require re-curation. This curation effort is very time-intensive. |
18 | 16 |
|
19 | 17 |
|
20 |
| -**Goal of this study:** |
21 |
| -* To show that our compendia simulation method, [ponyo](https://github.com/greenelab/ponyo) can automatically identify generic genes and pathways |
| 18 | +**Goal:** |
| 19 | +To develop a method that can automatically identify generic genes and pathways. |
22 | 20 |
|
23 | 21 | **Results:**
|
24 |
| -* We found a set of general generic genes (i.e. genes found to be generic in both recount2 and crow et. al., which contain a mix of experiments) |
25 |
| -* We developed a method to automatically identify generic genes in different contexts without having to perform experiments and curate. |
| 22 | +The gene ranking produced by our method was consistent with previously published rankings. These generic genes appear to act as gene hubs, which are associated with many biological processes. |
| 23 | + |
| 24 | +**Conclusions:** |
| 25 | +We developed a method to automatically identify generic genes and pathways using public data without the need for curation. The generic signals identified from this method can be used to interpret study results and direct follow-up experiments. |
| 26 | + |
| 27 | +## Directory Structure |
| 28 | +| Folder/file | Description | |
| 29 | +| --- | --- | |
| 30 | +| [configs](configs) | This folder contains configuration files used to set hyperparameters for the different experiments | |
| 31 | +| [generic_expression_patterns_modules](generic_expression_patterns_modules) | This folder contains supporting functions that other notebooks in this repository will use | |
| 32 | +| [human_cancer_analysis](human_cancer_analysis) | This folder contains analysis notebooks to identify generic signals using the Powers et al. dataset to train the VAE | |
| 33 | +| [human_general_analysis](human_general_analysis) | This folder contains analysis notebooks to identify generic signals using the [recount2](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6742427/) dataset to train the VAE | |
| 34 | +| [multiplier_analysis](multiplier_analysis) | This folder contains analysis notebooks to examine the coverage of generic genes across [MultiPLIER latent variables](https://www.cell.com/cell-systems/pdfExtended/S2405-4712\(19\)30119-X) | |
| 35 | +| [pseudomonas_analysis](pseudomonas_analysis) | This folder contains analysis notebooks to identify generic signals using the *P. aeruginosa* dataset to train the VAE | |
| 36 | + |
26 | 37 |
|
| 38 | +## Usage |
27 | 39 |
|
28 |
| -## How to run notebooks from generic-expression-patterns |
| 40 | +**How to run notebooks from generic-expression-patterns** |
29 | 41 |
|
30 |
| -**Operating Systems:** Mac OS, Linux |
| 42 | +*Operating Systems:* Mac OS, Linux (Note: bioconda libraries are not available on Windows) |
31 | 43 |
|
32 | 44 | In order to run this simulation on your own gene expression data the following steps should be performed:
|
33 | 45 |
|
@@ -60,7 +72,7 @@ pip install -e .
|
60 | 72 | * The processed template file can be found [here](human_analysis/data/processed_recount2_template.tsv)
|
61 | 73 | * The scaler file can be found [here](human_analysis/data/scaler_transform_human.pickle)
|
62 | 74 |
|
63 |
| -## How to run using your own data |
| 75 | +**How to analyze your own data** |
64 | 76 |
|
65 | 77 | In order to run this simulation on your own gene expression data the following steps should be performed:
|
66 | 78 |
|
@@ -132,3 +144,5 @@ Note: Some of these parameters are required by the imported [ponyo](https://gith
|
132 | 144 | | num_simulated| int: Simulate a compendium with this many experiments, created by shifting the template experiment this many times|
|
133 | 145 | | num_recount2_experiments_to_download** | int: Number of recount2 experiments to download. Note this will not be needed when we update the training to use all of recount2|
|
134 | 146 |
|
| 147 | +## Acknowledgements |
| 148 | +We would like to thank David Nicholson, Ben Heil, Jake Crawford, Georgia Doing and Milton Pividori for insightful discussions and code review. |