
Commit 5be0a99

ajlee21 and Jake Crawford authored
Minor update to readme and figure (#46)
* minor fix to figure output
* update readme
* minor fixes to readme
* more minor fixes
* fix typo

Co-authored-by: Jake Crawford <[email protected]>

* add note about windows OS

Co-authored-by: Jake Crawford <[email protected]>
1 parent c089f73 commit 5be0a99

5 files changed

+362
-513
lines changed

README.md

Lines changed: 28 additions & 14 deletions
@@ -1,33 +1,45 @@
1-
# Rank pathways analysis
1+
# Detecting generic gene expression signals
22

3-
**Alexandra J Lee, James C Costello and Casey S Greene**
3+
**Alexandra J. Lee, Rani K. Powers, Dallas L. Mould, Dongbo Hu, Georgia Doing, Jake Crawford, James C. Costello, Deborah A. Hogan, Casey S. Greene**
44

5-
**May 2020**
6-
7-
**University of Pennsylvania, University of Colorado Anschutz Medical Campus**
5+
**University of Pennsylvania, University of Colorado Anschutz Medical Campus, Dartmouth College**
86

97
**Rationale**: Researchers performing differential expression (DE) analysis have found that some genes and their associated pathways are more likely to be differentially expressed across a wide range of experimental designs ([Powers et al., Bioinformatics 2018](https://academic.oup.com/bioinformatics/article/34/13/i555/5045793); [Crow et al., PNAS 2019](https://www.pnas.org/content/116/13/6491)).
108

119
Given that these commonly DE genes and pathways exist, it is important to be able to distinguish between genes that are generic and genes that are experiment- or condition-specific. Specific genes may, for example, be disease-specific and reveal insights into pathogenic mechanisms that would be overlooked by examining all the DE genes in aggregate. Disease-specific genes can also help to prioritize differentially expressed genes (DEGs) for follow-up wet-lab or functional experiments. For example, [Swindell et al.](https://www.sciencedirect.com/science/article/pii/S0022202X16312465#fig3) identified IL-17A as an inducer of DEGs most uniquely elevated in psoriasis lesions compared to other skin diseases. Furthermore, clinical data have demonstrated the efficacy of anti-IL-17A therapy for moderate-to-severe psoriasis. In general, being able to distinguish between generic and context-specific signals is important for learning gene function and revealing insights into mechanisms of disease.
1210

1311

1412
**Challenge**:
15-
Current methods, including Powers et. al. and Crow et. al., to identify generic genes and pathways rely on manual curation. This curation effort included collecting a large set of samples with corresponding metadata, process data and perform DE analysis to get ranked list of genes.
13+
Current methods for identifying generic genes and pathways, including those of Powers et al. and Crow et al., rely on manual curation. This curation effort included collecting a large set of samples with corresponding metadata, processing the data, and performing DE analysis to get a ranked list of genes.
1614

17-
If you want to perform a new DE analysis in a different biological **context** (i.e. different organism, tissue, media) then you might not have the curated data available. Switching contexts will require a lot of manual effort. Similarly, using a different statistical method will require re-curation effort
15+
If you want to perform a new DE analysis in a different biological **context** (e.g. a different organism, tissue, or media), you might not have the curated data available. Switching contexts will require re-curation. Similarly, using a different statistical method will require re-curation. This curation effort is very time-intensive.
1816

1917

20-
**Goal of this study:**
21-
* To show that our compendia simulation method, [ponyo](https://github.com/greenelab/ponyo) can automatically identify generic genes and pathways
18+
**Goal:**
19+
To develop a method that can automatically identify generic genes and pathways.
2220

2321
**Results:**
24-
* We found a set of general generic genes (i.e. genes found to be generic in both recount2 and crow et. al., which contain a mix of experiments)
25-
* We developed a method to automatically identify generic genes in different contexts without having to perform experiments and curate.
22+
The gene ranking produced by our method was consistent with previously published rankings. These generic genes appear to act as gene hubs, which are associated with many biological processes.
23+
24+
**Conclusions:**
25+
We developed a method to automatically identify generic genes and pathways using public data without the need for curation. The generic signals identified from this method can be used to interpret study results and direct follow-up experiments.
26+
27+
## Directory Structure
28+
| Folder/file | Description |
29+
| --- | --- |
30+
| [configs](configs) | This folder contains configuration files used to set hyperparameters for the different experiments |
31+
| [generic_expression_patterns_modules](generic_expression_patterns_modules) | This folder contains supporting functions that other notebooks in this repository will use |
32+
| [human_cancer_analysis](human_cancer_analysis) | This folder contains analysis notebooks to identify generic signals using the Powers et al. dataset to train a VAE |
33+
| [human_general_analysis](human_general_analysis) | This folder contains analysis notebooks to identify generic signals using the [recount2](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6742427/) dataset to train a VAE |
34+
| [multiplier_analysis](multiplier_analysis) | This folder contains analysis notebooks to examine the coverage of generic genes across [MultiPLIER latent variables](https://www.cell.com/cell-systems/pdfExtended/S2405-4712\(19\)30119-X) |
35+
| [pseudomonas_analysis](pseudomonas_analysis) | This folder contains analysis notebooks to identify generic signals using the *P. aeruginosa* dataset to train a VAE |
36+
2637

38+
## Usage
2739

28-
## How to run notebooks from generic-expression-patterns
40+
**How to run notebooks from generic-expression-patterns**
2941

30-
**Operating Systems:** Mac OS, Linux
42+
*Operating Systems:* Mac OS, Linux (Note: bioconda libraries are not available on Windows)
3143

3244
In order to run this simulation on your own gene expression data the following steps should be performed:
3345

@@ -60,7 +72,7 @@ pip install -e .
6072
* The processed template file can be found [here](human_analysis/data/processed_recount2_template.tsv)
6173
* The scaler file can be found [here](human_analysis/data/scaler_transform_human.pickle)
6274

63-
## How to run using your own data
75+
**How to analyze your own data**
6476

6577
In order to run this simulation on your own gene expression data the following steps should be performed:
6678

@@ -132,3 +144,5 @@ Note: Some of these parameters are required by the imported [ponyo](https://gith
132144
| num_simulated| int: Simulate a compendium with this many experiments, created by shifting the template experiment this many times|
133145
| num_recount2_experiments_to_download** | int: Number of recount2 experiments to download. Note this will not be needed when we update the training to use all of recount2|
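
As a minimal sketch of how the two integer parameters above might be checked before a run, the snippet below reads them from a tab-separated name/value listing. The two-column layout, the helper name `read_int_params`, and the example values are assumptions made for illustration; they are not taken from this repository or from ponyo.

```python
import csv
import io

# Hypothetical two-column layout (parameter name, value). The real config
# format used by the repository may differ, and the values are placeholders.
EXAMPLE_CONFIG = "num_simulated\t25\nnum_recount2_experiments_to_download\t50\n"

# The two parameters documented as `int` in the table above.
INT_PARAMS = ("num_simulated", "num_recount2_experiments_to_download")


def read_int_params(handle):
    """Read tab-separated name/value rows and return the integer parameters."""
    params = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        name, value = row
        if name in INT_PARAMS:
            params[name] = int(value)  # raises ValueError if not an integer
    missing = [name for name in INT_PARAMS if name not in params]
    if missing:
        raise KeyError(f"missing parameters: {missing}")
    return params


params = read_int_params(io.StringIO(EXAMPLE_CONFIG))
print(params)
```

Failing early on a missing or non-integer value is a design choice here: it surfaces a configuration mistake before any long-running simulation starts.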
134146

147+
## Acknowledgements
148+
We would like to thank David Nicholson, Ben Heil, Jake Crawford, Georgia Doing and Milton Pividori for insightful discussions and code review.
