In overview, BETS:
- takes as input a gene-by-timepoint matrix (or multiple replicates of one) and a randomized version of that matrix, in which each gene's expression has been permuted over time.
- creates a "run" folder where the input and code are copied.
- runs the scripts in parallel.
- outputs a network with FDR thresholding based on bootstrap frequencies.
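The randomized input mentioned above can be illustrated with a minimal sketch. It assumes the matrix is held in memory as (gene name, values) pairs; the function name `permute_over_time` and the `seed` argument are hypothetical and are not part of BETS.

```python
import random

def permute_over_time(rows, seed=0):
    """Return a copy of rows where each gene's values are shuffled across timepoints.

    rows: list of (gene_name, [values ordered by time]) pairs (assumed layout).
    """
    rng = random.Random(seed)
    permuted = []
    for gene, values in rows:
        shuffled = list(values)   # copy so the input is left untouched
        rng.shuffle(shuffled)     # each gene is shuffled independently
        permuted.append((gene, shuffled))
    return permuted

matrix = [("geneA", [0.1, 0.1, -0.2]), ("geneB", [0.3, -0.2, -0.1])]
randomized = permute_over_time(matrix, seed=1)
```

Gene names and the set of values per gene are preserved; only the order over time changes. Using a different seed gives a second, independent permutation of the same data.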
BETS takes a tab-delimited gene-by-timepoint file. The first column holds the gene names, and the remaining columns hold the timepoints in order (earlier on the left, later on the right). See data/DREAM/insilico/0mean/insilico_size100_1_0mean_TS-rep-1.txt as an example.
gene time1 time2 time3 ...
geneA 0.1 0.1 -0.2 ...
geneB 0.3 -0.2 -0.1 ...
...
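Before packaging a run, it can help to check that a file matches this layout. The following is a minimal sketch, assuming a header row followed by one row per gene as in the example above; `read_gene_matrix` is a hypothetical helper, not a BETS function.

```python
import csv
import io

def read_gene_matrix(handle):
    """Parse a tab-delimited gene-by-timepoint table.

    Returns (timepoint_labels, {gene_name: [float, ...]}).
    Raises ValueError on ragged rows or duplicate gene names.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    timepoints = header[1:]          # first column is the gene name
    genes = {}
    for line_no, row in enumerate(reader, start=2):
        if not row:                  # skip blank lines
            continue
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} columns, got {len(row)}"
            )
        if row[0] in genes:
            raise ValueError(f"line {line_no}: duplicate gene {row[0]!r}")
        genes[row[0]] = [float(v) for v in row[1:]]
    return timepoints, genes

example = (
    "gene\ttime1\ttime2\ttime3\n"
    "geneA\t0.1\t0.1\t-0.2\n"
    "geneB\t0.3\t-0.2\t-0.1\n"
)
timepoints, genes = read_gene_matrix(io.StringIO(example))
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="")`.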
If there are multiple replicates, instead provide a text file that lists each of the individual replicate files. See data/DREAM/insilico_size100_1/0mean/reps.txt as an example: it lists individual replicate files such as data/DREAM/insilico/0mean/insilico_size100_1_0mean_TS-rep-1.txt and data/DREAM/insilico/0mean/insilico_size100_1_0mean_TS-rep-2.txt. Each individual replicate file has the same format as above.
BETS treats replicates as independent samples, so please make sure your replicates are measured at the same timepoints and list genes in the same order.
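The requirement above can be checked with a short sketch. It assumes you have already extracted the header labels and the gene names from each replicate file, and that matching header labels mean matching timepoints; `check_replicates` is a hypothetical helper, not part of BETS.

```python
def check_replicates(replicates):
    """Compare every replicate against the first one.

    replicates: list of (timepoint_labels, gene_names) pairs, one per file.
    Returns a list of problem descriptions (empty if all replicates agree).
    """
    problems = []
    ref_times, ref_genes = replicates[0]
    for i, (times, genes) in enumerate(replicates[1:], start=2):
        if list(times) != list(ref_times):
            problems.append(f"replicate {i}: timepoints differ from replicate 1")
        if list(genes) != list(ref_genes):
            problems.append(f"replicate {i}: gene order differs from replicate 1")
    return problems

rep1 = (["time1", "time2"], ["geneA", "geneB"])
rep2 = (["time1", "time2"], ["geneB", "geneA"])   # same genes, different order
issues = check_replicates([rep1, rep2])
```

An empty list means the replicates are consistent in both respects; here `issues` reports the gene-order mismatch in the second replicate.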
python3 -m pip install -r requirements.txt
- Get to the BETS directory from the command line.
- Set the parameters at code/package_params_cpipeline.sh
- (OPTIONAL) If you want to run on a computing cluster, modify code/run_all_parallel_wait.sh so that it submits jobs appropriately.
cd code/
source ./package_params_cpipeline.sh
./package_for_cluster_cpipeline.sh
cd $FOLDER
Assuming you are in the run folder, you can run all of the following parts of step 3 (A-E) automatically on your own computer, without intervening step by step, with this shortcut:
./run_BETS_no_cluster.sh
source ./package_params_cpipeline.sh
./prep_jobs_rand_cv.sh
./prep_jobs_bootstrap.sh
- Set the list of scripts to run.
export scriptlist=cv_parallel_script_list.txt
- If on your own computer, do
./run_all_parallel_no_cluster.sh
If submitting jobs to a cluster, do
./run_all_parallel_wait.sh
- Wait for the jobs to complete. (You can ignore the "line 3: module: command not found" errors.)
- Set the hyperparameter for the fit.
./set_hyper.sh
source ./package_params_cpipeline.sh
export scriptlist=fit_parallel_script_list.txt
- If on your own computer, do
./run_all_parallel_no_cluster.sh
If submitting jobs to a cluster, do
./run_all_parallel_wait.sh
- Wait for the jobs to complete. (You can ignore the "line 3: module: command not found" errors.)
./finish-effect.sh
source ./package_params_cpipeline.sh
export scriptlist=bootstrap_parallel_script_list.txt
- If on your own computer, do
./run_all_parallel_no_cluster.sh
If submitting jobs to a cluster, do
./run_all_parallel_wait.sh
- Wait for the jobs to complete. (You can ignore the "line 3: module: command not found" errors.)
export scriptlist=finish-effect-bootstrap_parallel_script_list.txt
- If on your own computer, do
./run_all_parallel_no_cluster.sh
If submitting jobs to a cluster, do
./run_all_parallel_wait.sh
- Wait for the jobs to complete. (You can ignore the "line 3: module: command not found" errors.)
- Combine the bootstrap elastic net fits.
./get_result_bootstrap_lite.sh
- Combine the significant networks for each bootstrap sample.
./get_result_bootstrap-fdr-0.05-effect_lite.sh
- Put all the timing results together now that it's done.
./summarize_time.sh
- Organize the results.
./downstream_prep.sh
- All the results are now under run_l-fdr
- Edit package_params_cpipeline.sh:
A. Replace your DATAFILE with a version of DATAFILE where every gene's temporal profile has been independently shuffled across time, separately for each replicate. In this example, change
export DATAFILE=../data/DREAM/insilico_size100_1/0mean/reps.txt
to
export DATAFILE=../data/DREAM/insilico_size100_1/0mean/reps-urand.txt
(Note that this permuted data set is distinct from RANDDATAFILE=../data/DREAM/insilico_size100_1/0mean/reps-rand.txt. Both are generated in the same way, but with different random seeds.)
B. Add the suffix _urand to GENES. In this example, change
export GENES=insilico_size100_1
to
export GENES=insilico_size100_1_urand
- Run sections 1-3, above, as before.
- Change into the run directory with the original data as in Step 3.
In this example, change into BETS/runs/insilico_size100_1-0mean-reps-enet-2-g
- Modify get_umbrella_null_results.py so that urand_run_folder points to the run_l-fdr folder of the permuted-data run:
urand_run_folder = "../URANDFOLDERNAME/run_l-fdr"
In this example, change it to
urand_run_folder = "../insilico_size100_1_urand-0mean-reps-enet-2-g/run_l-fdr"
- Do
python3 get_umbrella_null_results.py
- The FDR-thresholded network is now available at analysis/bootstrap-fdr-0.2-network.txt and the bootstrap frequency threshold is available at analysis/bootstrap-fdr-0.2-network-umbrella-results.csv
In this example, you should get a network with 3364 edges.
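To give a feel for what a bootstrap frequency threshold at a target FDR means, here is a generic sketch. It is not the code in get_umbrella_null_results.py: the function name, the inputs, and the FDR estimate (permuted-data edges passing the cutoff divided by original-data edges passing it) are all assumptions made for illustration.

```python
def pick_frequency_threshold(real_freqs, null_freqs, fdr=0.2):
    """Smallest bootstrap-frequency cutoff whose estimated FDR is <= fdr.

    Assumed estimate at cutoff t:
        (# permuted-data edges with frequency >= t)
        / (# original-data edges with frequency >= t)
    Returns None if no cutoff qualifies.
    """
    for t in sorted(set(real_freqs)):
        n_real = sum(f >= t for f in real_freqs)   # always >= 1, since t is in real_freqs
        n_null = sum(f >= t for f in null_freqs)
        if n_null / n_real <= fdr:
            return t
    return None

# Hypothetical bootstrap frequencies, one per candidate edge.
real = [0.9, 0.8, 0.8, 0.6, 0.5, 0.3]
null = [0.4, 0.3, 0.2]
threshold = pick_frequency_threshold(real, null, fdr=0.2)
```

With these made-up numbers, a cutoff of 0.3 lets two permuted-data edges through against six original edges, so it is rejected; 0.5 lets none through and is returned. Edges from the original run whose frequency is at or above the cutoff would then form the thresholded network.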
Reach out at the Google Group!