3.1. Running JUM manually (v2.0.2 and up)
- Under your working folder (/user/home/AS_analysis), you should have the following input files ready for JUM analysis ("*" stands for each sample prefix: ctrl1, ctrl2, ctrl3, treat1, treat2 and treat3):
  - *Aligned.out.sam files
  - *SJ.out.tab files
  - *Aligned.out_sorted.bam files
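Before starting, it can help to confirm that all three input files exist for every sample. The following is a minimal sketch, not part of the JUM package: the function name check_jum_inputs and its argument layout are assumptions made for illustration, and it only tests that each file exists.

```shell
# Sketch: report missing JUM input files for a list of sample prefixes.
# check_jum_inputs is a hypothetical helper, not shipped with JUM.
# Usage: check_jum_inputs WORKING_FOLDER SAMPLE_PREFIX...
check_jum_inputs() {
    local dir=$1
    shift
    local missing=0
    local sample suffix
    for sample in "$@"; do
        for suffix in Aligned.out.sam SJ.out.tab Aligned.out_sorted.bam; do
            # "*" in the file list above is replaced by the sample prefix.
            if [ ! -e "$dir/$sample$suffix" ]; then
                echo "missing: $dir/$sample$suffix" >&2
                missing=$((missing + 1))
            fi
        done
    done
    # Print the number of missing files and return non-zero if any are missing.
    echo "$missing"
    [ "$missing" -eq 0 ]
}

# Example call with the folder and sample names used in this section:
# check_jum_inputs /user/home/AS_analysis ctrl1 ctrl2 ctrl3 treat1 treat2 treat3
```

The function prints the number of missing files on standard output and lists each missing path on standard error, so it can be used as a guard before launching JUM_A.sh.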
- Let us also assume that you have downloaded the JUM software package version 2.0.2 to your home directory and unzipped it, so that all the scripts are now in /user/home/JUM_2.0.2.
- Run JUM_A.sh:
$ bash /user/home/JUM_2.0.2/JUM_A.sh --Folder "directory" --JuncThreshold "junction_read_threshold" --Condition1_fileNum_threshold "thre_file_num_1" --Condition2_fileNum_threshold "thre_file_num_2" --IRthreshold "IR_read_threshold" --Readlength "read_length" --Thread "thread_num" --Condition1SampleName "condition_1sample" --Condition2SampleName "condition_2sample"
#--Folder: path of the downloaded JUM package.
#--JuncThreshold: (for junction filtering) JUM will filter for splice junctions that receive more than this # of unique mapped reads in at least #ConditionX_fileNum_threshold samples out of all replicates under each condition as valid junctions for downstream analysis.
#--Condition1_fileNum_threshold: (for junction filtering) JUM will filter for splice junctions that receive more than #JuncThreshold unique mapped reads in at least this # of samples out of all replicates under condition 1 as valid junctions identified in condition 1 for downstream analysis.
#--Condition2_fileNum_threshold: (for junction filtering) JUM will filter for splice junctions that receive more than #JuncThreshold unique mapped reads in at least this # of samples out of all replicates under condition 2 as valid junctions identified in condition 2 for downstream analysis.
#--IRthreshold: (IR filter) JUM will filter for IR events that receive more than this # of unique mapped reads in the upstream exon-intron and downstream intron-exon boundaries as potential true IR events.
#--Readlength: the length of the RNA-seq reads.
#--Thread: number of threads for multi-threading processing of sam/bam files.
#--Condition1SampleName: the prefix/name of the samples under condition 1, separated by "," (the prefix of the *Aligned.out.sam files).
#--Condition2SampleName: the prefix/name of the samples under condition 2, separated by ",".
For example:
$ bash /user/home/JUM_2.0.2/JUM_A.sh --Folder /user/home/JUM_2.0.2 --JuncThreshold 5 --Condition1_fileNum_threshold 2 --Condition2_fileNum_threshold 2 --IRthreshold 5 --Readlength 100 --Thread 3 --Condition1SampleName ctrl1,ctrl2,ctrl3 --Condition2SampleName treat1,treat2,treat3
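Because several of these values must be repeated unchanged when running JUM_B.sh later, one option is to keep them in shell variables. The sketch below assumes the example values shown above; the variable names and the run_jum_a wrapper are hypothetical and not part of the JUM package.

```shell
# Settings copied from the example command above (variable names are illustrative).
JUM_DIR=/user/home/JUM_2.0.2
JUNC_THRESHOLD=5
COND1_FILE_THRESHOLD=2
COND2_FILE_THRESHOLD=2
IR_THRESHOLD=5
READ_LENGTH=100
THREADS=3
COND1_SAMPLES=ctrl1,ctrl2,ctrl3
COND2_SAMPLES=treat1,treat2,treat3

# run_jum_a is a hypothetical wrapper around the documented JUM_A.sh call.
# If DRY_RUN is set to a non-empty value, the command is printed, not executed.
run_jum_a() {
    ${DRY_RUN:+echo} bash "$JUM_DIR/JUM_A.sh" \
        --Folder "$JUM_DIR" \
        --JuncThreshold "$JUNC_THRESHOLD" \
        --Condition1_fileNum_threshold "$COND1_FILE_THRESHOLD" \
        --Condition2_fileNum_threshold "$COND2_FILE_THRESHOLD" \
        --IRthreshold "$IR_THRESHOLD" \
        --Readlength "$READ_LENGTH" \
        --Thread "$THREADS" \
        --Condition1SampleName "$COND1_SAMPLES" \
        --Condition2SampleName "$COND2_SAMPLES"
}

# Preview the command first, then run it from the working folder:
# DRY_RUN=1 run_jum_a
# run_jum_a
```

Previewing with DRY_RUN=1 makes it easy to compare the assembled command against the example before committing compute time.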
NOTES
- For #--JuncThreshold and #--ConditionX_fileNum_threshold, users need to choose values based on the RNA-seq sequencing depth and the number of replicates available for each condition. Below are a few examples and general rules of thumb:
  - If users have 3 replicates that are relatively deeply sequenced (~30M+ reads for Drosophila and ~50M+ for human samples, for example), it is reasonable to filter for junctions that have more than 5 (or 10) reads in 2 replicates of one condition, or in all 3 replicates of one condition.
  - If users only have two replicates, it is reasonable to filter for junctions that have more than 5 (or 10) reads in both replicates of one condition.
  - If users have 4 or more replicates, it is reasonable to filter for junctions that have more than 5 (or 10) reads in at least (total # replicates - 1) samples of one condition.
- Users can choose a different #--ConditionX_fileNum_threshold for each condition, depending on how many replicates each condition has.
- It is recommended that users name their samples in a simple manner, with the condition plus a number, such as control1, control2, drug1, drug2 etc., and assign conditions 1 and 2 based on the alphabetical order of the sample names. For example, here "ctrl" comes alphabetically before "treat", so the samples under the control condition are set as condition 1.
- For #--Thread, this is the number of threads used to process each sample, and all samples are processed in parallel. Choose this number carefully, because the total number of threads requested when running JUM_A.sh is #--Thread multiplied by the total number of samples. With --Thread 3 and six samples, as in the example above, that is 18 threads.
- JUM_A.sh will generate a new folder called JUM_diff/ with results for downstream analysis.
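The rules of thumb in these notes can be written down as two small helpers. This is only a sketch of the guidance above, not something JUM provides: the function names are hypothetical, and for three replicates it picks the more lenient of the two suggested options (2 rather than all 3).

```shell
# suggest_file_threshold REPLICATES
# Hypothetical helper that encodes the rule of thumb for
# --ConditionX_fileNum_threshold from the notes above.
suggest_file_threshold() {
    replicates=$1
    if [ "$replicates" -le 2 ]; then
        # Up to two replicates: require the junction in all of them.
        echo "$replicates"
    elif [ "$replicates" -eq 3 ]; then
        # Three replicates: 2 is suggested (use 3 for a stricter filter).
        echo 2
    else
        # Four or more replicates: total number of replicates minus one.
        echo $((replicates - 1))
    fi
}

# total_threads THREAD_VALUE NUMBER_OF_SAMPLES
# All samples run in parallel, each with --Thread threads.
total_threads() {
    echo $(($1 * $2))
}

# Example: the values used in this section.
# suggest_file_threshold 3
# total_threads 3 6
```

Checking the total thread count against the number of cores available on the machine before launching JUM_A.sh avoids oversubscribing it.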
- Enter the JUM_diff/ folder and run the R script with a user-provided experiment design file (txt format; a template is provided in the package) for differential AS analysis.
$ cd JUM_diff/
$ Rscript /user/home/JUM_2.0.2/R_script_JUM.R experiment_design.txt > outputFile.Rout 2> errorFile.Rout
An example experiment_design.txt file is shown below:
condition
ctrl1 control
ctrl2 control
ctrl3 control
treat1 treatment
treat2 treatment
treat3 treatment
NOTES
- It is important to make sure that in the experiment_design.txt file the sample names and the condition names follow the same alphabetical order. For example, here the control samples (ctrl1, ctrl2, ctrl3) all start with "c", so they come alphabetically before the treatment samples (treat1, treat2, treat3), which all start with "t"; accordingly, the condition name "control" for the control samples also comes alphabetically before the condition name "treatment" for the treatment samples.
- R_script_JUM.R will output a file called AS_differential.txt. Make sure that Step 2 successfully generates a new file called AS_differential.txt in the JUM_diff/ folder. Otherwise, refer to the files outputFile.Rout and errorFile.Rout for troubleshooting.
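Both notes can be checked from the shell. The sketch below is not part of JUM and makes two assumptions: the first line of experiment_design.txt is a header, and every following line holds a sample name and a condition name separated by whitespace. The function names are hypothetical.

```shell
# check_design_order DESIGN_FILE
# Succeeds if, after sorting rows by sample name, the condition names
# never go backwards alphabetically (i.e. both follow the same order).
check_design_order() {
    tail -n +2 "$1" | LC_ALL=C sort -k1,1 | LC_ALL=C awk '
        NF >= 2 {
            if (prev != "" && $2 < prev) { bad = 1 }
            prev = $2
        }
        END { exit bad + 0 }'
}

# check_as_differential [FOLDER]
# Confirms that the R script produced a non-empty AS_differential.txt;
# otherwise shows the end of errorFile.Rout to start troubleshooting.
check_as_differential() {
    dir=${1:-.}
    if [ -s "$dir/AS_differential.txt" ]; then
        echo "AS_differential.txt found"
    else
        echo "AS_differential.txt missing or empty; see errorFile.Rout:" >&2
        [ -f "$dir/errorFile.Rout" ] && tail -n 20 "$dir/errorFile.Rout" >&2
        return 1
    fi
}

# Example, run inside JUM_diff/:
# check_design_order experiment_design.txt && echo "design order OK"
# check_as_differential .
```

With the example design file, sorting by sample name gives three "control" rows followed by three "treatment" rows, so the order check passes.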
- Now run JUM_B.sh in the JUM_diff/ folder as follows:
$ bash /user/home/JUM_2.0.2/JUM_B.sh --Folder "directory" --Test "pvalue|adjusted_pvalue" --Cutoff "stat_threshold" --TotalFileNum "total#samples" --Condition1_fileNum_threshold "thre_file_num_1" --Condition2_fileNum_threshold "thre_file_num_2" --Condition1SampleName "condition_1sample" --Condition2SampleName "condition_2sample"
#--Folder: path of the downloaded JUM package.
#--Test: choice of statistical measure for the significance test; type "pvalue" or "adjusted_pvalue".
#--Cutoff: a number, threshold for statistical cutoff.
#--TotalFileNum: the total number of samples from all conditions added together.
#--Condition1_fileNum_threshold: same as specified in JUM_A.sh.
#--Condition2_fileNum_threshold: same as specified in JUM_A.sh.
#--Condition1SampleName: same as specified in JUM_A.sh.
#--Condition2SampleName: same as specified in JUM_A.sh.
For example:
$ bash /user/home/JUM_2.0.2/JUM_B.sh --Folder /user/home/JUM_2.0.2 --Test pvalue --Cutoff 0.05 --TotalFileNum 6 --Condition1_fileNum_threshold 2 --Condition2_fileNum_threshold 2 --Condition1SampleName ctrl1,ctrl2,ctrl3 --Condition2SampleName treat1,treat2,treat3
NOTES
- We recommend that users run at least one round using pvalue 0.05 at this step. This is the most permissive statistical setting for profiling significantly differentially spliced AS events, and it is handy to keep a version of this result around, especially while users are still experimenting with the optimal statistical cutoff. When stricter cutoffs are needed, users can filter the pvalue 0.05 results with simple Linux commands that select AS events satisfying stricter criteria, such as adjusted p-values and the level of splicing change, instead of running step 3 again.
- If users would like an idea of the total number of AS events that can be detected at the sequencing depth at hand, they should run one round using pvalue 1. This setting outputs every profiled AS event in all categories, whether or not it changes between the two conditions, and so provides a complete atlas of all AS events detectable at the current sequencing depth.
- For users who have multiple condition comparisons (e.g. control versus drug_treatment1 and drug_treatment2), it is also recommended to run this step once with pvalue 1. This facilitates comparing significantly changed AS events across different conditions versus control (e.g. changed AS events brought by drug_treatment1 compared to control versus changed AS events brought by drug_treatment2 compared to control). Users can then filter the pvalue 1 results for stricter statistical cutoffs. Also, JUM outputs all AS events with their coordinates as IDs, so it is convenient to compare the results across condition comparisons.
- JUM_B.sh will generate a new folder called FINAL_JUM_OUTPUT_$Test_$Cutoff that contains the results for downstream analysis.
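As an illustration of filtering an existing result with a stricter cutoff, the sketch below keeps the header line plus every row whose value in a chosen column is at or below a threshold. It assumes a tab-delimited file with one header line; the column number must be looked up in the actual output file, and the function name and file name in the example are hypothetical.

```shell
# filter_by_cutoff FILE COLUMN CUTOFF
# Hypothetical helper: prints the header and the rows of a tab-delimited
# FILE whose numeric value in COLUMN is less than or equal to CUTOFF.
# Rows with an empty or "NA" value in that column are dropped.
filter_by_cutoff() {
    awk -F '\t' -v col="$2" -v cutoff="$3" '
        NR == 1 { print; next }
        $col != "" && $col != "NA" && ($col + 0) <= (cutoff + 0) { print }
    ' "$1"
}

# Example (file name and column number are placeholders to adapt):
# filter_by_cutoff results_pvalue_0.05.txt 3 0.01 > results_stricter.txt
```

Because the header is passed through unchanged, the filtered file keeps the same layout as the original and can be filtered again on another column.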
- Run JUM_C.sh in the folder FINAL_JUM_OUTPUT_$Test_$Cutoff as follows:
$ cd FINAL_JUM_OUTPUT_$Test_$Cutoff
$ bash /user/home/JUM_2.0.2/JUM_C.sh --Folder "directory" --Test "pvalue|adjusted_pvalue" --Cutoff "stat_threshold" --TotalCondition1FileNum "total_sample_#_condition_1" --TotalCondition2FileNum "total_sample_#_condition_2" --REF "refFlat"
#--Folder: path of the downloaded JUM package.
#--Test: same as in running JUM_B.sh.
#--Cutoff: same as in running JUM_B.sh.
#--TotalCondition1FileNum: the total number of samples under condition 1 that is alphabetically listed first in the `experiment_design.txt` file.
#--TotalCondition2FileNum: the total number of samples under condition 2 that is alphabetically listed second in the `experiment_design.txt` file.
#--REF: a `refFlat.txt` file (a type of transcriptome annotation file, also called genePred format).
For example:
$ bash /user/home/JUM_2.0.2/JUM_C.sh --Folder /user/home/JUM_2.0.2 --Test pvalue --Cutoff 0.05 --TotalCondition1FileNum 3 --TotalCondition2FileNum 3 --REF refFlat.txt
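When scripting this step, note that FINAL_JUM_OUTPUT_$Test_$Cutoff is a naming pattern rather than a literal shell expression. The sketch below assumes the folder name is formed by substituting the --Test and --Cutoff values into that pattern, and uses shell variables named Test and Cutoff purely for illustration.

```shell
# Values passed to JUM_B.sh as --Test and --Cutoff (illustrative variables).
Test=pvalue
Cutoff=0.05

# Braces are required here: "$Test_$Cutoff" would make the shell look up
# a variable named "Test_" instead of "Test" followed by an underscore.
out_dir="FINAL_JUM_OUTPUT_${Test}_${Cutoff}"
echo "$out_dir"
```

The resulting name can then be passed to cd (for example cd "$out_dir") before running JUM_C.sh.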
NOTES
- JUM_C.sh will output files with the suffixes *_final_simplified.txt and *_final_detailed.txt. These are the final output files from JUM.
- The --REF refFlat.txt file should contain 11 columns that specify the following information respectively: 1-geneName 2-transcript_name 3-chrom 4-strand 5-txStart 6-txEnd 7-cdsStart 8-cdsEnd 9-exonCount 10-exonStarts 11-exonEnds. This file should be available from the UCSC genome browser for different organisms, and users can download it to the current working directory. If users only have other annotation formats such as gff3 and gff files, they can convert these into the genePred format with binary utilities such as the gff3ToGenePred converter from the UCSC genome browser website: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/. Users may want to double-check the converted file and rearrange its columns so that the final format matches the refFlat format. Note that JUM does NOT depend on any a priori knowledge of annotation to perform AS analysis. This file is only used to associate the final differential AS results from JUM with known genes for the convenience of users. If an AS event is not mapped to any known gene, it will be marked as "NONE" in the associated gene track.
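A quick way to double-check a downloaded or converted annotation file is to count the columns on every line. This is a minimal sketch that assumes the file is tab-delimited, as refFlat files from UCSC are; the function name is hypothetical, and it checks only the column count, not the column contents or order.

```shell
# check_refflat_columns FILE
# Hypothetical helper: reports every line that does not have exactly
# 11 tab-separated columns and returns non-zero if any such line exists.
check_refflat_columns() {
    awk -F '\t' '
        NF != 11 { printf "line %d has %d columns\n", NR, NF; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Example, run in the folder that holds the annotation file:
# check_refflat_columns refFlat.txt && echo "refFlat.txt has 11 columns per line"
```

A file that passes this check can still have its columns in the wrong order, so it is worth inspecting the first few lines by eye as well.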