Takes SIMR LIMS flowcell directories and sends the FASTQ files of each sample for alignment in parallel using SGE.
batchBowtie [options]
[--bowtie bowtie_index]
[--directories flowcell_1 flowcell_2 ...]
[-- "extra parameters to pass to bowtie"]
Options:
--help brief help message
--man full documentation
--destdir destination directory where the results are saved (Def: ~/batchAlignement/aln_timeStamp)
--sub_selection smaller list of samples to use
--excluded samples to remove from the job
--queue SGE queue to use (Def: all.q)
--dryrun print the jobs that would be sent to SGE without submitting them
Every option name can be abbreviated to any unique prefix (e.g. -dir for --directories, -de for --destdir).
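The abbreviation rule can be illustrated with a small Python sketch. The `OPTIONS` list and the `resolve_option` helper are hypothetical and are not part of batchBowtie2; the sketch assumes that an exact option name always wins and that any other prefix must match exactly one option, so `-d` on its own is rejected because it matches `--directories`, `--destdir` and `--dryrun`.

```python
# Hypothetical option table mirroring the long options documented above.
OPTIONS = ["help", "man", "bowtie", "directories", "destdir",
           "sub_selection", "excluded", "queue", "dryrun"]


def resolve_option(arg, options=OPTIONS):
    """Expand an abbreviated option such as '-de' to its full name.

    An exact name wins; otherwise the prefix must match exactly one option.
    """
    name = arg.lstrip("-")
    if name in options:
        return name
    matches = [opt for opt in options if opt.startswith(name)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown option: {arg}")
    raise ValueError(f"ambiguous option {arg}: matches {', '.join(matches)}")


for arg in ["-dir", "-de", "--dr", "-q"]:
    print(arg, "->", resolve_option(arg))
```

The design choice here is the usual one for prefix matching: rejecting ambiguous prefixes outright is safer than guessing, since a wrong guess between `--destdir` and `--dryrun` would change what the run does.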
- --directories
  Path to a flowcell directory containing the FASTQ files and the .csv file describing the samples (generated by the SIMR LIMS system). Multiple directories can be passed, separated by spaces.
- --bowtie
  Path to the root of the bowtie index. If the environment variable BOWTIE_TX_INDEXES is set and points to the directory containing the bowtie indexes, only the name of the index needs to be provided.
- --help
  Prints a brief help message and exits.
- --man
  Prints the manual page and exits.
- --destdir
  Name and location of the directory where the final BAM files will be saved.
- --dryrun
  Prints the jobs that would be run without executing them.
- --sub_selection
  Smaller list of samples to use.
- --excluded
  Samples to remove from the job.
- --queue
  SGE queue to use (Def: all.q).
- --
  Additional parameters to pass to bowtie. They are taken literally and must be enclosed in double quotes, with any internal double quotes properly escaped. For instance:
    -- "-N 2 -k5 --ignore-quals"
  will be added as-is to the bowtie parameter list. By default, bowtie2 is run with no extra arguments.
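A rough Python sketch of how the `--bowtie` and `--` options could combine into a single bowtie2 command line. `resolve_index` and `bowtie2_argv` are hypothetical helpers, not taken from batchBowtie2. The sketch assumes that a bare index name (no directory part) is looked up under `BOWTIE_TX_INDEXES`, that reads are single-end (passed with `-U`), and that the quoted `--` string is split shell-style before being appended; the script itself may simply concatenate the string.

```python
import os
import shlex


def resolve_index(index, env=None):
    """Return the index prefix to pass to bowtie2 -x.

    ASSUMPTION: a bare name (no directory part) is joined to
    BOWTIE_TX_INDEXES when that variable is set; anything else is kept as given.
    """
    env = os.environ if env is None else env
    index_dir = env.get("BOWTIE_TX_INDEXES")
    if index_dir and os.path.dirname(index) == "":
        return os.path.join(index_dir, index)
    return index


def bowtie2_argv(index, fastq, extra="", env=None):
    """Build a bowtie2 argument list; `extra` is the quoted string given after --."""
    argv = ["bowtie2", "-x", resolve_index(index, env), "-U", fastq]
    # ASSUMPTION: the extra parameters are tokenized like a shell would, then appended unchanged.
    argv += shlex.split(extra)
    return argv


print(bowtie2_argv("Drosophila_melanogaster.BDGP5.71.min", "reads.fastq",
                   "-N 2 -k5 --ignore-quals", env={"BOWTIE_TX_INDEXES": "/indexes"}))
```

Passing `env` explicitly keeps the lookup testable without touching the real environment; by default the helper reads `os.environ`.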
This program reads the flowcell directories generated by the SIMR LIMS (one per flowcell barcode) and searches each one for a file with the .csv extension (Sample_Report.csv, or one of the other names previously used by the LIMS). It then takes the sample name from column 1 and associates it with the corresponding FASTQ file(s) from column 3. One sample can map to many FASTQ files, even across flowcells, as long as each flowcell directory is passed to --directories.
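A minimal Python sketch of this sample-to-FASTQ mapping, with the `--sub_selection` and `--excluded` filters applied afterwards. All function names are hypothetical. It assumes each .csv file has a header row, that column 3 holds a FASTQ file name relative to its flowcell directory, and that both filters take sample names exactly as they appear in column 1. For simplicity it reads every .csv file in each directory; how the script chooses between several report versions is not described here.

```python
import csv
import glob
import os
from collections import defaultdict


def add_rows(samples, flowcell, rows):
    """Append the FASTQ file from column 3 to the sample named in column 1."""
    for row in rows:
        if len(row) < 3 or not row[0].strip():
            continue  # skip blank or short lines
        sample, fastq = row[0].strip(), row[2].strip()
        # ASSUMPTION: column 3 holds a file name relative to the flowcell directory.
        samples[sample].append(os.path.join(flowcell, fastq))
    return samples


def collect_samples(flowcell_dirs):
    """Merge the .csv reports of all flowcells: one sample, many FASTQ files."""
    samples = defaultdict(list)
    for flowcell in flowcell_dirs:
        for csv_path in sorted(glob.glob(os.path.join(flowcell, "*.csv"))):
            with open(csv_path, newline="") as handle:
                reader = csv.reader(handle)
                next(reader, None)  # ASSUMPTION: the first row is a header
                add_rows(samples, flowcell, reader)
    return dict(samples)


def select_samples(samples, sub_selection=None, excluded=()):
    """Keep only the --sub_selection samples (if any), then drop the --excluded ones."""
    keep = set(samples) if not sub_selection else set(sub_selection) & set(samples)
    keep -= set(excluded)
    return {name: samples[name] for name in sorted(keep)}
```

Because the same dictionary is filled from every flowcell, a sample that was sequenced on two flowcells ends up with the FASTQ files from both, which is the one-to-many behaviour described above.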
The FASTQ file(s) are then split in a temporary directory and aligned in parallel with bowtie2, using jobs submitted to the SGE queue. The resulting BAM files are merged, and each sample's result is saved as an individual .bam file named after the sample in the .csv file. --destdir can be used to choose where the results are saved; otherwise, the ./bowtieBatch_DD/MM/YY:H:M:S directory is created to store the final BAM files.
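The split, align and merge steps can be sketched in Python as a generator of SGE commands. This is a hypothetical illustration, not the script's actual job layout. It assumes uncompressed single-end FASTQ, a chunk size of one million reads, job names of the form `bt2_<sample>_<n>`, SAM to BAM conversion with `samtools view`, merging with `samtools merge`, and `qsub -hold_jid` so that the merge job waits for every alignment job. Each job script is piped to `qsub` on standard input.

```python
import itertools
import os
import shlex


def chunk_records(lines, reads_per_chunk):
    """Yield lists of lines holding at most `reads_per_chunk` 4-line FASTQ records."""
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        yield chunk


def split_fastq(fastq, tmp_dir, reads_per_chunk=1_000_000):
    """Write the chunks of an uncompressed FASTQ file to tmp_dir; return their paths."""
    os.makedirs(tmp_dir, exist_ok=True)
    base = os.path.basename(fastq)
    paths = []
    with open(fastq) as handle:
        for i, chunk in enumerate(chunk_records(handle, reads_per_chunk)):
            path = os.path.join(tmp_dir, f"{base}.part{i:04d}.fastq")
            with open(path, "w") as out:
                out.writelines(chunk)
            paths.append(path)
    return paths


def sge_commands(sample, chunks, index, destdir, queue="all.q", extra=""):
    """Return shell lines: one qsub per chunk, then a merge job that waits for them."""
    commands, bams, names = [], [], []
    for i, chunk in enumerate(chunks):
        bam = chunk + ".bam"
        name = f"bt2_{sample}_{i}"
        # Align one chunk (single-end reads assumed) and convert SAM to BAM.
        align = (f"bowtie2 -x {shlex.quote(index)} -U {shlex.quote(chunk)} {extra}".rstrip()
                 + f" | samtools view -b -o {shlex.quote(bam)} -")
        # With no script file given, qsub reads the job script from standard input.
        commands.append(f"echo {shlex.quote(align)} | qsub -cwd -q {queue} -N {name}")
        bams.append(shlex.quote(bam))
        names.append(name)
    final_bam = shlex.quote(os.path.join(destdir, f"{sample}.bam"))
    merge = f"samtools merge {final_bam} {' '.join(bams)}"
    # -hold_jid keeps the merge job queued until every alignment job has finished.
    commands.append(f"echo {shlex.quote(merge)} | qsub -cwd -q {queue} "
                    f"-N merge_{sample} -hold_jid {','.join(names)}")
    return commands
```

Printing the lines returned by `sge_commands` instead of running them is close to what `--dryrun` describes; piping them to a shell would submit the jobs. Splitting on whole 4-line records keeps every read intact, so each chunk can be aligned independently.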
Dry run (prints what would be run without running it):
  ~/scripts/aligner/batchBowtie2 --dryrun --bowtie Drosophila_melanogaster.BDGP5.71.min --dir /n/analysis/Blanchette/sha/MOLNG-61/C05HTACXX /n/analysis/Blanchette/sha/MOLNG-61/C0K08ACXX --destdir Sex_n_Tudor
Running:
  ~/scripts/aligner/batchBowtie2 --bowtie Drosophila_melanogaster.BDGP5.71.min --dir /n/analysis/Blanchette/sha/MOLNG-61/C05HTACXX /n/analysis/Blanchette/sha/MOLNG-61/C0K08ACXX --destdir Sex_n_Tudor
Debugging:
  ~/scripts/aligner/batchBowtie2 --bowtie Drosophila_melanogaster.BDGP5.73.min --dir /n/analysis/Blanchette/sha/MOLNG-61/C05HTACXX --destdir temp --debug
  ~/scripts/aligner/batchBowtie2 --bowtie Drosophila_melanogaster.BDGP5.73.min --dir test --destdir temp