oicr-gsi · pruzanov · Jan 21, 2025 · Jan 21, 2025 · Feb 10, 2025 · Feb 11, 2025
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,16 +1,40 @@
-## 1.3.0 - 2024-06-25
+# Changelog
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [1.4.0] - 2025-02-12
+### Changed
+- Adjusted regression tests so that they validate in Jenkins
+- Adjusted default parameters according to [GBS-5046](https://jira.oicr.on.ca/browse/GBS-5046)
+- Changed the code that generates the input list
+
+## [1.3.0] - 2024-06-25
+### Added
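-[GRD-797](https://jira.oicr.on.ca/browse/GRD-797) - add vidarr labels to outputs (changes to medata only)
+- [GRD-797](https://jira.oicr.on.ca/browse/GRD-797) - add vidarr labels to outputs (changes to metadata only)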
-## 1.2.2 - 2024-04-09
+
+## [1.2.2] - 2024-04-09
+### Changed
 - Updated to a reference built using hg38/p12
-## 1.2.1 - 2024-04-04
+
+## [1.2.1] - 2024-04-04
+### Changed
 - Changed names in vidarrbuild.json
-## 1.2.0 - 2024-03-26
-- Workflow requires an array of fastq files with read-groups as input. A single fastq-pair (or a single fastq file) must also be inputted as an array. 
-- runDragen task outputs a merged bam file
+
+## [1.2.0] - 2024-03-26
+### Added
 - Added new task makeCSV and headerFormat, and removed readGroupFormat task
+### Changed
+- Workflow requires an array of fastq files with read-groups as input. A single fastq-pair (or a single fastq file) must also be inputted as an array.
+- runDragen task outputs a merged bam file
 - Replaced mode parameter with isRNA parameter (false by default)
-## 1.1.0 - 2024-03-05
+
+## [1.1.0] - 2024-03-05
+### Changed
 - Changes the zippedOut output into a zipped directory. This ensures that extraction creates a new directory instead of tarbombing the working directory.
-## 1.0.0 - 2024-03-04
+
+## [1.0.0] - 2024-03-04
+### Added
 - Completes lane level alignments using Dragen
 - Supports whole transcriptome alignment
diff --git a/README.md b/README.md
@@ -7,7 +7,7 @@ This workflow will align sequence data (WG or WT) provided as fastq files to the
 ## Dependencies
 
 * [dragen](https://developer.illumina.com/dragen)
-
+* [gsi modules : dragen-scripts 0.3](https://gitlab.oicr.on.ca/ResearchIT/modulator)
 
 ## Usage
 
@@ -36,10 +36,14 @@ Parameter|Value|Default|Description
 #### Optional task parameters:
 Parameter|Value|Default|Description
 ---|---|---|---
-`headerFormat.jobMemory`|Int|1|Memory allocated for this job
-`headerFormat.timeout`|Int|5|Hours before task timeout
-`makeCSV.jobMemory`|Int|1|Memory allocated for this job
-`makeCSV.timeout`|Int|5|Hours before task timeout
+`extractInfoLine.parsingScript`|String|"$DRAGEN_SCRIPTS_ROOT/bin/composeList.py"|Script for parsing inputs into a line
+`extractInfoLine.timeout`|Int|4|Timeout for the job
+`extractInfoLine.jobMemory`|Int|4|Job allocated RAM
+`extractInfoLine.modules`|String|"dragen-scripts/0.1"|Dependency modules
+`composeList.listWritingScript`|String|"$DRAGEN_SCRIPTS_ROOT/bin/writeFile.py"|Script for writing out list of inputs
+`composeList.jobMemory`|Int|4|Job allocated RAM
+`composeList.timeout`|Int|4|Timeout for the job
+`composeList.modules`|String|"dragen-scripts/0.1"|Dependency modules
 `runDragen.adapter1File`|String|"/staging/data/resources/ADAPTER1"|Adapters to be trimmed from read 1
 `runDragen.adapter2File`|String|"/staging/data/resources/ADAPTER2"|Adapters to be trimmed from read 2
 `runDragen.jobMemory`|Int|500|Memory allocated for this job
@@ -50,10 +54,10 @@ Parameter|Value|Default|Description
 
 Output | Type | Description | Labels
 ---|---|---|---
-`bam`|File|Output bam aligned to genome|
-`bamIndex`|File|Index for the aligned bam|
-`zippedOut`|File|Zip file containing the supporting .csv and .tab outputs from Dragen|
-`outputChimeric`|File?|Output chimeric junctions file, if available|
+`bam`|File|BAM file with alignments|vidarr_label: bam
+`bamIndex`|File|Index of BAM file with alignments|vidarr_label: bamIndex
+`zippedOut`|File|Zipped .csv and .tab files (additional outputs)|vidarr_label: zippedOut
+`outputChimeric`|File?|Optional output file with chimeric junctions|vidarr_label: outputChimeric
 
 
 ## Commands
@@ -63,64 +67,15 @@ This section lists command(s) run by dragenAlign workflow
 
 ### Ensures the read-group information is valid, and outputs a header for the input CSV.
 
-``` 
-     set -euo pipefail 
-
-     headerString="Read1File,Read2File"
-
-     # Split the string into an array of key-value pairs
-     IFS=, read -ra rgArray <<< ~{readGroupString}
-
-     # Adds valid keys (for Dragen) to headerString
-     for field in "${rgArray[@]}"; do
-       tag=${field:0:5}
-       if [ "$tag" == "RGID=" ] || [ "$tag" == "RGLB=" ] || [ "$tag" == "RGPL=" ] || \
-          [ "$tag" == "RGPU=" ] || [ "$tag" == "RGSM=" ] || [ "$tag" == "RGCN=" ] || \
-          [ "$tag" == "RGDS=" ] || [ "$tag" == "RGDT=" ] || [ "$tag" == "RGPI=" ]
-       then
-         headerString+=",${field:0:4}"
-       else
-         # Redirect error message to stderr
-         echo "Invalid tag: '$tag'" >&2  
-         exit 1
-       fi
-     done
-
-     # Ensures the required header information is present
-     if [ "$(echo "$headerString" | grep -c "RGID")" != 1 ] || \
-        [ "$(echo "$headerString" | grep -c "RGSM")" != 1 ] || \
-        [ "$(echo "$headerString" | grep -c "RGLB")" != 1 ] || \
-        [ "$(echo "$headerString" | grep -c "RGPU")" != 1 ]; then
-       echo "Missing required read-group information from header" >&2  
-       exit 1
-     fi
-
-     echo "$headerString"
 ```
-
-### Format input CSV file for Dragen.
-
-``` 
-     set -euo pipefail 
-
-     echo ~{csvHeader} > ~{csvResult}
-
-     # Load arrays into bash variables
-     arrRead1s=(~{sep=" " read1s})
-     if ~{isPaired}; then arrRead2s=(~{sep=" " read2s}); fi
-     arrReadGroups=(~{sep=" " readGroups})
-
-     # Iterate over the arrays concurrently
-     for (( i = 0; i < ~{arrayLength}; i++ ))
-     do
-       read1="${arrRead1s[i]}"
-       if ~{isPaired}; then read2="${arrRead2s[i]}"; else read2=""; fi
-       readGroup=$(echo "${arrReadGroups[i]}" | sed 's/RG..=//g')
-       echo "$read1,$read2,$readGroup" >> ~{csvResult}
-     done
+     python3 ~{parsingScript} -i ~{write_json(fastqInput)}
 ```
 
-### Align to reference using Dragen.
+### Compose a list of inputs for Dragen
+
+```
+    python3 ~{listWritingScript} -o ~{outputFileName} -l "~{sep=';' inputLines}"
+```
 
 ```
      set -euo pipefail

diff --git a/commands.txt b/commands.txt
@@ -3,66 +3,14 @@ This section lists command(s) run by dragenAlign workflow
 
 * Running dragenAlign
 
-=== Ensures the read-group information is valid, and outputs a header for the input CSV ===.
 
-``` 
-    set -euo pipefail 
-
-    headerString="Read1File,Read2File"
-
-    # Split the string into an array of key-value pairs
-    IFS=, read -ra rgArray <<< ~{readGroupString}
-
-    # Adds valid keys (for Dragen) to headerString
-    for field in "${rgArray[@]}"; do
-      tag=${field:0:5}
-      if [ "$tag" == "RGID=" ] || [ "$tag" == "RGLB=" ] || [ "$tag" == "RGPL=" ] || \
-         [ "$tag" == "RGPU=" ] || [ "$tag" == "RGSM=" ] || [ "$tag" == "RGCN=" ] || \
-         [ "$tag" == "RGDS=" ] || [ "$tag" == "RGDT=" ] || [ "$tag" == "RGPI=" ]
-      then
-        headerString+=",${field:0:4}"
-      else
-        # Redirect error message to stderr
-        echo "Invalid tag: '$tag'" >&2  
-        exit 1
-      fi
-    done
-
-    # Ensures the required header information is present
-    if [ "$(echo "$headerString" | grep -c "RGID")" != 1 ] || \
-       [ "$(echo "$headerString" | grep -c "RGSM")" != 1 ] || \
-       [ "$(echo "$headerString" | grep -c "RGLB")" != 1 ] || \
-       [ "$(echo "$headerString" | grep -c "RGPU")" != 1 ]; then
-      echo "Missing required read-group information from header" >&2  
-      exit 1
-    fi
-
-    echo "$headerString"
 ```
-
-=== Format input CSV file for Dragen ===.
-
-``` 
-    set -euo pipefail 
-
-    echo ~{csvHeader} > ~{csvResult}
-
-    # Load arrays into bash variables
-    arrRead1s=(~{sep=" " read1s})
-    if ~{isPaired}; then arrRead2s=(~{sep=" " read2s}); fi
-    arrReadGroups=(~{sep=" " readGroups})
-
-    # Iterate over the arrays concurrently
-    for (( i = 0; i < ~{arrayLength}; i++ ))
-    do
-      read1="${arrRead1s[i]}"
-      if ~{isPaired}; then read2="${arrRead2s[i]}"; else read2=""; fi
-      readGroup=$(echo "${arrReadGroups[i]}" | sed 's/RG..=//g')
-      echo "$read1,$read2,$readGroup" >> ~{csvResult}
-    done
+    python3 ~{parsingScript} -i ~{write_json(fastqInput)}
 ```
 
-=== Align to reference using Dragen ===.
+```
+   python3 ~{listWritingScript} -o ~{outputFileName} -l "~{sep=';' inputLines}"
+```
 
 ```
     set -euo pipefail
@@ -87,4 +35,4 @@ This section lists command(s) run by dragenAlign workflow
     mkdir ~{zipFileName}
     cp -t ~{zipFileName} $(ls | grep '~{prefix}.*.csv\|~{prefix}.*.tab' | tr '\n' ' ')
     zip -r ~{zipFileName}.zip ~{zipFileName}
-```
+```
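
Review note: the diff removes the inline bash that validated read-group tags and built the Dragen input CSV, and replaces it with calls to `composeList.py` and `writeFile.py`, whose sources are not part of this diff. As a reference for what the removed logic did, here is a minimal Python sketch that mirrors the deleted bash. The function names `header_from_read_group` and `compose_csv` are hypothetical, and taking the header from the first read group is an assumption; this is not the content of the scripts in `dragen-scripts`.

```python
# Read-group keys accepted by the removed bash task (copied from the deleted lines).
VALID_KEYS = {"RGID", "RGLB", "RGPL", "RGPU", "RGSM", "RGCN", "RGDS", "RGDT", "RGPI"}
# Keys the removed bash task required to be present in the header.
REQUIRED_KEYS = ("RGID", "RGSM", "RGLB", "RGPU")


def header_from_read_group(read_group):
    """Validate a 'RGID=a,RGSM=b,...' string and return the CSV header columns."""
    header = ["Read1File", "Read2File"]
    for field in read_group.split(","):
        key, sep, _value = field.partition("=")
        if sep != "=" or key not in VALID_KEYS:
            raise ValueError(f"Invalid tag: '{field[:5]}'")
        header.append(key)
    if any(key not in header for key in REQUIRED_KEYS):
        raise ValueError("Missing required read-group information from header")
    return header


def compose_csv(read1s, read2s, read_groups):
    """Build the CSV text: one header line, then one line per fastq entry.

    read2s may be empty for single-end data; the Read2File column is then blank.
    Assumption: every read group carries the same keys in the same order, so the
    header is derived from the first one.
    """
    lines = [",".join(header_from_read_group(read_groups[0]))]
    for i, (read1, read_group) in enumerate(zip(read1s, read_groups)):
        read2 = read2s[i] if read2s else ""
        # Keep only the values, as the bash did with sed 's/RG..=//g'.
        values = [field.partition("=")[2] for field in read_group.split(",")]
        lines.append(",".join([read1, read2, *values]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(compose_csv(
        ["sample_R1.fastq.gz"],
        ["sample_R2.fastq.gz"],
        ["RGID=run1,RGLB=lib1,RGPL=ILLUMINA,RGPU=unit1,RGSM=sample"],
    ), end="")
```

If the new scripts are meant to keep the same validation (rejecting unknown tags and requiring RGID, RGSM, RGLB and RGPU), it may be worth stating that in the README next to the new commands, since the behaviour is no longer visible there.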