First version #1

gavin-peng · 2025-01-02T19:13:41Z

No description provided.

gavin-peng · 2025-01-02T19:55:59Z

see ticket https://jira.oicr.on.ca/browse/GRD-833

pruzanov

Since this is a first implementation, I would strongly recommend avoiding separate wdl files. They will add maintenance headaches, especially given that the files are almost identical. In this case we could introduce a mode variable which would route the execution to one of two tasks - one for PERL and another for R mode. We had a similar approach implemented in the bclconvert wf, which allows choosing between hpc and dragen modes.

lheisler · 2025-01-09T20:51:48Z

I'm also inclined to choose and implement just one of the versions, perl or java. I'm working with CHUM now to get their vardict output and we can run on a test sample to see what each version generates. I'll also investigate a bit more to better understand if it is clear that a specific version is being used in the genpipes pipelines that CHUM is using

gavin-peng · 2025-01-10T15:35:54Z

Yes, we are in the process of choosing either the perl or the java version. For small inputs they generate identical output; the problem is that for large inputs they both run out of memory. I have tested the java version multiple times with increasing memory; the last attempt used 512G of job memory with > 300G of inputs from CHUM and still failed because of memory after 60h 32m.
The vardict github page claims the java version is 10X faster, but so far I haven't observed a huge difference in speed.

gavin-peng · 2025-01-20T18:14:29Z

Since this is a first implementation, I would strongly recommend avoiding separate wdl files. They will add maintenance headaches, especially given that the files are almost identical. In this case we could introduce a mode variable which would route the execution to one of two tasks - one for PERL and another for R mode. We had a similar approach implemented in the bclconvert wf, which allows choosing between hpc and dragen modes.

The perl version is now removed.

pruzanov

I approved, but recommend modifying calculate.sh test script

tests/calculate.sh

vardict.wdl

lheisler · 2025-02-10T15:35:20Z

include the parameter
-th 8
which will give opportunity for the software to use multiple slots on the node

i tested this on one small dataset and it reduced the processing time from
166m to 33m
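
As a rough reading of those two timings, the arithmetic below computes the speedup and the per-thread efficiency. It is only a sketch: it assumes the 166m run used a single thread, which the comment does not state, and the function name is made up for illustration.

```python
def speedup_and_efficiency(baseline_minutes, threaded_minutes, threads):
    """Return (speedup, efficiency per thread) for two wall-clock timings.

    Assumption: the baseline run was single-threaded.
    """
    speedup = baseline_minutes / threaded_minutes
    return speedup, speedup / threads


# Timings quoted above: 166m without -th, 33m with -th 8.
speedup, efficiency = speedup_and_efficiency(166, 33, 8)
print(f"speedup: {speedup:.2f}x, efficiency per thread: {efficiency:.0%}")
# prints "speedup: 5.03x, efficiency per thread: 63%"
```

Under that assumption, going from 8 to more threads would likely give diminishing returns, which fits the later observation that restricting the regions matters more than threading alone.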

lheisler · 2025-02-10T15:45:35Z

even with threading, this runs very long.

for vardict, and for the neoantigen pipeline in particular, we should be okay to exclude intergenic regions

I tested this out with a bedfile from UCSC knowngenes, splitting the bed file by chromosome, and a simplified command.
jobs completed in the range of 3 m to 1258 m (about 21 h)

test command was
varDict -G REF -N MoHQ-CM-1-180.DT -b 'Tbam|Nbam' -c 1 -S 2 -E 3 -g 4 -th 8 -D -y knownGene.chr.bed
(the above is extracted from the current workflow command, which pipes to additional filtering commands. i removed the pipes for testing purposes)

my tests were with knownGene , but i think we can use the same bed file that is being used for TMB assessment

https://bitbucket.oicr.on.ca/projects/GSI/repos/interval-files/browse/accredited/MANE_Select_v1.3.bed
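
The per-chromosome split described above could look roughly like the sketch below. This is not the code used in the test or in the workflow; the function name, the output naming scheme (`<chromosome>.bed`) and the header handling are assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import Path


def split_bed_by_chromosome(bed_path, out_dir):
    """Write one BED file per chromosome; return {chromosome: output path}.

    Hypothetical helper: output files are named <chromosome>.bed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group the interval lines by their first column (the chromosome name).
    records = defaultdict(list)
    with open(bed_path) as bed:
        for line in bed:
            # Skip blank lines and the optional header lines of a BED file.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom = line.split(None, 1)[0]
            records[chrom].append(line)

    # Write each group to its own file, preserving the original line order.
    outputs = {}
    for chrom, lines in records.items():
        path = out_dir / f"{chrom}.bed"
        path.write_text("".join(lines))
        outputs[chrom] = path
    return outputs
```

Each resulting file could then be passed as the final argument of the test command above, one job per chromosome.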

gavin-peng · 2025-02-10T19:21:51Z

Running vardict with -th 8 and the UCSC knownGene bed file is much faster: chrY completed in 28 minutes, versus 33h 22m last time. The run is still in progress, though.

Gavin Peng added 13 commits December 18, 2024 19:27

initial wdl

785b5ec

small fix

3500445

add perl version

a9a64b0

small syntax change

0e231d7

yet more syntax test

194ae6f

remove some parameters for testing

b28fafb

test simplest case

4c6ea2e

syntax fix, add README

a1712f1

perl version

fa18b27

add set pattern in bash script

6681936

add java Xmx option

4dad3ef

add vidarr files

956c169

add missing argument

7062e62

gavin-peng requested review from lheisler and pruzanov January 2, 2025 19:56

add samtools as dependency of perl version

ceb2dfe

pruzanov requested changes Jan 9, 2025

Gavin Peng added 8 commits January 15, 2025 22:47

try parallellizing runVardict

7f5dc64

add creating dictionary for mergeVcfs and remove perl version

54a7f53

add parameters like refDict

e86a043

change to split bed instead of reference

da7c31a

add memory allocating based on bed size

041a944

update readme, regression test

84ceaf9

add missing parameters

1391937

fix regression test parameter name

b83eb13

gavin-peng closed this Jan 20, 2025

gavin-peng reopened this Jan 20, 2025

gavin-peng requested a review from pruzanov January 20, 2025 18:16

pruzanov approved these changes Jan 21, 2025

tests/calculate.sh (outdated, resolved)

memory and time parameters adjustment

a79004d

lheisler reviewed Feb 6, 2025

vardict.wdl (outdated, resolved)

Gavin Peng added 3 commits February 10, 2025 15:25

add numThreads, remove vcf header in calculate.sh

a1354e7

fix typo

a46d740

fix syntax

c98eed2
