First version #1
base: master
See ticket https://jira.oicr.on.ca/browse/GRD-833
Since this is a first implementation, I would strongly recommend avoiding separate wdl files. They will add maintenance headaches, especially given that the files are almost identical. In this case we could introduce a mode variable that routes execution to one of two tasks, one for PERL and another for R mode. We implemented a similar approach in the bclconvert wf, which allows choosing between hpc and dragen modes.
I'm also inclined to choose and implement just one of the versions, perl or java. I'm working with CHUM now to get their vardict output, and we can run on a test sample to see what each version generates. I'll also investigate a bit more to find out whether a specific version is being used in the genpipes pipelines that CHUM is using.
Yes, we are in the process of choosing either the perl or the java version. For small inputs they generate identical output; the problem is that for large inputs they both run out of memory. I have tested java multiple times with increasing memory. The last attempt used 512G of job memory with > 300G of inputs from CHUM and still failed because of memory after 60h 32m.
The perl version is now removed.
I approved, but recommend modifying the calculate.sh test script.
Include the parameter. I tested this on one small dataset and it reduced the processing time from
Even with threading, this runs very long. For vardict, and for the neoantigen pipeline in particular, we should be okay to exclude intergenic regions. I tested this out with a bed file from UCSC knownGene, splitting the bed file by chromosome, and a simplified command.
Test command was
My tests were with knownGene, but I think we can use the same bed file that is being used for TMB assessment.
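For illustration, splitting a bed file by chromosome could look roughly like the sketch below. This is not the command used in the test above (that command is not shown in this thread); the function name `split_bed_by_chromosome` and the `<chrom>.bed` output naming are assumptions made for the example.

```python
from collections import defaultdict
from pathlib import Path


def split_bed_by_chromosome(bed_path, out_dir):
    """Write one BED file per chromosome and return {chrom: output path}.

    Hypothetical helper for illustration; not part of the workflow in this PR.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group the interval lines by the first column (the chromosome name).
    by_chrom = defaultdict(list)
    with open(bed_path) as bed:
        for line in bed:
            # Skip blank lines and the optional BED header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom = line.split()[0]
            by_chrom[chrom].append(line if line.endswith("\n") else line + "\n")

    # Write each group to its own file, e.g. out_dir/chrY.bed.
    outputs = {}
    for chrom, lines in by_chrom.items():
        out_path = out_dir / f"{chrom}.bed"
        out_path.write_text("".join(lines))
        outputs[chrom] = out_path
    return outputs
```

Each per-chromosome file can then be handed to its own job, so a slow chromosome no longer holds up the others. The whole file is held in memory here, which should be fine for a gene-level bed file but is an assumption about input size.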
Running vardict with -th 8 and the UCSC knownGene bed file is much faster: chrY completed in 28 minutes, versus 33h 22m last time. It is still running, though.
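To put the chrY numbers in perspective, here is a small arithmetic sketch. The helper `to_minutes` is hypothetical, and the ratio covers chrY only, with threading and the bed file restriction changed together, so it says nothing about which change contributed more.

```python
import re


def to_minutes(duration):
    """Convert a duration such as '33h 22m' or '28m' to whole minutes."""
    hours = re.search(r"(\d+)\s*h", duration)
    minutes = re.search(r"(\d+)\s*m", duration)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


before = to_minutes("33h 22m")  # previous chrY run
after = to_minutes("28m")       # chrY run with -th 8 and the knownGene bed file
speedup = before / after
print(f"{before} min -> {after} min, about {speedup:.1f}x faster")
# prints: 2002 min -> 28 min, about 71.5x faster
```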