storeDir does not skip staging of inputs #5468

Closed
@anoronh4

Bug report

Expected behavior and actual behavior

I expect storeDir to skip processes whose outputs have already been stored. However, it does not skip staging of the input files when they are remote. If an input file is very large, downstream steps of the pipeline are delayed while it downloads, and the download takes up local disk space for a file that is never used, because the task is skipped.

Steps to reproduce the problem

process A {
    cache true
    storeDir "storeDir"

    input:
    tuple val(meta), path(x)

    output:
    path(x)

    script:
    """
    echo hi
    """
}

workflow {
    remotepath = "https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/__genome_libs_StarFv1.10/Mouse_GRCm39_M31_CTAT_lib_Nov092022.source.tar.gz"
    A([[:], remotepath])
}

Program output

On the first run, it downloads the file and performs the task, as expected. If I immediately re-run with -resume, it skips the process and the pipeline completes very quickly, also as expected. However, when I remove the folder work/stage-* and then try to resume, I get this:

N E X T F L O W  ~  version 23.10.1
Launching `main.nf` [nauseous_archimedes] DSL2 - revision: cc0b1bbdb5
[skipped  ] process > A [100%] 1 of 1, stored: 1 ✔
Staging foreign file: https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/__genome_libs_StarFv1.10/Mouse_GRCm39_M31_CTAT_lib_Nov092022.source.tar.gz
[skipping] Stored process > A

This file is only 1.9 GB, which is big enough to notice a delay, though the delay is not very long in the grand scheme of things. In our pipeline, however, we have a 30 GB input reference file that takes anywhere from 4 to 22 minutes to download, depending on I/O speeds. Is there any way to skip staging? It seems like an unnecessary step in this context.
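
For reference, the run sequence that triggers the re-staging can be sketched as shell commands. This assumes the script above is saved as main.nf (the name shown in the log) and that it is run from the directory containing work/. The NXF variable and the reproduce function name are hypothetical conveniences, not part of Nextflow.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction sequence described in this report.
# NXF is a hypothetical override for the launcher; it defaults to `nextflow`.
NXF="${NXF:-nextflow}"

reproduce() {
    "$NXF" run main.nf            # 1. first run: downloads the input and executes A
    "$NXF" run main.nf -resume    # 2. resume: A is skipped and the run finishes quickly
    rm -rf work/stage-*           # 3. remove the staged copy of the remote file
    "$NXF" run main.nf -resume    # 4. resume again: A is skipped, but the file is staged again
}
```

Calling reproduce in a scratch directory should show the "Staging foreign file" line only on the first and last runs.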

Environment

  • Nextflow version: 23.10.1
  • Java version: 11
  • Operating system: Linux
  • Bash version: GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
