[help] up-to-date checking is slow for large file targets: is this normal/unavoidable? #1416
Help
Hi! First of all, this package is fantastic and powerful and has made my projects safer and more reproducible as intended, so thank you 😊 I have a question.

I'm running a functional MRI research study. Right now I have 15 participants' data, and each participant has 5 raw MRI data files that are about 2 GB each. Each raw MRI file gets processed into another MRI file of similar size. I'm tracking each of these MRI files as its own target, and they're all declared with `format = "file"` (simplified example below). Ultimately, my main analysis targets currently depend on 15 x 5 x 2 = 150 upstream data targets of about 2 GB each (as well as many other, much smaller upstream targets). The study is in progress, so the number of targets is slowly increasing as we collect more data!

When I change the code for a downstream analysis target and all of the upstream MRI data targets are already up to date, `tar_make()` still spends a long time checking that those large file targets are up to date before it reaches the outdated work. Is there anything I can do to speed up the up-to-date checking, or is that a fact of life with target files this large? Thank you for your help!

Potentially relevant settings and other details:
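For illustration, each pair of MRI files looks roughly like this in the pipeline (a simplified sketch with made-up target names, paths, and a made-up `preprocess_mri()` function):

```r
library(targets)

# Simplified sketch: one ~2 GB raw MRI file tracked as a file target, and one
# processed MRI file produced from it. Target names, file paths, and
# preprocess_mri() are hypothetical placeholders.
list(
  tar_target(
    raw_sub01_run1,
    "data/raw/sub-01_run-1_bold.nii.gz",  # returns the path to an existing file
    format = "file"
  ),
  tar_target(
    proc_sub01_run1,
    preprocess_mri(raw_sub01_run1),  # writes the processed file and returns its path
    format = "file"
  )
)
```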
Replies: 1 comment 1 reply
The overhead is definitely avoidable if you do one of the following:

1. In `targets` version 1.7.1, set `format = "file_fast"` where you currently have `format = "file"` (see the sketch after this list).
2. Upgrade to the latest version of `targets` (1.10.0). You may have to run a slow `tar_make()` at first, but afterward the pipeline should trust timestamps and run much faster if everything is up to date.
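A minimal sketch of option 1, assuming a hypothetical target name and file path:

```r
library(targets)

# With targets >= 1.7.1, format = "file_fast" relies on file time stamps rather
# than re-hashing the full 2 GB file to decide whether the target is up to date.
tar_target(
  raw_sub01_run1,                       # hypothetical target name
  "data/raw/sub-01_run-1_bold.nii.gz",  # hypothetical file path
  format = "file_fast"
)
```

Option 2 should require no changes to the pipeline code itself, only an upgrade of the `targets` package.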