Releases: ropensci/targets
Minor fixes
targets 1.11.4
- Tone down progress bar output for medium-overhead scenarios.
- Speed up `tar_meta()` default settings for `tar_read()` etc. (for million-target pipelines).
- Choose the `"terse"` reporter by default if the calling session is non-interactive. This will hopefully avoid problems on CRAN for packages that use `targets` with the default settings.
- Improve reporter deprecation messages (#1493, @dakvid).
- Clarify scope of `tar_renv()` (#1506, @valentingar).
- Handle errors in `rstudioapi::isAvailable()` (#1519, @dipterix).
- Improve error message when metadata file is corrupted (#1523, @dakvid).
Initialization speed and pre-processing progress messages
targets 1.11.3
Bug fixes
- Use `qmethod = "escape"` to avoid Rdatatable/data.table#3509 (#1480, @koefoeden).
- Ensure `error = "trim"` does not hang when the errored target has a long chain of reverse dependencies (#1481, @koefoeden).
- Manually remove class `"rlib_error_package_not_found"` from errors (#1484, @malcolmbarrett). This and #1354 are unfortunate consequences of #997.
Other changes
- Call `suppressPackageStartupMessages()` once for the whole pipeline. Repeated target-specific calls may be slow, and the messages themselves are cumbersome. This is an appropriate tradeoff.
- Ensure the progress bar from the balanced reporter does not chop up messages from `tar_debug_instructions()`.
- Remove ANSI escape sequences from warnings and error messages.
- Use `cli::cli_text()` instead of `cli::cli_progress_output()` (#1478, @dipterix).
- Minor speedups in the beginning and end of `tar_make()` (#1482).
- Cache `_targets/objects/` time stamps only for local builders mentioned in the metadata, as opposed to everything in that directory (#1482).
- Instrument pre-processing overhead with progress bars (#1482).
Bug fixes
Terse reporter and bugfix
targets 1.11.1
- Bugfix: `rstudio_available()` returns `FALSE` without error if `rstudioapi` is not installed.
- Add a new `"terse"` reporter, which is the `"balanced"` reporter without the progress bar. Make `"terse"` the default reporter
Improved speed, default settings, and aesthetics
targets 1.11.0
Deprecated features
- Deprecate the `priority` argument of `tar_target()`. Because of #1458, custom priorities no longer have an effect on execution order. However, up-to-date parallelized pipelines with 100000+ targets can now be checked around 10 times faster, so the tradeoff is worth it.
Changes to default behavior
- Keep `format = "file"` files on disk even for non-local repositories (#1467).
Changes to default settings
- In `tar_option_get()`, set `repository_meta` to `"local"` by default, regardless of `repository` (#1427).
- In `tar_option_get()`, set `storage = "worker"`, `retrieval = "auto"`, and `memory = "auto"` by default (#1426). For `memory`, `"auto"` is now equivalent to `"transient"` most of the time, but it is equivalent to `"persistent"` for non-dynamic targets that other targets dynamically branch over. For `retrieval`, the `"auto"` setting is new. It is equivalent to `"worker"` for most cases, but it aligns with `"main"` for dynamic branches that branch over non-dynamic targets. All this is to avoid re-reading the upstream target from disk every time a branch needs to run.
- Set the new `"balanced"` reporter to be the default reporter for `tar_make()` and `tar_outdated()`.
- Set the default `garbage_collection` argument of `tar_option_get()` to 0 (#1464).
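The two `"auto"` rules above can be restated as small decision functions. This is a hypothetical base-R sketch for illustration only. `resolve_memory_auto()` and `resolve_retrieval_auto()` are not exported by `targets`, and the real resolution may consider more cases than the two described in these notes.

```r
# Hypothetical helpers restating the 1.11.0 "auto" rules in plain R.
# Illustrations only; not functions exported by targets.

# memory = "auto": persistent only for non-dynamic targets that other
# targets dynamically branch over; transient otherwise.
resolve_memory_auto <- function(is_dynamic, is_branched_over) {
  if (!is_dynamic && is_branched_over) "persistent" else "transient"
}

# retrieval = "auto": "main" for dynamic branches that branch over
# non-dynamic targets; "worker" otherwise.
resolve_retrieval_auto <- function(is_dynamic_branch, upstream_is_dynamic) {
  if (is_dynamic_branch && !upstream_is_dynamic) "main" else "worker"
}

resolve_memory_auto(is_dynamic = FALSE, is_branched_over = TRUE)
resolve_retrieval_auto(is_dynamic_branch = TRUE, upstream_is_dynamic = FALSE)
```

A pipeline that prefers the old behavior can still set an option explicitly in `_targets.R`, e.g. `tar_option_set(memory = "persistent")`.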
Efficiency improvements
- Speed up checking up-to-date targets in large dynamic branching pipelines (#1458, #1460). The speedup is 10-fold or more in some cases.
- Maintain a persistent text connection when appending to a metadata text file (#1415).
- Avoid superfluous garbage collection when `crew` controllers are saturated.
- Set defaults for `storage`, `retrieval`, and `memory` that balance resource tradeoffs for the most common pipelines (#1426).
- Garbage collection only runs in `targets:::target_run()` (#1464). There is no longer a separate `gc()` call on the main process.
- Shave off overhead from `store_sync_file_meta()` in the general case.
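Keeping one connection open avoids reopening the metadata file for every appended row (#1415). Below is a minimal base-R sketch of that pattern. It assumes a plain text file, and the pipe-delimited rows are hypothetical placeholders rather than the real metadata format.

```r
# Hypothetical sketch: append rows through one persistent connection
# instead of reopening the file for each row (not targets' internal code).
path <- tempfile(fileext = ".txt")
con <- file(path, open = "a")  # open once, in append mode
for (name in c("a", "b", "c")) {
  writeLines(paste(name, "built", sep = "|"), con)  # reuse the open connection
}
close(con)                     # close once when done
readLines(path)
```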
Other changes
- Upload workspaces to the cloud if `tar_option_get("repository_meta")` is `"aws"` or `"gcp"`. Download them with `tar_workspace_download()` and delete them with `tar_destroy(destroy = "all")` or `tar_destroy(destroy = "cloud")`.
- Deep-copy settings when resolving `format = "auto"` (#1425, @paulseamer).
- Add `store_read_path.tar_auto()` (#1429, @paulseamer).
- Improve error message that explains `iteration = "group"` branching problems.
- Allow more special characters in recorded warnings and error messages.
- Call `cli::style_reset()` at the end of non-silent reporters (#1450, @r2evans).
- Exclude lists of target definitions from the globals in the dependency graph (#1431).
- Nomenclature change: drop the term "dynamic file" in favor of "file target".
- Internally choose a default level separation in the `visNetwork` graph based on the number of hierarchical levels and the maximum number of vertices per level (#1432).
- In `tar_visnetwork()`, choose the colors of the edges based on the origin vertices, not the destination vertices (#1433).
- In the `"verbose"` and `"timestamp"` reporters, print "dispatched pattern" messages, and print the total computation and storage summed over all the branches.
- Create a new `"balanced"` reporter with a `cli` progress bar (#1442).
- Deprecate reporters `"forecast"`, `"forecast_interactive"`, `"verbose_positives"`, and `"timestamp_positives"` (#1442).
- Ensure colors printed to the console are preserved when forwarded from the `callr` process (#1442).
- Add `tar_option_with()` (ropensci/tarchetypes#215, @noamross).
- Use `prettyunits` to print elapsed times and file sizes.
- Shorten and simplify the `tar_make()` error message.
- Minor bugfix: add a new `on_worker` argument to `target_run()` and `builder_unload_value()` so the latter only removes the target value if the target was actually run on a worker.
Migrate to {crew} 1.0.0
targets 1.10.1
- Restore explicit references to "self" in `R6` classes.
- Perform `crew` task retries.
- Try to handle `NA` buckets in `store_delete_objects.tar_aws()` and `store_delete_objects.tar_gcp()`.
Speed gains for large pipelines (with many up-to-date targets)
targets 1.10.0
Invalidating changes
These changes invalidate certain targets in a pipeline and cause them to rerun on the next tar_make().
- Exclude function signatures from `tar_repository_cas()` output strings to reduce the size of pipeline metadata (#1390).
- Exclude function signatures from `tar_format()` output strings to reduce the size of pipeline metadata (#1390).
Summary of performance gains
tar_make() and tar_outdated() run much faster in this release. Extensive profiling was done on a real-world simulation pipeline with 66002 up-to-date targets. For tar_make() using all the default settings:

| Machine | Before (seconds) | After (seconds) | Speedup (before / after) |
|---|---|---|---|
| M2 MacBook | 413.16 | 35.538 | 11.63 |
| RHEL9 | 450.66 | 94.08 | 4.79 |

And for tar_outdated() using all the default settings:

| Machine | Before (seconds) | After (seconds) | Speedup (before / after) |
|---|---|---|---|
| M2 MacBook | 91.314 | 16.636 | 5.49 |
| RHEL9 | 167.809 | 37.395 | 4.49 |

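The Speedup column is the ratio of the before and after times. The short R snippet below recomputes it, to two decimals, from the timings in the two tables. The vector element names are only labels chosen for this example.

```r
# Speedup = seconds before / seconds after, from the two tables above.
before <- c(make_mac = 413.16, make_rhel = 450.66,
            outdated_mac = 91.314, outdated_rhel = 167.809)
after <- c(make_mac = 35.538, make_rhel = 94.08,
           outdated_mac = 16.636, outdated_rhel = 37.395)
speedup <- before / after
round(speedup, 2)
# make_mac 11.63, make_rhel 4.79, outdated_mac 5.49, outdated_rhel 4.49
```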
To take advantage of these speed gains for an existing pipeline, you may have to run tar_make() to convert the time stamps and file sizes to a new format. This initial tar_make() is slow, but subsequent tar_make() calls should be much faster than before the upgrade.
Other/specific changes
- Speed up `tar_make()` and `tar_outdated()` by avoiding excessive buffering and disk writes for metadata and reporters when the pipeline is just skipping targets.
- Use a more lookup-efficient data structure for `tar_runtime$file_info` (#1398).
- Fall back on vector aggregation without names (#1401, @guglicap).
- Speed up representation of file sizes in metadata (#1408).
- Add a new `"forecast_interactive"` reporter to `tar_outdated()` to choose `"forecast"` for interactive sessions and `"silent"` for non-interactive ones.
- Add a new `seconds_reporter_outdated` argument to `tar_config_set()` with a default of 1 to control the time interval of the reporter of `tar_outdated()` and other passive algorithm functions.
- Remove target descriptions from the default labels of graph visualizations.
igraph compatibility
targets 1.9.1
Bug fixes
- Allow branch references to contain multi-element `path` vectors with cloud metadata (#1382, @n8layman).
- Avoid partial matches in internal code (#1384, @olivroy).
- Add error handling around calls to `ps::ps_disk_partitions()` and `ps::ps_fs_mount_point()`.
- Do not store `_targets/objects/` paths in metadata for CAS repositories (#1391).
Compatibility
- Ensure compatibility with `igraph` >= 2.1.2.
Memory efficiency
targets 1.9.0
Improvements
- Un-break workflows that use `format = "file_fast"` (#1339, @koefoeden).
- Fix deadlock in `error = "trim"` (#1340, @koefoeden).
- Remove tailored debugging message (#1341, @koefoeden).
- Store warnings while writing to storage (#1345, @Aariq).
- Allow `garbage_collection` to be a non-negative integer to control the frequency of garbage collection in a performant, convenient, unified way (#1351).
- Deprecate the `garbage_collection` argument of `tar_make()`, `tar_make_future()`, and `tar_make_clustermq()` (#1351).
- Instrument `target_run()`, `target_prepare()`, and `target_conclude()` using `autometric`.
- Avoid sending problematic error classes such as `"vctrs_error_subscript_oob"` to `rlang::abort()` (#1354, @Jiefei-Wang).
- Reduce memory consumption by ~23% in large pipelines by avoiding the accumulation of promise objects (#1352).
- Avoid `store_assert_format()` and `store_convert_object()` if `storage` is `"none"`.
- Add a `list()` method to `tar_repository_cas()` to make it easier and more efficient to specify custom CAS repositories (#1366).
- Improve speed and reduce memory consumption by avoiding deep copies of inner environments of target definition objects (#1368).
- Reduce memory consumption by storing buds and branches as lightweight references when `memory` is `"transient"` (#1364).
- Replace the `memory` class with the new `lookup` class.
- Implement `memory = "auto"` to select transient memory for dynamic branches and persistent memory for other targets (#1371).
- Omit whole pattern targets from branch subpipelines when possible. This should reduce memory consumption in some cases.
- Omit whole stem targets from branch subpipelines when `retrieval` is `"main"` and only a bud is actually used. The same cannot be done with branches because each branch may need to be (un)marshaled individually.
- Compress branches into references when `retrieval` is `"worker"` and the whole pattern is part of the subpipeline.
- Avoid duplicated branch aggregation: just send the branches over the network.
- Back-compatibly switch `format = "qs"` from `qs` to `qs2` (#1373).
- Add `tar_unblock_process()`.
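To make the integer `garbage_collection` setting concrete, here is a hypothetical base-R sketch of a frequency rule. It assumes that `0` disables collection and that a positive value `k` triggers `gc()` after every `k`-th target. The exact counting inside `targets` may differ. In a real pipeline the value itself would be set with `tar_option_set(garbage_collection = 3)` rather than through a helper like this one.

```r
# Hypothetical sketch of an integer garbage-collection frequency.
# Assumption: 0 disables collection; k > 0 collects after every k-th target.
should_collect <- function(target_index, frequency) {
  frequency > 0 && target_index %% frequency == 0
}

# Which of the first 10 targets would trigger gc() with a frequency of 3?
which(vapply(1:10, should_collect, logical(1), frequency = 3))
```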
Potentially invalidating changes
- Add `"keepNA"` and `"keepInteger"` to `.deparseOpts()` (#1375). This may cause existing pipelines to rerun, but it makes add-ons like `tarchetypes::tar_map()` much easier to use.
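The effect of these two `.deparseOpts()` options is visible in base R alone. Without them, an integer such as `5L` deparses to `"5"`, which parses back as a double, and a typed missing value such as `NA_character_` deparses to a plain `NA`, which parses back as logical. This snippet only demonstrates `deparse()`. It does not show how `targets` itself stores commands, although a change in deparsed text is presumably why some targets rerun.

```r
# Base R only: how the "keepNA" and "keepInteger" deparse options
# change the text produced for typed values.
plain <- c(
  deparse(5L, control = NULL),           # no type-preserving options
  deparse(NA_character_, control = NULL)
)
typed <- c(
  deparse(5L, control = c("keepNA", "keepInteger")),
  deparse(NA_character_, control = c("keepNA", "keepInteger"))
)
plain  # c("5", "NA")
typed  # c("5L", "NA_character_")
```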
Content addressable storage
targets 1.8.0
- Wrap `tar_watch()` UI module in `bslib::page()` (#1302, @kwbyron-lilly).
- Remove `callr_function` from the `tar_make_as_job()` argument list.
- Ensure `storage = "worker"` is respected when the process of storing an object generates an error (#1304, @multimeric).
- Default to the `_targets.R` pattern in `tar_branches()` (#1306, @multimeric, @mattwarkentin).
- Remove superfluous functions and globals from metadata with `tar_prune()` (#1312, @benzipperer).
- Change the default `workspace_on_error` option to `TRUE` (#1310, @hadley).
- Enhance and organize the `error = "stop"` error message.
- Avoid saving a file in `_targets/objects` for `error = "null"`. Instead, switch to a special `"null"` storage format class if `error` is `"null"` and the target throws an error. This should allow users to more freely create new formats with `tar_format()` without worrying about how to handle `NULL` objects created by `error = "null"`.
- Implement `format = "auto"` (#1311, @hadley).
- Replace `pingr` dependency with `base::socketConnection()` for local URL utilities (#1317, #1318, @Adafede).
- Implement `tar_repository_cas()`, `tar_repository_cas_local()`, and `tar_repository_cas_local_gc()` for content-addressable storage (#1232, #1314, @noamross).
- Add `tar_format_get()` to make implementing CAS systems easier.
- Implement `error = "trim"` in `tar_target()` and `tar_option_set()` (#1310, #1311, @hadley).
- Use the file system type to decide whether to trust time stamps (#1315, @hadley, @gaborcsardi).
- Deprecate `format = "file_fast"` in favor of the above (#1315).
- Deprecate `trust_object_timestamps` in favor of the more unified `trust_timestamps` in `tar_option_set()` (#1315).
- Print storage size of each target in verbose reporters (#1337, @psychelzh).
- Combine help files of `tar_target()` and `tar_target_raw()`. Same with `tar_load()` and `tar_load_raw()`.
- Add a `substitute` argument to `tar_format()` to make it easier to write custom storage formats without metaprogramming.
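Content-addressable storage files each object under a key derived from a hash of its contents, so identical outputs are stored only once. The base-R sketch below illustrates the idea with hypothetical helpers `cas_put()`, `cas_exists()`, and `cas_get()`. It does not reproduce the interface of `tar_repository_cas()` or `tar_repository_cas_local()`.

```r
# Hypothetical content-addressable store in base R (illustration only).
cas_dir <- file.path(tempdir(), "cas")
dir.create(cas_dir, showWarnings = FALSE)

# Save an object under the MD5 hash of its serialized bytes.
cas_put <- function(object) {
  tmp <- tempfile()
  saveRDS(object, tmp, compress = FALSE)  # uncompressed, so equal objects give equal bytes
  key <- unname(tools::md5sum(tmp))       # the content hash is the storage key
  dest <- file.path(cas_dir, key)
  if (file.exists(dest)) {
    unlink(tmp)                           # content already stored: drop the duplicate
  } else {
    file.rename(tmp, dest)
  }
  key
}

cas_exists <- function(key) file.exists(file.path(cas_dir, key))
cas_get <- function(key) readRDS(file.path(cas_dir, key))

x <- data.frame(a = 1:3, b = c("x", "y", "z"))
key1 <- cas_put(x)
key2 <- cas_put(x)  # same content, same key, no second copy on disk
identical(key1, key2)
identical(cas_get(key1), x)
```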