Tasks are a set of key-value properties called directives. The task directives determine when and how the task will be executed, but some directives are used differently depending on which backend you're running with.
The identity of a task is a hash of the cmd, exec, container, and digest directives. This identity is used to compare tasks against the saved progress records. If the task identity has changed, the task will need to be reset along with any downstream tasks.
Identity is a good indicator that tasks have changed and will need to be re-run, but it's possible to make changes to a template that go undetected. Understanding task identity will help you avoid these common mistakes:
- Making changes to external scripts or utilities called by the task cmd without changing the task itself. Since the cmd has not changed, the identity will not change, and Jetstream will not know that the task needs to be rerun. Avoid this problem by including version information in the cmd and updating it whenever the external script changes (see the sketch after this list).
- Changing other task directives will only force a task to be reset if it has already failed. If the task completed successfully, or has not been run yet, the task content will not be updated.
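To illustrate the first point, one way to make external script changes visible to the identity hash is to embed the script version in the cmd itself. This is a minimal sketch; annotate.py and its version comment are hypothetical:

```yaml
- name: annotate_variants
  # annotate.py is a hypothetical external script; embedding its version in
  # the cmd means a version bump changes the task identity, forcing a rerun.
  cmd: |
    # annotate.py v2.1.0
    annotate.py input.vcf > annotated.vcf
```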
Here is a list of the common task directives and how they're used:
"Core" directives
- name: A unique identifier for the task. If this is absent, it will be assigned based on a hash of the task content. Naming tasks allows you to link dependencies to this task by name. It's also used to determine the filename for logs saved in projects.
- cmd: Command to be executed by the backend; it should be valid Bash. This is where the main "work" of the task should exist.
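A minimal task needs little more than these two directives. Here is a sketch, assuming tasks are declared as a YAML sequence of mappings:

```yaml
- name: say_hello
  # The cmd body is executed by the backend as Bash
  cmd: echo "hello world" > hello.txt
```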
"Flow" directives
Flow directives determine the order in which the tasks will be executed:
- after: This task will run after tasks named with each value. Supports sequences.
- after-re: This task will run after tasks matching each given regex pattern. Supports sequences.
- before: This task will run before tasks named with each value. Supports sequences.
- before-re: This task will run before tasks matching each given regex pattern. Supports sequences.
- input: This task will run after tasks with output directives matching each given value. Supports sequences.
- input-re: This task will run after tasks with output directives matching each given regex pattern. Supports sequences.
- output: This task will satisfy a matching input value requirement. Supports sequences.
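The sketch below shows how these directives can chain tasks together; the task names, commands, and file names are only illustrative:

```yaml
- name: download
  output: sample.fastq
  cmd: wget -O sample.fastq https://example.com/sample.fastq

- name: align
  # Runs after "download" because this input matches its output directive
  input: sample.fastq
  cmd: bwa mem ref.fa sample.fastq > sample.sam

- name: cleanup
  # after-re matches task names by regex; this runs after both tasks above
  after-re: (download|align)
  cmd: rm -f sample.fastq
```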
"Execution" directives
- exec: Python code that will execute in the runner process immediately before the task is sent to the backend. This feature can be used to modify the workflow (for example, to add tasks) while it's being run. Two local variables are added during execution: task and runner. The runner is important because it contains the current workflow. Any errors during execution will halt the runner immediately. Note: the workflow graph will always be recalculated after any exec directive runs, and most work can be done with cmd directives instead (see the sketch after this list).
- cpus: LocalBackend - will reserve local cpus when launching the cmd. SlurmBackend - passed as "-c" when requesting the job allocation.
- mem: SlurmBackend - passed as "--mem" when requesting the job allocation.
- reset: When this task is reset, it will also reset other tasks in the workflow. Typically when a task is reset, any descendants are also reset; this directive allows additional tasks to be reset. The options for this directive are task name(s), or the special value "predecessors", which resets any immediate upstream tasks.
- stdin: The cmd stdin will be connected to this value.
- stdout: The cmd stdout will be connected to this value.
- stderr: The cmd stderr will be connected to this value.
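As referenced above, here is a sketch of a task combining the execution directives. The exec body is illustrative, and the attributes accessed on task and runner are assumptions, not a documented API:

```yaml
- name: align_sample
  # exec runs as Python in the runner process just before launch. The task
  # and runner locals are provided by Jetstream, but the attribute used
  # below (task.name) is an assumption.
  exec: |
    print(f'About to launch: {task.name}')
  cpus: 4          # LocalBackend reserves 4 cpus; SlurmBackend passes -c 4
  mem: 8G          # SlurmBackend passes --mem 8G
  stdout: logs/align_sample.out
  stderr: logs/align_sample.err
  cmd: bwa mem -t 4 ref.fa sample.fastq > sample.sam
```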
Any additional directives can be added as well; they will not affect the runner or the execution of tasks. These will be stored in the workflow and can be useful for downstream meta-analysis of the projects themselves (e.g. performance comparisons, or documenting the tasks).
Here are some ideas that we've used before:
- tags: Sequence of short descriptive tags that can be used to categorize tasks for downstream analysis.
- methods: Describe what this task does in plain language; later, this can be used to generate a methods section for a project.
- description: Describe this task for potential users.
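A task carrying these custom directives might look like the following sketch; the command and directive values are only illustrative, since the runner ignores unrecognized directives entirely:

```yaml
- name: call_variants
  cmd: gatk HaplotypeCaller -R ref.fa -I sample.bam -O sample.vcf
  tags:
    - variant-calling
    - gatk
  methods: Variants were called with GATK HaplotypeCaller using default parameters.
  description: Produces a raw VCF of candidate variants for one sample.
```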