Snakemake workflows that implement the default interface of the CUBI can always be deployed in the same way. The following steps outline the recommended process:
- create a directory for the project (`project_dir/`)
- clone the workflow repository into that directory and check out the version of the workflow you want to run (if applicable):

      project_dir/
      |
      --- workflow_repository/
          |
          --- init.py
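The directory and clone steps above can be scripted, for example with the Bash function below. This is a minimal sketch, not part of the CUBI tooling: the name `clone_workflow` is hypothetical, and the repository URL and the version to check out have to be taken from the documentation of the respective workflow.

```bash
# Hypothetical helper (not part of the CUBI workflows): create the project
# directory, clone the workflow repository into it and, optionally, check
# out a specific version (a tag, branch, or commit).
clone_workflow() {
    local project_dir="$1"   # e.g. project_dir/
    local repo_url="$2"      # URL (or local path) of the workflow repository
    local version="${3:-}"   # optional: version to check out
    local repo_name

    # derive the clone directory name from the URL,
    # e.g. ".../my-workflow.git" -> "my-workflow"
    repo_name="$(basename "$repo_url" .git)"

    mkdir -p "$project_dir" || return 1
    git clone --quiet "$repo_url" "$project_dir/$repo_name" || return 1

    # only check out a specific version if one was requested
    if [ -n "$version" ]; then
        git -C "$project_dir/$repo_name" checkout --quiet "$version" || return 1
    fi
}
```

For example, `clone_workflow project_dir/ REPOSITORY-URL VERSION` (both placeholders) creates `project_dir/` and clones the repository into a subdirectory named after the repository.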
- run the `init.py` script from inside the workflow repository
    - attention: by default, the `init.py` script attempts to create a Conda environment containing all tools necessary to execute the workflow (essentially Snakemake plus a few dependencies). This part of the setup requires a working Conda installation that is configured to use the `bioconda` channel.
    - if you do not want to use Conda environments to run the workflow, make sure that all software dependencies listed in the `exec_env.yaml` environment specification are available on your system.
    - the `init.py` script will create the following directories:

          project_dir/
          |
          --- workflow_repository/
          |   |
          |   --- init.py
          |
          --- wd/        # the working directory for Snakemake
          |
          --- exec_env/  # the Conda execution environment
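If you skip the Conda environment, a simple way to verify that the required tools are reachable is a `command -v` check. A sketch under assumptions: `check_commands` is a hypothetical helper, and the list of tools must be compiled by hand from `exec_env.yaml`, since Conda package names do not always match the names of the executables they install.

```bash
# Hypothetical helper: report every given command that cannot be found on
# PATH. Returns 0 if all commands are available, 1 otherwise.
check_commands() {
    local cmd
    local missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" > /dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, call `check_commands snakemake` followed by the other tools from `exec_env.yaml`; each missing command is reported on standard error and the function returns a non-zero status.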
- activate the Conda execution environment (the relative path below works from inside `project_dir/`):

      conda activate ./exec_env
- if applicable, prepare the Snakemake profile for your compute infrastructure
    - HHU-internal users: you can use a small utility maintained by the CUBI that automates the creation of Snakemake profiles for the HHU/UKD compute infrastructure (snakemake-utils@HHU GitLab)
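On a single machine, a profile can be as small as one directory containing a `config.yaml` in which each key is a long Snakemake command-line option without the leading dashes. A minimal sketch: the directory `profiles/local` and all values are illustrative assumptions; adapt them to your infrastructure (cluster execution needs additional, scheduler-specific settings).

```bash
# Hypothetical location of the profile directory; pass this path to --profile.
PROFILE_DIR="${PROFILE_DIR:-profiles/local}"
mkdir -p "$PROFILE_DIR"

# Snakemake reads the profile settings from config.yaml inside the profile
# directory; each key corresponds to a long command-line option.
cat > "$PROFILE_DIR/config.yaml" <<'EOF'
cores: 4                # maximum number of CPU cores to use (example value)
keep-going: true        # continue with independent jobs if one job fails
printshellcmds: true    # print the shell commands that are executed
rerun-incomplete: true  # re-run jobs whose output was left incomplete
EOF
```

Pass the resulting directory (preferably as an absolute path) to `--profile` when running the workflow.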
- prepare the sample data information (the so-called "sample sheet") as a plain text tab-separated table ("*.tsv"). Please refer to the specific workflow documentation to learn what data the respective workflow needs as input.
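A sample sheet saved with the wrong delimiter (for example, commas from a spreadsheet export) is not read as intended, so a quick structural check before the first run can help. The function below is a sketch with assumptions: `check_sample_sheet` is hypothetical, it assumes the sheet has a header line and at least two columns, and it only checks the shape of the table; which columns are required is defined by the respective workflow.

```bash
# Hypothetical helper: check that a sample sheet is a tab-separated table in
# which every line has as many fields as the header line.
check_sample_sheet() {
    local sheet="$1"
    awk -F '\t' '
        NR == 1 {
            ncol = NF
            # assumption: a valid sheet has at least two columns
            if (ncol < 2) {
                print "header has fewer than two tab-separated columns" > "/dev/stderr"
                bad = 1
            }
            next
        }
        NF != ncol {
            printf "line %d: %d fields, expected %d\n", NR, NF, ncol > "/dev/stderr"
            bad = 1
        }
        END {
            if (NR == 0) {
                print "empty sample sheet" > "/dev/stderr"
                bad = 1
            }
            exit bad
        }
    ' "$sheet"
}
```

`check_sample_sheet /ABSOLUTE/PATH/TO/SAMPLE-SHEET.tsv` prints each problem to standard error and returns a non-zero status if the table is malformed.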
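- run the workflow from inside the workflow repository as follows:

      snakemake -n \
          -d ../wd/ \
          --configfiles [...] \
          --config samples=PATH/TO/SAMPLE-SHEET.tsv \
          --profile PATH/TO/SNAKEMAKE-PROFILE/ \
          run_all

    - `-n`: start with a dry run
    - `-d ../wd/`: the working directory created by `init.py`
    - `--configfiles [...]`: the configuration files with the parameters for the workflow
    - `--config samples=PATH/TO/SAMPLE-SHEET.tsv`: the sample sheet; an absolute path is recommended
    - `--profile PATH/TO/SNAKEMAKE-PROFILE/`: the Snakemake profile for your compute infrastructure
    - `run_all`: the target that creates all result files specified in the workflow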
    - note: running the workflow in dry-run mode first is strongly recommended to check whether the setup worked as expected. Moreover, two dry runs are required to create the manifest output of the workflow.
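Because the dry run has to be executed twice, the two passes can be wrapped in a small function. This is a sketch only: `dry_run_twice` is a hypothetical helper that forwards all of its arguments to `snakemake -n` and stops at the first failing pass.

```bash
# Hypothetical helper: run the Snakemake dry run twice (two dry runs are
# needed to create the manifest output). All arguments are passed on to
# snakemake unchanged; the function stops at the first failing pass.
dry_run_twice() {
    local pass
    for pass in 1 2; do
        echo "snakemake dry run, pass ${pass} of 2" >&2
        snakemake -n "$@" || return 1
    done
}

# usage (from inside the workflow repository, placeholders as in the command above):
#   dry_run_twice -d ../wd/ --configfiles [...] \
#       --config samples=/ABSOLUTE/PATH/TO/SAMPLE-SHEET.tsv \
#       --profile PATH/TO/SNAKEMAKE-PROFILE/ run_all
```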
- collect the important workflow output from the `results/` folder:

      project_dir/
      |
      --- workflow_repository/
      |   |
      |   --- init.py
      |
      --- wd/
      |   |
      |   --- results/
      |
      --- exec_env/
    - the `results/` folder also contains other important information besides the analysis output. Please refer to the description of the standard folder layout of the working directory for details.
- the