feat: workflow and config templates #29
base: am_feat_template_update

@@ -28,10 +28,74 @@ myworkflow
│   └── CODEOWNERS
├── .gitignore
├── README.md
├── Snakefile
├── config
│   ├── config.yml
│   └── config_schema.yml
├── environment.yml
└── workflow
    └── myworkflow.smk
```

## Development

Read the [Snakemake Best Practices](https://snakemake.readthedocs.io/en/stable/snakefiles/best_practices.html) and the [Fulcrum Snakemake](https://www.notion.so/fulcrumgenomics/Snakemake-3d836708c9bc47ca868ee9a09ada7d0d) documentation. The text below is adapted from the latter.

If there is a single Snakemake workflow, it should be named `Snakefile` and kept at the top level of the repository. If there are multiple workflows, name them according to their function and give them the extension `.smk`.

Workflow files should mainly contain `rules`. Any additional code, for example to parse the configuration object, handle samplesheet input, or organize reference data, should go in a separate Python toolkit.
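
As a minimal sketch of such a toolkit, the module below parses a tab-separated samplesheet into typed records that a Snakefile could import. The column layout (`sample`, `fastq1`, `fastq2`) and the names `Sample` and `read_samplesheet` are hypothetical; they are not part of this template.

```python
"""Hypothetical helper module: parse a samplesheet outside the Snakefile."""
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import List

# Assumed column names; adjust to the real samplesheet format.
REQUIRED_FIELDS = ("sample", "fastq1", "fastq2")


@dataclass(frozen=True)
class Sample:
    """One row of the samplesheet."""

    name: str
    fastq1: Path
    fastq2: Path


def read_samplesheet(path: Path) -> List[Sample]:
    """Read a tab-separated samplesheet and return one Sample per row.

    Raises ValueError if a required column is missing or a sample name repeats.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"samplesheet {path} is missing columns: {missing}")
        samples = [
            Sample(name=row["sample"], fastq1=Path(row["fastq1"]), fastq2=Path(row["fastq2"]))
            for row in reader
        ]
    names = [s.name for s in samples]
    if len(names) != len(set(names)):
        raise ValueError(f"samplesheet {path} contains duplicate sample names")
    return samples
```

A Snakefile could then do something like `SAMPLES = [s.name for s in read_samplesheet(Path(config["samplesheet"]))]`, keeping the parsing logic out of the rules (the `samplesheet` config key is likewise hypothetical).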

Requirements (e.g. executables) should not be specified in the workflows, but should be maintained as part of an environment (e.g. via [Mamba](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html)).

Follow these conventions:

1. All rules should have descriptive names.

2. All rules should have a short docstring describing what the rule does and what tool(s) it uses. If there are any custom input or output file formats, describe them; e.g. for a CSV/TSV file, list the expected field names.

3. All rules should have the following directives:
   - `input`: the input paths
   - `output`: the output paths
   - `log`: the path to the log file. Good practice is to include the workflow name, rule name, and any wildcards in the log file name, e.g. `logs/{workflow_name}.{rule_name}.{wildcard1}.{wildcard2}.log`.

4. The following directives are optional, but recommended when known:
   - `params`: any custom metadata, both static and conditional on the wildcards
   - `threads`: the number of threads to use
   - `resources`: custom resources, specified with the following keywords:
     - `mem_gb`: the amount of memory to allocate, in gigabytes (e.g. `8` for eight gigabytes)
5. The arguments to the `input`, `params`, and `output` directives should be named with keywords, e.g.:

```python
rule:
    input:
        bam="/path/to/bam",
```

6. Shell commands should not reference global variables, only rule directives. Data needed to build the command should be stored in the `params` directive.
7. Both standard output and standard error should be redirected to the log file, e.g.:

```python
rule:
    # input, output, and log directives omitted
    shell:
        """
        (
            echo "Hello Zorld" | sed -e 's_Z_W_'
        ) &> {log}
        """
```

8. Use `ALL_CAPS` for naming global variables. This helps distinguish them from local variables and parameters when used in input functions.
9. Simplify inputs with [Snakemake helper functions](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#helpers-for-defining-rules), e.g. `expand` and `multiext`.
10. Use [input functions](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#input-functions) where appropriate.
11. Be consistent with your file separator character.
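
Conventions 3, 8 and 10 above can be illustrated in plain Python. The input function below receives the rule's wildcards (modeled here with `types.SimpleNamespace`; in a real Snakefile, Snakemake passes its own wildcards object) and looks up paths in an `ALL_CAPS` global. `FASTQS`, `fastqs_for_sample`, `log_path`, and the rule name `align` are all hypothetical.

```python
from types import SimpleNamespace
from typing import Dict, List

# Global lookup table, named in ALL_CAPS so it stands out inside input functions.
# The sample names and paths are placeholders.
FASTQS: Dict[str, List[str]] = {
    "sample1": ["data/raw/sample1_R1.fastq.gz", "data/raw/sample1_R2.fastq.gz"],
    "sample2": ["data/raw/sample2_R1.fastq.gz", "data/raw/sample2_R2.fastq.gz"],
}


def fastqs_for_sample(wildcards) -> List[str]:
    """Input function: return the FASTQ paths for the `sample` wildcard."""
    return FASTQS[wildcards.sample]


def log_path(workflow_name: str, rule_name: str, *wildcards: str) -> str:
    """Build a log path of the form logs/{workflow}.{rule}.{wildcard1}.{wildcard2}.log."""
    return "logs/" + ".".join([workflow_name, rule_name, *wildcards]) + ".log"


# Simulate the call Snakemake would make once it has resolved the wildcards.
print(fastqs_for_sample(SimpleNamespace(sample="sample1")))
print(log_path("myworkflow", "align", "{sample}"))  # logs/myworkflow.align.{sample}.log
```

In a rule, the function is passed without calling it (e.g. `input: fastqs=fastqs_for_sample`), and the `log_path(...)` result can serve as a `log` pattern in which Snakemake later fills in `{sample}`.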

### Configuration

See the [Snakemake configuration documentation](https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html).

To ensure valid inputs to your workflow, use a [configuration schema](https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#validation).

This template includes an example [configuration schema](config/config_schema.yml) and [configuration file](config/config.yml) to get you started.
The [workflow](Snakefile) validates the configuration against the schema:

```python
from snakemake.utils import validate

validate(config, workflow.basedir + "/config/config_schema.yml")
```
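
To show what validation checks for this template's schema, here is a deliberately simplified, hypothetical stand-in: it verifies the required keys and basic types, then fills in `default` values (Snakemake's `validate` can also set defaults, via its `set_default` option). It is not how `snakemake.utils.validate` works internally, which relies on a full JSON Schema implementation; `check_config` and the inline `SCHEMA` dict are assumptions that mirror `config/config_schema.yml`.

```python
from typing import Any, Dict

# Simplified mirror of config/config_schema.yml (only the keywords used below).
SCHEMA: Dict[str, Any] = {
    "properties": {
        "experiment": {"type": "string"},
        "samples": {"type": "array"},
        "p_value_cutoff": {"type": "number", "default": 0.05},
    },
    "required": ["experiment", "samples"],
}

# Map JSON Schema type names to Python types.
JSON_TYPES = {"string": str, "array": list, "number": (int, float)}


def check_config(config: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with defaults filled in; raise ValueError if invalid."""
    for key in schema.get("required", []):
        if key not in config:
            raise ValueError(f"missing required config key: {key!r}")
    checked = dict(config)
    for key, spec in schema.get("properties", {}).items():
        if key not in checked:
            if "default" in spec:
                checked[key] = spec["default"]
            continue
        value = checked[key]
        # bool is a subclass of int in Python, but never a valid value for these types.
        if isinstance(value, bool) or not isinstance(value, JSON_TYPES[spec["type"]]):
            raise ValueError(f"config key {key!r} should be of type {spec['type']}")
    return checked
```

With the example `config/config.yml`, `check_config` returns the same two keys plus `p_value_cutoff: 0.05`.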

@@ -12,3 +12,20 @@ To install and activate:
mamba env create -f environment.yml
mamba activate {{cookiecutter.project_slug}}
```

## Configure and run the workflow

The [workflow configuration schema](config/config_schema.yml) describes the parameters for the workflow.
To set the parameters for a specific run of the workflow, write a [configuration file](https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html) using the schema and the [example config/config.yml](config/config.yml) as a guide.

If you do not specify a workflow file (e.g. with `-s myworkflow.smk`), Snakemake looks for a file called `Snakefile`, `snakefile`, `workflow/Snakefile`, or `workflow/snakefile`, in that order.

Review comment: How much of the Snakemake basics do we think are necessary to include in this template? Some of this feels more appropriate as links to the relevant sections of the Snakemake docs, or assumed as prior knowledge for the user.

```console
snakemake -j12 --configfile config/config.yml
```
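
The default lookup order described above can be sketched in Python. `find_default_snakefile` and `DEFAULT_SNAKEFILES` are hypothetical names; the function only mirrors the documented search order and is not Snakemake's own code.

```python
from pathlib import Path
from typing import Optional

# Candidate names, in the order Snakemake checks them when `-s` is not given.
DEFAULT_SNAKEFILES = ("Snakefile", "snakefile", "workflow/Snakefile", "workflow/snakefile")


def find_default_snakefile(workdir: Path) -> Optional[Path]:
    """Return the first candidate that exists under `workdir`, or None."""
    for candidate in DEFAULT_SNAKEFILES:
        path = workdir / candidate
        if path.is_file():
            return path
    return None
```

This template keeps its workflow in a top-level `Snakefile`, so the first candidate matches and no `-s` flag is needed when running from the repository root.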

You can override specific values on the command line with the `--config` parameter:

```console
snakemake -j12 --configfile config/config.yml --config experiment=myexperiment
```
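
Values given with `--config` take precedence over those read from `--configfile`. A minimal sketch of that merge for flat `KEY=VALUE` pairs follows; `apply_overrides` is hypothetical, and unlike Snakemake it keeps every value as a string instead of inferring types or merging nested keys.

```python
from typing import Any, Dict, Iterable


def apply_overrides(config: Dict[str, Any], overrides: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `config` updated with KEY=VALUE strings from the command line."""
    merged = dict(config)
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        merged[key] = value  # the command-line value wins over the config file
    return merged


config_file = {"experiment": "experiment1", "samples": ["sample1", "sample2"]}
print(apply_overrides(config_file, ["experiment=myexperiment"]))
# {'experiment': 'myexperiment', 'samples': ['sample1', 'sample2']}
```

Copying the dict leaves the values loaded from the config file untouched, which makes the override easy to compare against the original.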

@@ -0,0 +1,50 @@

```python
################################################################################
# Pipeline for {{cookiecutter.project_slug}}
################################################################################

from pathlib import Path

from fgsmk.log import on_error
from snakemake.utils import validate


################################################################################
# Utility methods and variables
################################################################################

validate(config, workflow.basedir + "/config/config_schema.yml")


# Block of code that gets called if the Snakemake pipeline exits with an error.
onerror:
    on_error(snakefile=Path(__file__), config=config, log=Path(log))


################################################################################
# Snakemake rules
################################################################################

SAMPLES = config["samples"]
READS = ["1", "2"]


rule all:
    input:
        "data/resources/ref.fa",
        expand("data/raw/{sample}_R{read}.fastq.gz", sample=SAMPLES, read=READS),


rule download_raw_data:
    output:
        "data/raw/{sample}_R{read}.fastq.gz",
    shell:
        """
        # wget -O {output} https://to/data/{wildcards.sample}_R{wildcards.read}.fastq.gz
        touch {output}
        """


rule download_resource_data:
    output:
        "data/resources/ref.fa",
    shell:
        """
        # wget -O {output} https://to/data/ref.fa
        touch {output}
        """
```
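
`rule all` requests every raw FASTQ through `expand`, which fills the pattern with every combination of the given values. The hypothetical `expand_product` below reproduces that default Cartesian-product behaviour for this example only; it is not Snakemake's implementation and ignores features such as the `zip` combinator.

```python
from itertools import product
from typing import List, Sequence


def expand_product(pattern: str, **values: Sequence[str]) -> List[str]:
    """Format `pattern` once for every combination of the keyword values."""
    keys = list(values)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(values[k] for k in keys))
    ]


SAMPLES = ["sample1", "sample2"]  # as in config/config.yml
READS = ["1", "2"]
for path in expand_product("data/raw/{sample}_R{read}.fastq.gz", sample=SAMPLES, read=READS):
    print(path)
# data/raw/sample1_R1.fastq.gz
# data/raw/sample1_R2.fastq.gz
# data/raw/sample2_R1.fastq.gz
# data/raw/sample2_R2.fastq.gz
```

With the example configuration this yields four targets, so `rule all` asks `download_raw_data` to run once per sample and read.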

@@ -0,0 +1,6 @@

```yaml
################################################################################
# Configuration file for {{cookiecutter.project_slug}}
################################################################################

experiment: "experiment1"
samples: ["sample1", "sample2"]
```

@@ -0,0 +1,24 @@

```yaml
$schema: "https://json-schema.org/draft/2020-12/schema"
description: Config schema for {{cookiecutter.project_slug}}

type: object

properties:
  experiment:
    type: string
    description: Name of the experiment.
    example: "{{cookiecutter.project_slug}}"

  samples:
    type: array
    description: List of samples.
    example: ["sample1", "sample2"]

  p_value_cutoff:
    type: number
    description: P-value cutoff for statistical significance.
    default: 0.05

required:
  - experiment
  - samples
```

Comment on lines +6 to +24: Do you think we should update the workflow to use these config params? Or choose contrived example params that could be used in the workflow?

Maybe this text should be used to update the Notion page instead? The template is only intended for internal use.

I'd be fine moving most/all of the below text to the Notion page and linking from here. (Just use suggested changes so N/T/C can review.) That way, we're not maintaining it in two locations.
And I find it easier to make edits and suggestions to longform text in Notion or gdocs than in a PR.