Commit a0c2f25

feat: workflow and config templates (#29)
1 parent 517da65 commit a0c2f25

6 files changed: +215 -5 lines changed

README.md

Lines changed: 17 additions & 4 deletions
@@ -21,17 +21,30 @@ Install [`cookiecutter`](https://cookiecutter.readthedocs.io/en/stable/) and run
 You will have a Git-initialized Snakemake project at the following location:
 
 ```console
-> tree -a myworkflow
+tree -a myworkflow
 myworkflow
 ├── .git # contents omitted for brevity
 ├── .github
 │   └── CODEOWNERS
 ├── .gitignore
 ├── README.md
+├── Snakefile
 ├── config
 │   ├── config.yml
 │   └── config_schema.yml
-├── environment.yml
-└── workflow
-    └── myworkflow.smk
+└── environment.yml
 ```
+
+## Development
+
+Read the [Snakemake Best Practices](https://snakemake.readthedocs.io/en/stable/snakefiles/best_practices.html) and the [Fulcrum Snakemake](https://www.notion.so/fulcrumgenomics/Snakemake-3d836708c9bc47ca868ee9a09ada7d0d) documentation.
+
+### Configuration
+
+[Snakemake supports configuration and validation of workflow parameters](https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html).
+
+To ensure valid inputs to your workflow, use a [configuration schema](https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#validation).
+
+This template includes an example [configuration schema]({{cookiecutter.project_slug}}/config/config_schema.yml) and [configuration file]({{cookiecutter.project_slug}}/config/config.yml) to get you started.
+
+At runtime, the [workflow]({{cookiecutter.project_slug}}/Snakefile) [validates the provided configuration](https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#validation) against the defined JSON schema.

{{cookiecutter.project_slug}}/README.md

Lines changed: 60 additions & 1 deletion
@@ -1,4 +1,49 @@
-# Snakemake workflow with Python toolkit
+# {{cookiecutter.project_slug}}
+
+{{cookiecutter.project_short_description}}
+
+## Inputs
+
+> [!WARNING]
+> **After creating a new project, describe workflow input file formats here.**
+>
+> Use the [configuration schema](config/config_schema.yml) for simple input descriptions. Include URLs to input descriptions for third-party tools.
+
+- `reference`: URL to the reference genome FASTA to be downloaded.
+- `experiment`: Name of the experiment.
+- `samples`: List of sample identifiers.
+- `p_value_cutoff`: P-value cutoff for statistical significance.
+
+> [!WARNING]
+> Consider using [Markdown tables](https://www.tablesgenerator.com/markdown_tables) to describe the fields of custom TSV/CSV input files, for example:
+
+### Samplesheet
+
+A TSV file with fields:
+
+| Field                   | Type                  | Description                                                     |
+|-------------------------|-----------------------|-----------------------------------------------------------------|
+| `sample_id`             | String, no whitespace | Sample identifier                                               |
+| `condition`             | String, no whitespace | Abbreviated name for experimental condition, e.g. "neg_control" |
+| `condition_description` | String                | Long description of experimental condition                      |
+| `fastq_r1`              | Absolute path         | Path to R1 FASTQ for sample                                     |
+| `fastq_r2`              | Absolute path         | Path to R2 FASTQ for sample                                     |
+
+## Outputs
+
+> [!WARNING]
+> **After creating a new project, describe workflow outputs here.**
+>
+> Consider using `tree`-style output to describe the expected output file structure, URLs to third-party file format descriptions, and tables as in [Inputs](#inputs) for custom output file formats.
+
+```console
+results
+├── plots
+│   └── {experiment}.heatmap.png # Heatmap describing counts for all samples
+└── counts
+    ├── {sample_name}.counts.tsv # Raw counts
+    └── {sample_name}.counts.summary.tsv # Summary of counts
+```
 
 ## Set up Environment
 
@@ -12,3 +57,17 @@ To install and activate:
 mamba env create -f environment.yml
 mamba activate {{cookiecutter.project_slug}}
 ```
+
+## Configure and run the workflow
+
+The [workflow configuration schema](config/config_schema.yml) describes the parameters for the workflow, and the [config file](config/config.yml) contains the parameter values.
+
+```console
+snakemake -j12
+```
+
+You can override specific values on the command line with the `--config` parameter.
+
+```console
+snakemake -j12 --config experiment=myexperiment
+```
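
The override behavior described in the README hunk above can be pictured as a dictionary merge: values from `config/config.yml` are loaded first, then each `key=value` pair given to `--config` replaces the matching key. The sketch below is an illustration in plain Python, not Snakemake's implementation. `apply_overrides` is a hypothetical helper, and it keeps override values as strings, whereas Snakemake may additionally convert value types.

```python
def apply_overrides(file_config, overrides):
    """Return a copy of file_config with key=value overrides applied.

    Hypothetical helper for illustration; not part of Snakemake.
    """
    merged = dict(file_config)
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        # Later values win, so the command line takes precedence over the file.
        merged[key] = value
    return merged


# Values mirroring config/config.yml from this template.
file_config = {
    "experiment": "experiment1",
    "samples": ["sample1", "sample2"],
}

merged = apply_overrides(file_config, ["experiment=myexperiment"])
print(merged["experiment"])
print(merged["samples"])
```

Keys that are not overridden, such as `samples` here, keep the values from the config file.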

{{cookiecutter.project_slug}}/Snakefile

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
+################################################################################
+# Pipeline for {{cookiecutter.project_slug}}
+################################################################################
+
+from snakemake.utils import validate
+
+
+################################################################################
+# Utility methods and variables
+################################################################################
+
+
+configfile: workflow.basedir + "/config/config.yml"
+
+
+validate(config, workflow.basedir + "/config/config_schema.yml")
+
+
+################################################################################
+# Snakemake rules
+################################################################################
+
+EXPERIMENT = config["experiment"]
+SAMPLES = config["samples"]
+REFERENCE_URL = config["reference"]
+READS = ["1", "2"]
+BWA_INDEX_EXTS = [".amb", ".ann", ".bwt", ".pac", ".sa"]
+
+
+rule all:
+    input:
+        multiext("data/resources/ref.fa", *BWA_INDEX_EXTS),
+        expand("data/raw/" + EXPERIMENT + "/{sample}_R{read}.fastq.gz", sample=SAMPLES, read=READS),
+
+
+rule download_raw_data:
+    """
+    Downloads the raw reads for each sample.
+
+    Output:
+        reads: A gzip-compressed FASTQ file.
+    """
+    output:
+        reads="data/raw/" + EXPERIMENT + "/{sample}_R{read}.fastq.gz",
+    log:
+        "logs/download_raw_data.{sample}.R{read}.log",
+    shell:
+        """
+        (
+            # wget -O {output.reads} https://to/data/{wildcards.sample}_R{wildcards.read}.fastq.gz
+            touch {output.reads}
+        ) &> {log}
+        """
+
+
+rule index_reference_genome:
+    """
+    Runs bwa indexing on the reference genome.
+
+    Input:
+        ref: The reference genome in FASTA format.
+
+    Output:
+        indexes: The BWA index files for the reference genome.
+    """
+    input:
+        ref="data/resources/ref.fa",
+    output:
+        indexes=multiext("data/resources/ref.fa", *BWA_INDEX_EXTS),
+    log:
+        "logs/index_reference_genome.log",
+    shell:
+        """
+        (
+            # bwa index {input.ref}
+            touch {output.indexes}
+        ) &> {log}
+        """
+
+
+rule download_reference_genome:
+    """
+    Downloads the reference genome FASTA file.
+
+    Output:
+        ref: The reference genome in FASTA format.
+    """
+    params:
+        ref_url=REFERENCE_URL,
+    output:
+        ref="data/resources/ref.fa",
+    log:
+        "logs/download_reference_genome.log",
+    shell:
+        """
+        (
+            # wget -O {output.ref}.gz {params.ref_url}
+            # gunzip {output.ref}.gz
+            touch {output.ref}
+        ) &> {log}
+        """
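
To see which files `rule all` requests, it can help to expand the target patterns by hand. The sketch below imitates `multiext` and `expand` with the standard library only; `multiext_sketch` and `expand_sketch` are hypothetical stand-ins, not Snakemake's functions. It assumes the config values from `config/config.yml` and assumes the experiment name is substituted into the path as a plain Python value rather than treated as a wildcard.

```python
from itertools import product

# Values mirroring config/config.yml and the constants in the Snakefile.
EXPERIMENT = "experiment1"
SAMPLES = ["sample1", "sample2"]
READS = ["1", "2"]
BWA_INDEX_EXTS = [".amb", ".ann", ".bwt", ".pac", ".sa"]


def multiext_sketch(prefix, *exts):
    """Append each extension to the prefix (stand-in for multiext)."""
    return [prefix + ext for ext in exts]


def expand_sketch(pattern, **values):
    """Fill the pattern with every combination of values (stand-in for expand)."""
    keys = list(values)
    combos = product(*(values[key] for key in keys))
    return [pattern.format(**dict(zip(keys, combo))) for combo in combos]


index_files = multiext_sketch("data/resources/ref.fa", *BWA_INDEX_EXTS)
fastq_files = expand_sketch(
    "data/raw/" + EXPERIMENT + "/{sample}_R{read}.fastq.gz",
    sample=SAMPLES,
    read=READS,
)

for path in index_files + fastq_files:
    print(path)
```

With two samples and two reads, this yields five index files and four FASTQ targets, one per sample and read combination.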

{{cookiecutter.project_slug}}/config/config.yml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+################################################################################
+# Configuration file for {{cookiecutter.project_slug}}
+################################################################################
+
+experiment: "experiment1"
+samples: ["sample1", "sample2"]
+reference: https://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

{{cookiecutter.project_slug}}/config/config_schema.yml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+$schema: "https://json-schema.org/draft/2020-12/schema"
+description: Config schema for {{cookiecutter.project_slug}}
+
+type: object
+
+properties:
+  experiment:
+    type: string
+    description: Name of the experiment.
+    example: "{{cookiecutter.project_slug}}"
+
+  reference:
+    type: string
+    description: URL to the reference genome FASTA.
+    example: https://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
+
+  samples:
+    type: array
+    description: List of sample identifiers.
+    example: ["sample1", "sample2"]
+
+  p_value_cutoff:
+    type: number
+    description: P-value cutoff for statistical significance.
+    default: 0.05
+
+required:
+  - experiment
+  - samples
+  - reference
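
As a rough picture of what this schema enforces, the sketch below checks a config dictionary for the required keys and declared types and fills in the `p_value_cutoff` default. It is a hand-rolled illustration using only the standard library, not how Snakemake's `validate` is implemented (that relies on the `jsonschema` package). `check_config`, `SCHEMA`, and the `example.org` URL are hypothetical names chosen for this example.

```python
# Maps the schema's type names to simple Python checks (assumed subset).
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "array": lambda v: isinstance(v, list),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}

# The parts of config_schema.yml this sketch looks at, written as a dict.
SCHEMA = {
    "properties": {
        "experiment": {"type": "string"},
        "reference": {"type": "string"},
        "samples": {"type": "array"},
        "p_value_cutoff": {"type": "number", "default": 0.05},
    },
    "required": ["experiment", "samples", "reference"],
}


def check_config(config, schema):
    """Return a copy of config with defaults filled in; raise ValueError on problems."""
    missing = [key for key in schema["required"] if key not in config]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    checked = dict(config)
    for key, spec in schema["properties"].items():
        if key not in checked:
            if "default" in spec:
                checked[key] = spec["default"]
            continue
        if not TYPE_CHECKS[spec["type"]](checked[key]):
            raise ValueError(f"{key!r} should be of type {spec['type']}")
    return checked


config = {
    "experiment": "experiment1",
    "samples": ["sample1", "sample2"],
    "reference": "https://example.org/ref.fa.gz",  # hypothetical URL
}
checked = check_config(config, SCHEMA)
print(checked["p_value_cutoff"])
```

A config that omits `reference`, or that gives `samples` as a string instead of a list, raises `ValueError` in this sketch.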

{{cookiecutter.project_slug}}/workflow/{{cookiecutter.project_slug}}.smk

Whitespace-only changes.
