feat: workflow and config templates #29
base: am_feat_template_update
## Development | ||
|
||
Read the [Snakemake Best Practices](https://snakemake.readthedocs.io/en/stable/snakefiles/best_practices.html) and the [Fulcrum Snakemake](https://www.notion.so/fulcrumgenomics/Snakemake-3d836708c9bc47ca868ee9a09ada7d0d) documentation. The text below is adapted from the latter. | ||
|
Maybe this text should be used to update the Notion page instead? The template is only intended for internal use.
I'd be fine moving most/all of the below text to the Notion page and linking from here. (Just use suggest changes so N/T/C can review.) That way, we're not maintaining it in two locations.
And I find it easier to make edits and suggestions to longform text in Notion or gdocs rather than a PR
This is looking really nice! The new docs and schemas are awesome
{{cookiecutter.project_slug}}/workflow/{{cookiecutter.project_slug}}.smk
properties: | ||
experiment: | ||
type: string | ||
description: Name of the experiment. | ||
example: "{{cookiecutter.project_slug}}" | ||
|
||
samples: | ||
type: array | ||
description: List of samples. | ||
example: ["sample1", "sample2"] | ||
|
||
p_value_cutoff: | ||
type: number | ||
description: P-value cutoff for statistical significance. | ||
default: 0.05 | ||
|
||
required: | ||
- experiment | ||
- samples |
Do you think we should update the workflow to use these config params? Or, choose contrived example params that could be used in the workflow?
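One way to picture how the workflow could consume these config params is the sketch below. It is a stdlib-only stand-in, not the template's actual code: `SCHEMA` is a hand-written dict mirroring the properties in the hunk above, and `resolve_config` is a hypothetical helper name. In a real Snakefile, `snakemake.utils.validate(config, schema)` would likely do this job against `config_schema.yml` directly.

```python
# Hypothetical mirror of the schema hunk above; not read from config_schema.yml.
SCHEMA = {
    "properties": {
        "experiment": {"type": "string"},
        "samples": {"type": "array"},
        "p_value_cutoff": {"type": "number", "default": 0.05},
    },
    "required": ["experiment", "samples"],
}

# Map the schema's type names onto Python types for a basic check.
PYTHON_TYPES = {"string": str, "array": list, "number": (int, float)}


def resolve_config(config: dict, schema: dict = SCHEMA) -> dict:
    """Return a copy of `config` with schema defaults applied.

    Raises:
        ValueError: If a required key is missing.
        TypeError: If a value does not match the type named in the schema.
    """
    missing = [key for key in schema["required"] if key not in config]
    if missing:
        raise ValueError(f"Missing required config keys: {missing}")

    resolved = dict(config)
    for name, spec in schema["properties"].items():
        # Fill in the default only when the user did not set the key.
        if name not in resolved and "default" in spec:
            resolved[name] = spec["default"]
        if name in resolved:
            value = resolved[name]
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[spec["type"]]):
                raise TypeError(f"Config key {name!r} should be of type {spec['type']}")
    return resolved
```

A rule could then read `resolved["p_value_cutoff"]` in its `params`, which would give the example parameters a visible use in the "hello world" workflow.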
|
||
If there is a single Snakemake workflow, it should be named `Snakefile` and kept at the top level of the repository. If there are multiple workflows, name them according to their function and give them the extension `.smk`. | ||
|
||
Workflow files should mainly contain `rules`. Any additional code should be added to a separate `Python` toolkit, for example to parse the configuration object, handle samplesheet input, or to organize reference data. |
Workflow files should mainly contain `rules`. Any additional code should be added to a separate `Python` toolkit, for example to parse the configuration object, handle samplesheet input, or to organize reference data. | |
Workflow files should mainly contain `rules`. Any additional code should be added to a separate Python toolkit, for example to parse the configuration object, handle samplesheet input, or to organize reference data. |
|
||
The following should be followed: | ||
|
||
1. All rules should have descriptive names |
1. All rules should have descriptive names | |
1. All rules should have descriptive names. |
The following should be followed: | ||
|
||
1. All rules should have descriptive names | ||
2. All rules should have a short docstring describing what the rule does, and what tool(s) it uses. If there are any custom input or output file formats, describe them, e.g. for a CSV/TSV file list the expected field names. Follow the docstring conventions for Python, e.g. |
2. All rules should have a short docstring describing what the rule does, and what tool(s) it uses. If there are any custom input or output file formats, describe them, e.g. for a CSV/TSV file list the expected field names. Follow the docstring conventions for Python, e.g. | |
2. Each rule should have a short docstring describing its behavior and any tools it uses. The docstring should also describe the required fieldnames or schema of any custom input or output file formats. | |
Follow Python's docstring conventions, e.g.: |
Inputs: | ||
file: TSV file with fields name, count, and source describing counts of ... | ||
Params: | ||
min_count: Minimum count for output. | ||
Output: | ||
file: Filtered version of input.file containing only rows with count > params.min_count |
Inputs: | |
file: TSV file with fields name, count, and source describing counts of ... | |
Params: | |
min_count: Minimum count for output. | |
Output: | |
file: Filtered version of input.file containing only rows with count > params.min_count | |
Inputs: | |
file: TSV file with fields "name", "count", and "source" describing counts of ... | |
Params: | |
min_count: Minimum count for output. | |
Output: | |
file: Filtered version of `input.file` containing only rows where `count > params.min_count`. |
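As an illustration of the docstring convention and of moving non-rule code into a separate Python toolkit, the filter described in the docstring above might look like the sketch below. The function name `filter_counts` and its signature are assumptions made for this example; only the field names and the `count > min_count` condition come from the text above.

```python
import csv
from pathlib import Path

# Field names taken from the example docstring above.
FIELDS = ["name", "count", "source"]


def filter_counts(in_tsv: Path, out_tsv: Path, min_count: int) -> int:
    """Filter a counts TSV, keeping rows whose count exceeds a threshold.

    Args:
        in_tsv: TSV file with fields "name", "count", and "source".
        out_tsv: Path of the filtered TSV to write, with the same fields.
        min_count: Rows are kept only where `count > min_count`.

    Returns:
        The number of rows written to `out_tsv`.
    """
    kept = 0
    with in_tsv.open(newline="") as in_fh, out_tsv.open("w", newline="") as out_fh:
        reader = csv.DictReader(in_fh, delimiter="\t")
        writer = csv.DictWriter(out_fh, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        for row in reader:
            if int(row["count"]) > min_count:
                # Write only the documented fields, in the documented order.
                writer.writerow({field: row[field] for field in FIELDS})
                kept += 1
    return kept
```

A rule could call such a helper from a `run:` block or a small script, which would keep the workflow file itself limited to rules.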
1. All rules should have descriptive names | ||
2. All rules should have a short docstring describing what the rule does, and what tool(s) it uses. If there are any custom input or output file formats, describe them, e.g. for a CSV/TSV file list the expected field names. Follow the docstring conventions for Python, e.g. | ||
|
||
```python |
Can you indent all code blocks in this section so they are appropriately aligned within their bullet item when rendered to markdown?
|
||
## Inputs | ||
|
||
TODO: Describe workflow input file formats here. Use the [configuration schema](config/config_schema.yml) for simple input descriptions, e.g. `reference` is "Path to reference genome in FASTA format with .fai and .dict indexes". Include URLs to input descriptions for 3rd party tools. |
TODO: Describe workflow input file formats here. Use the [configuration schema](config/config_schema.yml) for simple input descriptions, e.g. `reference` is "Path to reference genome in FASTA format with .fai and .dict indexes". Include URLs to input descriptions for 3rd party tools. | |
> [!WARNING] | |
> **After creating a new project, describe workflow input file formats here.** | |
> | |
> Use the [configuration schema](config/config_schema.yml) for simple input descriptions, e.g. `reference` is "Path to reference genome in FASTA format with .fai and .dict indexes". Include URLs to input descriptions for 3rd party tools. |
|
||
## Outputs | ||
|
||
TODO: Describe workflow outputs here. Consider using a `tree` output style format to describe the expected output file structure, URLs to third party file format descriptions, and tables as in [Inputs](#inputs) for custom output file formats. |
TODO: Describe workflow outputs here. Consider using a `tree` output style format to describe the expected output file structure, URLs to third party file format descriptions, and tables as in [Inputs](#inputs) for custom output file formats. | |
> [!WARNING] | |
> **After creating a new project, describe workflow outputs here.** | |
> | |
> Consider using a `tree` output style format to describe the expected output file structure, URLs to third party file format descriptions, and tables as in [Inputs](#inputs) for custom output file formats. |
The [workflow configuration schema](config/config_schema.yml) describes the parameters for the workflow. | ||
To set the parameters for a specific run of the workflow, write a [configuration file](https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html) using the schema and the [example config/config.yml](config/config.yml) as a guide. | ||
|
||
If you do not specify a workflow file with e.g. `-s myworkflow.smk`, Snakemake will look, in this order, for a file called `Snakefile`, `snakefile`, `workflow/Snakefile`, or `workflow/snakefile`. |
How much of the Snakemake basics do we think is necessary to include in this template?
Some of this feels more appropriate as links to the relevant sections of the Snakemake docs, or as assumed prior knowledge for the user.
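For reference, the lookup order stated in the README hunk above can be sketched in a few lines. This only mirrors the order as written in that text; it is not Snakemake's own implementation, and `find_default_snakefile` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# Candidate paths, in the order listed in the README text above.
DEFAULT_SNAKEFILES = (
    "Snakefile",
    "snakefile",
    "workflow/Snakefile",
    "workflow/snakefile",
)


def find_default_snakefile(root: Path) -> Optional[Path]:
    """Return the first default workflow file found under `root`, else None."""
    for candidate in DEFAULT_SNAKEFILES:
        path = root / candidate
        if path.is_file():
            return path
    return None
```

If the template keeps a single workflow named `{{cookiecutter.project_slug}}.smk`, none of these defaults would match, so the `-s` flag would be needed on every invocation.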
touch {output.indexes} | ||
) &> {log} | ||
""" | ||
|
touch {output.reads} | ||
) &> {log} | ||
""" | ||
|
two lines between rules per snakefmt
- `config.yml`, `config_schema.yml`, and `Snakefile` with "hello world" type examples.
- `README.md` with examples of how to document inputs and outputs, and how to configure and run the workflow.