feat: workflow and config templates #29

Open
wants to merge 9 commits into
base: am_feat_template_update

@ameynert ameynert commented Jan 30, 2025

  • Populates the config.yml, config_schema.yml, and Snakefile with "hello world" type examples.
  • Adds a "Development" section to the template repo README.md.
  • Fleshes out the cookiecutter README.md with examples of how to document inputs and outputs, and how to configure and run the workflow.

@ameynert ameynert self-assigned this Jan 30, 2025
@ameynert ameynert mentioned this pull request Jan 30, 2025
7 tasks
## Development

Read the [Snakemake Best Practices](https://snakemake.readthedocs.io/en/stable/snakefiles/best_practices.html) and the [Fulcrum Snakemake](https://www.notion.so/fulcrumgenomics/Snakemake-3d836708c9bc47ca868ee9a09ada7d0d) documentation. The text below is adapted from the latter.

@ameynert ameynert Jan 31, 2025

Maybe this text should be used to update the Notion page instead? The template is only intended for internal use.

I'd be fine moving most/all of the below text to the Notion page and linking from here. (Just use suggest changes so N/T/C can review.) That way, we're not maintaining it in two locations.

I also find it easier to make edits and suggestions to longform text in Notion or gdocs than in a PR.

@ameynert ameynert marked this pull request as ready for review January 31, 2025 23:38
@ameynert ameynert assigned msto and unassigned ameynert Jan 31, 2025
@msto msto left a comment

This is looking really nice! The new docs and schemas are awesome

README.md (outdated, resolved)
README.md (outdated, resolved)
README.md (resolved)
{{cookiecutter.project_slug}}/Snakefile (outdated, resolved)
{{cookiecutter.project_slug}}/config/config.yml (outdated, resolved)
{{cookiecutter.project_slug}}/Snakefile (outdated, resolved)
Comment on lines +6 to +24
properties:
  experiment:
    type: string
    description: Name of the experiment.
    example: "{{cookiecutter.project_slug}}"

  samples:
    type: array
    description: List of samples.
    example: ["sample1", "sample2"]

  p_value_cutoff:
    type: number
    description: P-value cutoff for statistical significance.
    default: 0.05

required:
  - experiment
  - samples

Do you think we should update the workflow to use these config params? Or, choose contrived example params that could be used in the workflow?
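
If the workflow does end up reading these parameters, it would normally validate them first. The sketch below is a minimal, standard-library-only illustration of what validating a config against the schema above involves: it checks the required keys, fills in the `p_value_cutoff` default, and does a shallow type check. `SCHEMA`, `PY_TYPES`, and `apply_schema` are hypothetical names, not part of this PR; in a real Snakefile, `snakemake.utils.validate(config, "config/config_schema.yml")` would usually do this job.

```python
# Hypothetical stand-in for the schema above, written as a plain dict.
SCHEMA = {
    "properties": {
        "experiment": {"type": "string"},
        "samples": {"type": "array"},
        "p_value_cutoff": {"type": "number", "default": 0.05},
    },
    "required": ["experiment", "samples"],
}

# Map schema type names to Python types (only the types used above).
PY_TYPES = {"string": str, "array": list, "number": (int, float)}


def apply_schema(config: dict, schema: dict = SCHEMA) -> dict:
    """Return a copy of config with defaults filled in, raising on violations."""
    missing = [key for key in schema["required"] if key not in config]
    if missing:
        raise ValueError(f"missing required config keys: {missing}")

    result = dict(config)
    for name, spec in schema["properties"].items():
        if name not in result:
            # Optional key that was not supplied: fall back to the default, if any.
            if "default" in spec:
                result[name] = spec["default"]
            continue
        if not isinstance(result[name], PY_TYPES[spec["type"]]):
            raise TypeError(f"{name} must be of type {spec['type']}")
    return result
```

A rule could then read `config["p_value_cutoff"]` without checking whether the user supplied it.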


If there is a single Snakemake workflow, it should be named `Snakefile` and kept at the top level of the repository. If there are multiple workflows, name them according to their function and give them the extension `.smk`.

Workflow files should mainly contain `rules`. Any additional code should be added to a separate `Python` toolkit, for example to parse the configuration object, handle samplesheet input, or to organize reference data.

Suggested change
Workflow files should mainly contain `rules`. Any additional code should be added to a separate `Python` toolkit, for example to parse the configuration object, handle samplesheet input, or to organize reference data.
Workflow files should mainly contain `rules`. Any additional code should be added to a separate Python toolkit, for example to parse the configuration object, handle samplesheet input, or to organize reference data.
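
As an illustration of the kind of code that belongs in the toolkit rather than in the workflow file, here is a minimal sketch of a samplesheet parser. The `Sample` class, the `read_samplesheet` function, and the `name` and `fastq` column names are assumptions made up for this example; none of them are part of the template.

```python
import csv
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Sample:
    """One row of a (hypothetical) tab-separated samplesheet."""

    name: str
    fastq: str


def read_samplesheet(lines: Iterable[str]) -> list[Sample]:
    """Parse samplesheet lines into Sample records, rejecting duplicate names."""
    samples: list[Sample] = []
    seen: set[str] = set()
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["name"] in seen:
            raise ValueError(f"duplicate sample name: {row['name']}")
        seen.add(row["name"])
        samples.append(Sample(name=row["name"], fastq=row["fastq"]))
    return samples
```

The workflow file would then only import the helper and use the resulting sample names in its rules, which keeps the parsing logic testable outside Snakemake.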


The following should be followed:

1. All rules should have descriptive names

Suggested change
1. All rules should have descriptive names
1. All rules should have descriptive names.

The following should be followed:

1. All rules should have descriptive names
2. All rules should have a short docstring describing what the rule does, and what tool(s) it uses. If there are any custom input or output file formats, describe them, e.g. for a CSV/TSV file list the expected field names. Follow the docstring conventions for Python, e.g.

Suggested change
2. All rules should have a short docstring describing what the rule does, and what tool(s) it uses. If there are any custom input or output file formats, describe them, e.g. for a CSV/TSV file list the expected field names. Follow the docstring conventions for Python, e.g.
2. Each rule should have a short docstring describing its behavior and any tools it uses. The docstring should also describe the required fieldnames or schema of any custom input or output file formats.
Follow Python's docstring conventions, e.g.:

Comment on lines +58 to +63
Inputs:
    file: TSV file with fields name, count, and source describing counts of ...
Params:
    min_count: Minimum count for output.
Output:
    file: Filtered version of input.file containing only rows with count > params.min_count

Suggested change
Inputs:
    file: TSV file with fields name, count, and source describing counts of ...
Params:
    min_count: Minimum count for output.
Output:
    file: Filtered version of input.file containing only rows with count > params.min_count
Inputs:
    file: TSV file with fields "name", "count", and "source" describing counts of ...
Params:
    min_count: Minimum count for output.
Output:
    file: Filtered version of `input.file` containing only rows where `count > params.min_count`.
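
To make the documented contract concrete, the sketch below implements the filter that the example docstring describes: keep only rows whose `count` is greater than `min_count`. The function name `filter_counts` and the use of the standard `csv` module are assumptions for illustration; the rule in the template may call a different tool entirely.

```python
import csv
from typing import TextIO


def filter_counts(infile: TextIO, outfile: TextIO, min_count: int) -> int:
    """Copy rows with count > min_count from a TSV to a new TSV.

    Args:
        infile: TSV with a header row containing at least a "count" field.
        outfile: Destination for the header plus the rows that pass the filter.
        min_count: Rows are kept only when count is strictly greater than this.

    Returns:
        The number of rows written (excluding the header).
    """
    reader = csv.DictReader(infile, delimiter="\t")
    writer = csv.DictWriter(
        outfile,
        fieldnames=reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()

    kept = 0
    for row in reader:
        if int(row["count"]) > min_count:
            writer.writerow(row)
            kept += 1
    return kept
```

Note that the comparison is strict, matching `count > params.min_count` in the docstring; a row whose count equals the cutoff is dropped.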

1. All rules should have descriptive names
2. All rules should have a short docstring describing what the rule does, and what tool(s) it uses. If there are any custom input or output file formats, describe them, e.g. for a CSV/TSV file list the expected field names. Follow the docstring conventions for Python, e.g.

```python
```

Can you indent all code blocks in this section so they are appropriately aligned within their bullet item when rendered to markdown?


## Inputs

TODO: Describe workflow input file formats here. Use the [configuration schema](config/config_schema.yml) for simple input descriptions, e.g. `reference` is "Path to reference genome in FASTA format with .fai and .dict indexes". Include URLs to input descriptions for 3rd party tools.

Suggested change
TODO: Describe workflow input file formats here. Use the [configuration schema](config/config_schema.yml) for simple input descriptions, e.g. `reference` is "Path to reference genome in FASTA format with .fai and .dict indexes". Include URLs to input descriptions for 3rd party tools.
> [!WARNING]
> **After creating a new project, describe workflow input file formats here.**
>
> Use the [configuration schema](config/config_schema.yml) for simple input descriptions, e.g. `reference` is "Path to reference genome in FASTA format with .fai and .dict indexes". Include URLs to input descriptions for 3rd party tools.


## Outputs

TODO: Describe workflow outputs here. Consider using a `tree` output style format to describe the expected output file structure, URLs to third party file format descriptions, and tables as in [Inputs](#inputs) for custom output file formats.

Suggested change
TODO: Describe workflow outputs here. Consider using a `tree` output style format to describe the expected output file structure, URLs to third party file format descriptions, and tables as in [Inputs](#inputs) for custom output file formats.
> [!WARNING]
> **After creating a new project, describe workflow outputs here.**
>
> Consider using a `tree` output style format to describe the expected output file structure, URLs to third party file format descriptions, and tables as in [Inputs](#inputs) for custom output file formats.

The [workflow configuration schema](config/config_schema.yml) describes the parameters for the workflow.
To set the parameters for a specific run of the workflow, write a [configuration file](https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html) using the schema and the [example config/config.yml](config/config.yml) as a guide.

If you do not specify a workflow file with e.g. `-s myworkflow.smk`, Snakemake will look, in this order, for a file called `Snakefile`, `snakefile`, `workflow/Snakefile`, or `workflow/snakefile`.

How much of the Snakemake basics do we think are necessary to include in this template?

Some of this feels more appropriate as links to the relevant sections of the Snakemake docs, or assumed as prior knowledge for the user
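
For reference, the lookup order quoted above can be sketched in a few lines. This only illustrates the order as stated in the README text and is not Snakemake's actual implementation; `DEFAULT_SNAKEFILES` and `find_default_snakefile` are hypothetical names.

```python
from pathlib import Path
from typing import Optional, Union

# Candidate workflow files, in the order given in the README text.
DEFAULT_SNAKEFILES = (
    "Snakefile",
    "snakefile",
    "workflow/Snakefile",
    "workflow/snakefile",
)


def find_default_snakefile(directory: Union[str, Path]) -> Optional[Path]:
    """Return the first default workflow file found in directory, or None."""
    root = Path(directory)
    for candidate in DEFAULT_SNAKEFILES:
        path = root / candidate
        if path.is_file():
            return path
    return None
```

Passing `-s myworkflow.smk` bypasses this search, which is why workflows with an `.smk` extension always need the explicit flag.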

touch {output.indexes}
) &> {log}
"""


Suggested change

touch {output.reads}
) &> {log}
"""


Suggested change

Use two blank lines between rules, per snakefmt.

@msto msto assigned ameynert and unassigned msto Feb 5, 2025
2 participants