Add pipeline AGENTS.md#2
Conversation
| | ├── script.py // all scripts must start with a shebang | ||
| | └── other_script.R | ||
| ├── CHANGELOG.md // changelog, should be updated after every substantial change | ||
| ├── CITATIONS.md // list of tool citations, should be updated when new tools are added |
There was a problem hiding this comment.
I do think this update should be made via nf-core tools, cf nf-core/tools#4328
There was a problem hiding this comment.
My AI-assisted review :-).
But it's derived from my own AI-dev practices.
Really solid first draft. The directory treemap and the terms / pipeline-structure / configuration sections are accurate and exactly the grounding an agent needs, and it is good to see modules.config already described as a single process block with withName selectors.
Most of my comments are about places where the behaviour the file prescribes would, in practice, make an agent burn CI or fight the branch model, plus a couple of module conventions reviewers reliably enforce. Themes, with specifics inline:
- The per-commit routine asks for a full
nf-testrun (and release lint) before every atomic commit. For an agent making several commits on a branch that is N full pipeline test runs where one before pushing would do. Better to tie the heavy checks to push / pre-PR and keep only the cheap static checks per-commit. - The fork branch policy contradicts the feature-branch rule one sentence earlier and would let an agent pile unrelated work onto its fork's
dev, which then cannot produce single-feature PRs. - Nothing about git worktrees. For autonomous agents that may run several sessions against one checkout, a one-line "use a worktree per task" near the branch policy would prevent a lot of clobbering.
- Snapshot guidance should warn that file hashes are architecture-sensitive, or an agent will regenerate them locally and break CI on a different arch.
ext.args/ext.prefixare the heart of module configuration but never appear; that is the convention reviewers most often ask for ("should args be exposed?").
Agree that the modules repo will need its own version (it is not template-generated and has different conventions), and that AGENTS.md updates are best applied through nf-core tools rather than hand edits.
| Before each commit, perform all of the following: | ||
| 1. Run `nextflow lint .` to lint all Nextflow scripts in the repository. Resolve all errors and all possible warnings. Repeat until there are no solvable outstanding issues. | ||
| 2. Run `nf-core pipelines lint`, resolve all errors and all possible warnings. Repeat until there are no solvable outstanding issues. If you are preparing a release (PR to main), use `nf-core pipelines lint --release` instead. | ||
| 3. Run `nf-test test tests/`. If the pipeline fails, resolve the underlying issues. If the test fails due to mismatching snapshots, update them with `nf-test test tests/ --update-snapshot` only if you expect the specific change in the output. Otherwise, fix the issue that caused the unexpected change. | ||
| 4. Run `prek` and stage all changes it generates. | ||
| After completing these steps, you are free to commit your changes. |
There was a problem hiding this comment.
Running the full pipeline nf-test suite (and release lint) before every atomic commit is very expensive and slow, and CI re-runs it on the PR anyway. For an agent that makes several commits per branch this multiplies cost for little signal. Suggest splitting: keep the cheap, fast static checks at per-commit (prek and nextflow lint ., which prek may already cover), and move nf-test test tests/ and nf-core pipelines lint --release to a pre-push / pre-PR step. That keeps commits cheap while still guaranteeing green before review.
Relatedly, since nf-core CI runs the full test matrix on every push, consider telling the agent to append [skip ci] to work-in-progress commit messages while iterating, and to drop it (so CI runs) on the final commit before opening or updating the PR for review. That avoids a full CI run per intermediate commit while still guaranteeing a green run before merge.
There was a problem hiding this comment.
This is deterministic and should be in agent hooks in the various harnesses
There was a problem hiding this comment.
True, although we are not planning to create agent hooks within this project (even though it would be ideal if people used them)
| Each PR requires reviews: 1 for dev, 2 for main. Advise the user to ask for reviews in the nf-core Slack, in `#request-review` (for dev PRs) or `#release-review-trading` (for main PRs). There is also a CI pipeline executed on each PR. All checks must pass before the PR can be merged. | ||
|
|
||
| ## Agent self-disclosure | ||
| As an AI agent, you are required to acknowledge your activity in nf-core. If you generated a majority of the code in a commit, add "This commit was generated by {your name}" at the of the commit message body. If you open a PR autonomously, add "This pull request was created by {your name}" at the end of the PR message (above the checklist). |
There was a problem hiding this comment.
I just read somewhere that some models/harnesses may not be good at identifying themselves. https://www.reddit.com/r/ollama/comments/1u6b5o9/i_dont_trust_ollama_cloud_is_it_possible_that_its/
There was a problem hiding this comment.
The name of the model is not recoverable because it is not part of the context, and the model may change between sessions. What we want to include here is the name of the agent, which should be injected into the context (at least it is for Copilot, but should also be for the others). Knowing the underlying model at commit time would be nice, but it may not be too reliable, as the link suggests.
| All comments and documentation must be written in English with British spelling. Documentation files should additionally follow the style guide at https://nf-co.re/docs/developing/documentation/style-guide. | ||
|
|
||
| ## Nextflow pitfalls | ||
| TBC |
There was a problem hiding this comment.
One suggestion is that publishing should be handled by one method, either publishDir in the modules.config, or the output block in the workflow file. I'm not sure who's migrated to the output block though.
nf-core tools sometimes fails. Ask the user how to proceed. Don't auto-generate files that should be generated by an nf-core tools command.
There's the question about DSL1 syntax ( is everything migrated now? )
Ensure all variables are scoped (def).
Co-authored-by: Maxime U Garcia <max.u.garcia@gmail.com> Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
…o add-agents-md
|
Pushed some updates, hopefully they address the feedback so far |
|
Really nice draft @itrujnara — the content here is genuinely strong, and most of it is exactly what an agent needs. Before we lock in where this lives, I wanted to float a bigger-picture question, since I think how we distribute it matters about as much as the content itself. A few things rolled together: 1. A root 2. The "referenced by the template AGENTS.md" bit is the part I'd love to talk through. In practice agents don't reliably fetch a remote central file at runtime — they read what's on disk. So a runtime reference risks not getting loaded, and it overlaps with the template ownership tools already has. We sort of pointed at this already in the "updates should flow through nf-core tools because it's deterministic" thread (@maxulysse / nf-core/tools#4328) — if tools owns the sync, it feels natural for the content to live in the template rather than in a repo that pipelines point at. 3. Different contexts probably want their own file. Roughly:
4. Where I think this repo really shines. The piece with no other obvious home is the harness glue — a Claude Code plugin (hooks that run lint/ One caveat in the other direction: if we end up with several real consumers of the same prose (template + modules + docs site), a content-source repo that tools vendors in at release (not referenced at runtime) would make sense. I'd just hold off on that until the duplication actually bites. None of this is meant to slow down the great work on the content — mostly want to make sure we're happy with the distribution model before it spreads across pipelines. Keen to hear what everyone thinks! Generated by Claude Code |
|
Hi @edmundmiller, thanks for your feedback. It is true that each AGENTS.md should live in the related repo, but here the decision to split was deliberate. The full discussion can be found in nf-core/proposals#141 and nf-core/proposals#143. |
vagkaratzas
left a comment
There was a problem hiding this comment.
- good sections coverage (might even be too much)
- way too verbose overall IMO
- unsure about git commits, should add to ask user who wants to make the commits
- rewrite with a bot, where less text is more (tokens to spend later)
Might have gone a bit out of scope. I thought we would just have a basic folders description and code guiderails, and then links to nf-core online docs. But now it seems this has everything plus more.
| @@ -0,0 +1,188 @@ | |||
| # nf-core: agents | |||
|
|
|||
| This is the main AI context file for nf-core pipelines. All AI agents and coding assistants must read and strictly follow the rules contained in this document. | |||
There was a problem hiding this comment.
When setting bot guiderails, bold/caps can help, because models process Markdown and textual emphasis. Rule-like wording examples such as this:
- Agents **MUST** run `nf-core pipelines lint` before submitting changes.
- Agents **MUST NOT** modify generated files manually.
- Agents **SHOULD** prefer existing pipeline patterns over introducing new conventions.
Do this throughout.
| This is the main AI context file for nf-core pipelines. All AI agents and coding assistants must read and strictly follow the rules contained in this document. | |
| This is the main AI context file for nf-core pipelines. All AI agents and coding assistants **MUST** follow the rules contained in this document. |
| This is the main AI context file for nf-core pipelines. All AI agents and coding assistants must read and strictly follow the rules contained in this document. | ||
|
|
||
| ## Nextflow language | ||
| Unless otherwise stated, all code in the repository is written in the Nextflow programming language. The documentation can be found at https://docs.seqera.io/nextflow/. |
There was a problem hiding this comment.
Not true for bin/ scripts, .tml files, ++ I guess. Maybe say for workflows / subworkflows or skip, because this is self-explenatory to the bots without needing to reading this line.
| All comments and documentation must be written in English with British spelling. Documentation files should additionally follow the style guide at https://nf-co.re/docs/developing/documentation/style-guide. | ||
|
|
||
| ## Nextflow pitfalls | ||
| - Nextflow supports 2 ways to publish files to the output directory: workflow outputs (modern) and `publishDir` configuration directives (legacy). If the pipeline uses legacy outputs, use them consistently for new modules unless directly prompted otherwise. If the pipeline uses workflow outputs, use them consistently and never revert to legacy outputs. |
There was a problem hiding this comment.
should you link the modules.config here, where guides this? Some pipeline might use both, so this might be more confusing to the bot than intended
| - Nextflow supports 2 ways to publish files to the output directory: workflow outputs (modern) and `publishDir` configuration directives (legacy). If the pipeline uses legacy outputs, use them consistently for new modules unless directly prompted otherwise. If the pipeline uses workflow outputs, use them consistently and never revert to legacy outputs. | ||
| - nf-core tools commands may fail. If that happens, ask the user for help. Never generate any file that is supposed to be generated by nf-core tools. | ||
| - Certain very old pipelines might be using Nextflow DSL1 syntax (with the entire workflow in a single file and channel from/to keywords). This syntax is now deprecated. Do not attempt to work on those pipelines and advise the user to refactor to DSL2. | ||
| - All variables must be defined using the `def` keyword (example: `def x = 5`). |
There was a problem hiding this comment.
This is too much detail for here. The bots should just use nextflow lint * and the language server to bypass these problems.
There was a problem hiding this comment.
It's also just not true in the slightest and in fact will break regularly. For example if you define def prefix = 'blahblahblah' in your script section and then try to use path("${prefix}.tsv") in output, then your code will fail.
| │ ├── samplesheet.csv // example valid samplesheet | ||
| │ └── schema_input.json // JSON schema describing the samplesheet format | ||
| ├── bin // scripts for local modules | ||
| | ├── script.py // all scripts must start with a shebang and carry a licence/author header |
|
|
||
| If you work on multiple features in parallel, use a separate worktree for each task to prevent clobber. | ||
|
|
||
| If you only want to fix a bug in a released version of a pipeline, you should instead create a branch called `patch` from `main`, work in it, and open a PR to nf-core main once done. |
There was a problem hiding this comment.
If I remember correctly the branch should be named patch: ++ ? Make sure else the PR will be auto rejected I think
There was a problem hiding this comment.
https://nf-co.re/docs/specifications/pipelines/requirements/git_branches says it should be called patch
There was a problem hiding this comment.
I wouldn't mention this option for this general use case at all. this feature is more for "emergency" patches
| ## Commit rules and routine | ||
| Each commit should be as atomic as possible, that is, only contain one logical change. There is no limit on the number of files in a commit. There is no mandated commit message format, but the commit title should be concise and written in imperative mood. If the commit consists only of installing or updating an nf-core module or subworkflow, limit the commit title to `Install/update nf-core module/subworkflow {name}`. | ||
|
|
||
| Before each commit, run `prek` and stage all changes it generates. Resolve all errors and all possible warnings. Repeat until there are no solvable outstanding issues. After that, you are free to commit your changes. |
There was a problem hiding this comment.
| Before each commit, run `prek` and stage all changes it generates. Resolve all errors and all possible warnings. Repeat until there are no solvable outstanding issues. After that, you are free to commit your changes. | |
| Before each commit, stage changes and then run `prek`. Resolve all errors and all possible warnings. Repeat until there are no solvable outstanding issues. After that, you are free to commit your changes. |
| ## Push routine | ||
| You can push changes to GitHub as often as required, especially during PR review, but you should only push after implementing some meaningful changes. Only push if the code is working. | ||
|
|
||
| Before pushing, ensure nf-core linting is passing. Run `nf-core pipelines lint`, resolve all errors and all possible warnings. Repeat until there are no solvable outstanding issues. If you are preparing a release (PR to main), use `nf-core pipelines lint --release` instead. |
There was a problem hiding this comment.
dont forget the strict syntax linting nextflow lint *
There was a problem hiding this comment.
Mentioned in an earlier comment, prek runs this command automatically
| GitHub Actions will run CI for every push. If you know the code will cause issues or you intend to push more changes, add `[skip ci]` at the end of the commit title. Omit this tag if the changes are final, especially right before a PR or when you want the code to be reviewed. | ||
|
|
||
| ## PR procedure | ||
| Changes to nf-core `dev` and `main` branches must be made through GitHub pull request. A PR should generally contain a single feature. The PR must use and follow the nf-core PR template, including the checklist. The PR message should start with a brief explanation of the changes made and the motivation. |
| ## PR procedure | ||
| Changes to nf-core `dev` and `main` branches must be made through GitHub pull request. A PR should generally contain a single feature. The PR must use and follow the nf-core PR template, including the checklist. The PR message should start with a brief explanation of the changes made and the motivation. | ||
|
|
||
| Each PR requires reviews: 1 for dev, 2 for main. Advise the user to ask for reviews in the nf-core Slack, in `#request-review` (for dev PRs) or `#release-review-trading` (for main PRs). There is also a CI pipeline executed on each PR. All checks must pass before the PR can be merged. |
There was a problem hiding this comment.
I am not sure we want a guiderails markdown file, also making the bot provide advice to the user. To discuss more on this
|
Hi @vagkaratzas, thanks a lot for your feedback. I have applied the most straightforward suggestions. I will need to run through the file again tomorrow to decrease verbosity/redundancy and highlight key instructions. |
vagkaratzas
left a comment
There was a problem hiding this comment.
Left a couple of comments/suggestions but I think it's in the right direction now. The only way to evaluate this is by actually using it. And people reporting what works and whatnot so we can keep improving
ewels
left a comment
There was a problem hiding this comment.
Great work. Let's ship it and iterate 👍🏻
Co-authored-by: Evangelos Karatzas <32259775+vagkaratzas@users.noreply.github.com> Co-authored-by: Phil Ewels <phil.ewels@seqera.io>
| │ └── nf-core // nf-core modules (see section below) | ||
| │ ├── fastqc | ||
| │ | ├── main.nf // Nextflow script, may be edited if necessary | ||
| | | └── ... // do not edit other files in nf-core modules |
There was a problem hiding this comment.
why? how are they different from editing main.nf?
Co-authored-by: Phil Ewels <phil.ewels@seqera.io>
|
|
||
| Several files have been skipped from the treemap. If a file is not in the treemap, you **SHOULD NOT** edit it unless explicitly prompted. | ||
|
|
||
| ## Key nf-core terms |
There was a problem hiding this comment.
Same as for the module repo - as a human I'd prefer these definitions frontloaded, I don't know if that applies in this context though.
|
Changed file path to |
This is the initial draft of the central nf-core AGENTS.md. This file will be referenced by the template AGENTS.md. Feel free to leave any comments and suggestions, since this file should be based on community experience.