diff --git a/CHANGELOG.md b/CHANGELOG.md index 10505a3a4..e6167ed75 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -31,6 +31,15 @@ Also, it is now much simpler to contribute to Nextclade. If you wanted to contri - **Fix**: 3' terminal insertions are now properly detected +- **Feature**: "Retry reverse complement" alignment parameter is added. When enabled, an additional attempt of seed matching is made after initial attempt fails. The second attempt is performed on reverse-complemented sequence. + + As a consequence: + - the output alignment, peptides and analysis results correspond to this modified sequence and not to the original + - sequence name gets a suffix appended to it for all output files (fasta, seqName column, node name on the tree etc.) + - in output files, there is a new field/column: `isReverseComplement`, which contains `true` if the corresponding sequence underwent reverse-complement transformation + + This functionality is opt-in and the default behavior is unchanged: skip sequence and emit a warning. + ### Genes on reverse (negative) strand Nextclade now correctly handles genes on reverse (negative) strand, which is particularly important for Monkeypox virus. @@ -90,16 +99,16 @@ Nextclade now correctly handles genes on reverse (negative) strand, which is par - The new flag `--output-selection` allows to restrict what's being output by the `--output-all` flag. - - The new flag `--output-translations` is a dedicated flag to provide a file path template which will be used to output translated gene fasta files. This flag accepts a template string with a template variable `{{gene}}`, which will be substituted with a gene name. Each gene therefore receives it's own path. Additionally, the translations are now independent from output directory and can be omitted if they are not necessary. 
+ - If the `--output-basename` flag is not provided, the base name of output files will default to "nextclade" or "nextalign" respectively for Nextclade CLI and Nextalign CLI. They will no longer attempt to guess base file name from the input fasta. - - **Feature**: New `--excess-bandwidth`, `--terminal-bandwidth`, `--min-match-rate` arguments are added (see "Alignment algorithm rewritten with adaptive bands" section for details) + - The new flag `--output-translations` is a dedicated flag to provide a file path template which will be used to output translated gene fasta files. This flag accepts a template string with a template variable `{gene}`, which will be substituted with a gene name. Each gene therefore receives it's own path. Additionally, the translations are now independent from output directory and can be omitted if they are not necessary. - Example: + Example: If the following is provided: ```bash - --output-translations='output_dir/gene_{{gene}}.translation.fasta' + --output-translations='output_dir/gene_{gene}.translation.fasta' ``` then for SARS-CoV-2 Nextclade will write the following files: @@ -111,7 +120,13 @@ Nextclade now correctly handles genes on reverse (negative) strand, which is par output_dir/gene_S.translation.fasta ``` - Make sure you properly quote and/or escape the curly braces in the variable `{{gene}}`, so that your shell, programming language or pipeline manager does not attempt to substitute the variable. + Make sure you properly quote and/or escape the curly braces in the variable `{gene}`, so that your shell, programming language or pipeline manager does not attempt to substitute the variable. + + + + - **Feature**: New `--excess-bandwidth`, `--terminal-bandwidth`, `--min-match-rate`, `--retry-reverse-complement` arguments are added (see "Alignment algorithm rewritten with adaptive bands" section for details) + + - **Feature**: Nextclade CLI and Nextalign CLI now accept compressed input files. 
If a compressed fasta file is provided, it will be transparently decompressed. Supported compression formats: `gz`, `bz2`, `xz`, `zstd`. Decompressor is chosen based on file extension.
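
For readers who want to see what extension-based decompressor selection can look like, here is a minimal sketch in Python. It is not Nextclade's implementation: the names `OPENERS`, `open_maybe_compressed` and `read_fasta_names` are hypothetical, and `zstd` is left out because it needs a third-party package on most Python versions.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Hypothetical mapping from file extension to a standard-library opener.
# zstd is omitted here: it is assumed to require a third-party package.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}


def open_maybe_compressed(path):
    """Open a file as text, decompressing it if the extension is recognized."""
    # Only the last extension is inspected, so "sequences.fasta.gz" maps to ".gz".
    opener = OPENERS.get(Path(path).suffix.lower(), open)
    return opener(path, "rt", encoding="utf-8")


def read_fasta_names(path):
    """Return the sequence names (header lines without '>') from a fasta file."""
    with open_maybe_compressed(path) as handle:
        return [line[1:].strip() for line in handle if line.startswith(">")]
```

Files with an unrecognized extension fall back to the plain `open`, so an uncompressed fasta file is read as is.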