Releases · nextstrain/nextclade

24 Jan 13:07

nextstrain-bot

2.10.0

da1928e

2.10.0

Nextclade Web 2.10.0, Nextclade CLI 2.10.0 (2023-01-24)

Add motifs search

Nextclade datasets can now be configured to search for motifs in the translated sequences, given a regular expression.

At the same time, we released new versions of the following Influenza datasets, which use this feature to detect glycosylation motifs:

Influenza A H1N1pdm HA (flu_h1n1pdm_ha), with reference MW626062
Influenza A H3N2 HA (flu_h3n2_ha), with reference EPI1857216

If you run the analysis with the latest version of these datasets, you can find the results in the glycosylation column or field of the output files, or in the "Glyc." column in Nextclade Web.

If you want to configure your own datasets for motif search, see the example configuration in the aaMotifs property of virus_properties.json in these datasets.
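As a rough illustration of what a regular-expression motif search over a translated sequence involves, here is a minimal Python sketch. The N-X-S/T sequon pattern, the function name `find_motifs` and the toy peptide are assumptions made for illustration; they are not taken from the datasets' aaMotifs configuration, and Nextclade's own implementation may differ.

```python
import re

# Hypothetical pattern: the classic N-linked glycosylation sequon N-X-S/T,
# where X is any residue except proline. The real datasets may use another regex.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_motifs(peptide: str, pattern: re.Pattern = SEQUON) -> list[tuple[int, str]]:
    """Return (0-based position, matched residues) for every motif occurrence.

    The lookahead allows overlapping occurrences to be reported as well.
    """
    return [(m.start(), m.group(1)) for m in pattern.finditer(peptide)]


# Toy peptide, not a real HA sequence.
print(find_motifs("MKNGTANPSNNSTQ"))
# [(2, 'NGT'), (9, 'NNS'), (10, 'NST')]
```

Note that "NPS" at position 6 is skipped because proline is excluded in the middle position.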

Allow choosing columns written into CSV and TSV outputs

You can now select a subset of columns to be included in the CSV and TSV output files of Nextclade Web (available in the "Download" dialog) and Nextclade CLI (available with --output-csv and --output-tsv). You can choose either individual columns or categories of related columns.

In Nextclade Web, in the "Download" dialog, click "Configure columns", then check or uncheck columns or categories you want to keep. Note that this configuration persists across different Nextclade runs.

In Nextclade CLI, use the --output-columns-selection flag. This flag accepts a comma-separated list of column names and/or column category names; individual columns and categories can be mixed. You can find the list of column names in the full output file. The following categories are currently available: all, general, ref-muts, priv-muts, errs-warns, qc, primers, dynamic. Another way to obtain both lists is to add a non-existent or misspelled name to the list: the error message will then display all possible columns and categories.

Note that because of this feature the order of columns might be different compared to previous versions of Nextclade.
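To make the mixing of column names and category names more concrete, here is a minimal Python sketch of how such a selection could be expanded. The category membership below is hypothetical, as are the function name `resolve_selection`, the de-duplication and the ordering; Nextclade's actual behavior may differ.

```python
# Hypothetical category membership, for illustration only; the real column
# lists are defined by Nextclade and are printed in its error message.
CATEGORIES = {
    "general": ["index", "seqName", "clade"],
    "qc": ["qc.overallScore", "qc.overallStatus"],
}
ALL_COLUMNS = [column for columns in CATEGORIES.values() for column in columns]


def resolve_selection(selection: str) -> list[str]:
    """Expand a comma-separated mix of column and category names into columns."""
    resolved: list[str] = []
    for name in (part.strip() for part in selection.split(",")):
        if name == "all":
            expanded = ALL_COLUMNS
        elif name in CATEGORIES:
            expanded = CATEGORIES[name]
        elif name in ALL_COLUMNS:
            expanded = [name]
        else:
            # Mirror the idea of listing every valid name in the error message.
            raise ValueError(
                f"unknown column or category {name!r}; "
                f"columns: {ALL_COLUMNS}; categories: {['all', *CATEGORIES]}"
            )
        for column in expanded:
            if column not in resolved:  # keep the first occurrence only
                resolved.append(column)
    return resolved


print(resolve_selection("qc,seqName"))
# ['qc.overallScore', 'qc.overallStatus', 'seqName']
```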

Add URL parameter for running analysis of example sequences

You can now launch the analysis of example sequences (as provided by the dataset) in Nextclade Web, by using the special keyword example in the input-fasta URL parameter. For example, navigating to this URL will run the analysis of example SARS-CoV-2 sequences (same as choosing "SARS-CoV-2" and then clicking "Load example" in the UI):

https://clades.nextstrain.org/?dataset-name=sars-cov-2&input-fasta=example

This could be useful, for example, for testing new datasets:

https://clades.nextstrain.org/?dataset-url=http://example.com/my-dataset-dir&input-fasta=example
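If you generate such links programmatically, a minimal Python sketch could look like the following. The helper name `example_run_url` is hypothetical; only the dataset-name and input-fasta parameters and the example keyword come from the notes above.

```python
from urllib.parse import urlencode

BASE_URL = "https://clades.nextstrain.org/"


def example_run_url(dataset_name: str) -> str:
    """Build a Nextclade Web URL that runs the dataset's example sequences."""
    query = urlencode({"dataset-name": dataset_name, "input-fasta": "example"})
    return f"{BASE_URL}?{query}"


print(example_run_url("sars-cov-2"))
# https://clades.nextstrain.org/?dataset-name=sars-cov-2&input-fasta=example
```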

Add `index` column to CSV and TSV outputs

The index field is already present in other output formats. In this version, CSV and TSV output files gain an index column as well, which contains the index (an integer signifying position) of the corresponding record in the input FASTA file or files. Note that this is not the same as the row index, because CSV/TSV rows can be emitted in an unspecified order in Nextclade CLI (this can be changed with the --in-order flag, which is set by default in Nextclade Web).

Note that sequence names (seqName column) are not guaranteed to be unique (and in practice are very often not unique). So indices are the only way to reliably link inputs and outputs together.
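The following Python sketch illustrates linking output rows back to input records by index rather than by name. It assumes zero-based indices and uses toy FASTA and TSV text with a made-up clade column value; the helper names are hypothetical.

```python
import csv
import io


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs, in input order."""
    records: list[list[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif line and records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def link_by_index(fasta_text: str, tsv_text: str) -> dict[int, tuple[str, str]]:
    """Map each index to (input sequence, clade reported in the TSV)."""
    inputs = read_fasta(fasta_text)
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Assumption: indices are zero-based positions in the input file.
    return {int(row["index"]): (inputs[int(row["index"])][1], row["clade"]) for row in rows}


# Toy data: both records share a name, and the rows arrive out of order.
FASTA = ">sample\nACGT\n>sample\nACGA\n"
TSV = "index\tseqName\tclade\n1\tsample\tB\n0\tsample\tA\n"
print(link_by_index(FASTA, TSV))
```

Joining on seqName here would be ambiguous, since both records are called "sample"; the index keeps them apart.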

09 Dec 18:45

github-actions

2.9.1

3baa16a

2.9.1

Nextclade Web 2.9.1, Nextclade CLI 2.9.1 (2022-12-09)

Set default weights in "private mutations" QC check to 1

This fixes a bug where the QC score was always 0 (good) when the following QC fields were missing from qc.json:

.privateMutations.weightLabeledSubstitutions
.privateMutations.weightReversionSubstitutions
.privateMutations.weightUnlabeledSubstitutions

In this case Nextclade assumed a value of 0, which always led to a QC score of 0. Not all datasets were adjusted for the new qc.json format in time, and some had these fields missing, notably the flu datasets. So these datasets were erroneously showing a perfect QC score for the "private mutations" rule.

In this version we set these weights to 1.0 if they are missing, which fixes the incorrect QC scores. Some of the sequences will now correctly show worse QC scores.
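The idea of falling back to a weight of 1.0 can be sketched in Python as follows. This is only an illustration of the defaulting logic on a parsed qc.json dictionary; the function name is hypothetical and the real implementation lives in Nextclade's Rust code.

```python
DEFAULT_WEIGHT = 1.0  # value used when a weight is absent, as described above

WEIGHT_FIELDS = (
    "weightLabeledSubstitutions",
    "weightReversionSubstitutions",
    "weightUnlabeledSubstitutions",
)


def private_mutation_weights(qc_config: dict) -> dict[str, float]:
    """Read the three weights from a parsed qc.json, defaulting missing ones to 1.0."""
    section = qc_config.get("privateMutations", {})
    return {field: float(section.get(field, DEFAULT_WEIGHT)) for field in WEIGHT_FIELDS}


# A config with only one weight set: the other two fall back to 1.0, not 0.
print(private_mutation_weights({"privateMutations": {"weightLabeledSubstitutions": 4}}))
```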

Fix dataset selector in Nextclade Web when there are datasets with the same name, but different reference sequences

The dataset selector on the main page of Nextclade Web did not allow selecting datasets with the same name, but different reference sequences. This did not affect users so far, but we are about to release new Influenza datasets, which were affected. In this version we resolve the problem by keeping track of datasets not just by name, but by a combination of all attribute values (the .attributes[] entries in the datasets index JSON file).

Ensure non-default references in "dataset list" command of Nextclade CLI are shown

This introduces a special value all for the --reference argument of the nextclade dataset list command, and it is now the default. When it is in force, datasets with all reference sequences are included in the displayed list. This resolves the problem where non-default references were not shown in the list.

Internal changes

We are now submitting PRs to bioconda automatically, which should reduce the delay of updates there

---

Instructions

📥 Nextclade CLI & Nextalign CLI can be downloaded from the links in the "Assets" section just below. There click "Show all" to show more options. Note the difference between "nextalign" and "nextclade" files.

🌐 Nextclade Web is available at https://clades.nextstrain.org

🐋 Docker images are available at DockerHub

📚 To understand how it all works, make sure to read the Documentation

06 Dec 16:26

nextstrain-bot

2.9.0

2cd03c2

2.9.0

Nextclade Web 2.9.0, Nextclade CLI 2.9.0 (2022-12-06)

Increase requirements for supported Linux distributions for GNU flavor of Nextclade CLI

Due to a malfunction of the package repositories of Debian 7, we had to switch automated builds of the "gnu" flavor of Nextclade CLI from Debian 7 to CentOS 7. This increases the minimum required version of glibc to 2.17. The list of Linux distributions we tested the new version of Nextclade on is here. For users of older Linux distributions (with glibc < 2.17) we suggest using the "musl" flavor of Nextclade CLI, which does not depend on glibc, but might be substantially slower. Users of Nextclade CLI on macOS and Windows and users of Nextclade Web are not affected.

Add gene length validation in GFF3 parser

Nextclade will now check if genes have length divisible by 3 in gene maps and will fail with an error if it's not the case.
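A minimal Python sketch of such a check is shown below. It is not Nextclade's parser: the function name and the tuple layout are hypothetical, and the only fact relied upon is that GFF3 coordinates are 1-based and inclusive.

```python
def check_gene_lengths(genes: list[tuple[str, int, int]]) -> None:
    """Raise ValueError for the first gene whose length is not a multiple of 3.

    Each gene is (name, start, end) in GFF3 coordinates, which are 1-based
    and inclusive, so the length is end - start + 1.
    """
    for name, start, end in genes:
        length = end - start + 1
        if length % 3 != 0:
            raise ValueError(f"gene {name!r}: length {length} is not divisible by 3")


# Lengths 9 and 21 are both multiples of 3, so this passes silently.
check_gene_lengths([("geneA", 1, 9), ("geneB", 10, 30)])
```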

Fix translated (internationalized) strings in Nextclade Web

We fixed missing spaces between words in some of the languages and fixed some of the translations.

Internal changes

build Linux binaries on CentOS 7
migrate CI to GitHub Actions
upgrade Rust to 1.65.0

20 Oct 07:33

nextstrain-bot

2.8.0

ba80d9f

2.8.0

Nextclade Web 2.8.0, Nextclade CLI 2.8.0 (2022-10-20)

Community datasets in Nextclade Web

This release adds support for fetching custom datasets from a remote location. This can be used for testing datasets introducing support for new pathogens, as well as for sharing these datasets with the community.

For that, we added the dataset-url URL query parameter, where you can specify either a direct URL to the directory of your custom dataset:

https://clades.nextstrain.org?dataset-url=http://example.com/path/to/dataset-dir

or a URL to a GitHub repository:

https://clades.nextstrain.org?dataset-url=https://github.com/my-name/my-repo/tree/my-branch/path/to/dataset-dir

or a special shortcut to a GitHub repository:

https://clades.nextstrain.org?dataset-url=github:my-name/my-repo@my-branch@/path/to/dataset-dir

If a branch name is not specified, the default branch name is queried from GitHub. If a path is omitted, then the files are fetched from the root of the repository.

When the dataset-url parameter is specified, a single custom dataset is loaded from the provided address instead of the list of default datasets. Note that this address must be publicly accessible and have CORS enabled. GitHub public repositories already comply with these requirements, so if you are using a GitHub URL or a shortcut, then no additional action is needed.

Compression of all input and output files in Nextclade CLI and Nextalign CLI

Previously, only FASTA files could be compressed and decompressed on the fly. Now Nextclade CLI and Nextalign CLI can read all input and write all output files in compressed formats. Simply add one of the supported file extensions: "gz", "bz2", "xz", "zstd", and the files will be compressed or decompressed transparently.
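If you post-process such files in your own scripts, the same extension-based approach can be sketched in Python. This is an illustration using the standard library only, not Nextclade's implementation; the helper name is hypothetical, and "zstd" is left out because it needs either a third-party package or a very recent Python version.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Openers from the Python standard library, keyed by file extension.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_maybe_compressed(path, mode: str = "rt"):
    """Open a text file, picking a (de)compressor from the file extension.

    Files with any other extension are opened as plain text.
    """
    opener = OPENERS.get(Path(path).suffix, open)
    return opener(path, mode)
```

For example, `open_maybe_compressed("nextclade.tsv.gz")` would read a gzip-compressed TSV as text, while `open_maybe_compressed("nextclade.tsv")` reads the plain file.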

Decrease default number of threads in Nextclade Web

Some users have observed long startup times of the analysis in Nextclade Web. In this release we decreased the default number of processing threads from 8 to 3, so that startup is now a little faster.

If you want to speed up the analysis of large batches of sequences, at the expense of longer startup time, you can tune the number of threads in the "Settings" dialog.

Improve readability of text fragments in Nextclade Web

We've made text paragraphs on main page and some other places a little prettier and hopefully more readable.

Fix crash when reading large, highly-nested tree files

We improved handling of the Auspice JSON format, so that Nextclade no longer crashes when large trees or trees with a large number of deep branches are provided.

Commit history

[7438805] chore(deps): bump auspice in /packages_rs/nextclade-web

Bumps auspice from 2.38.0 to 2.39.0.

updated-dependencies:

dependency-name: auspice
dependency-type: direct:production
update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] [email protected]

[891af32] Update clades.svg
[0e41219] Merge pull request #1013 from nextstrain/update-clades-svg

Update clades.svg

[77bcc03] Merge pull request #1012 from nextstrain/dependabot/npm_and_yarn/packages_rs/nextclade-web/auspice-2.39.0
[e6aab13] chore: release web v2.7.1
[3de3205] chore: disable core dumps when running dev docker container [skip ci]
[6cb95a4] refactor: minify clades.svg
[02ea73a] feat(web): prettify markdown rendering, changelog window text

Applies small visual fixes and tweaks to make markdown text a little more readable.

Highlights headings in the changelog and fixes a few visual bugs in code snippets.

Removes unnecessary and unused styles.

[63f6d21] Merge pull request #1016 from nextstrain/refactor/web-minify-clades-svg
[82b93f2] Merge pull request #1017 from nextstrain/feat/web-prettier-markdown
[2539c25] fix: ensure parsing of large, highly-nested tree JSONs

Resolves #1018

When tree JSON is sufficiently nested, serde_json fails to parse it into the AuspiceTree struct with:

recursion limit exceeded at line <line> column <column>

This problem has been reported before in serde_json (serde-rs/json#334). The solution implemented there, Deserializer::disable_recursion_limit, removes the recursion limit at the cost of an increased risk of overflowing the call stack.

So here I implemented the suggested solution by calling disable_recursion_limit() on the deserializer. I am also mitigating the risk of stack overflow by using the growing stack adapter from serde_stacker.

This allows the large tree from #1018 to be parsed successfully, although it probably also makes tree parsing a bit slower. This is not critical for overall runtime, as parsing is a one-off operation.

[7c437b0] chore: remove unrelated changes [skip ci]
[6ed348b] refactor: deduplicate json parsing code
[33fd032] feat(web): dynamically set dataset server root using a URL param

Allows setting the dataset server root URL in Nextclade Web dynamically, at runtime.

It is sufficient to add URL parameter dataset-server with the value containing URL to the root of the dataset file server (containing index_v2.json), and Nextclade Web will fetch dataset index and all dataset files from this server instead of the default:

?dataset-server=http://example.com

The localhost URLs should also work:

?dataset-server=http://localhost:27722

This is equivalent to --server flag in Nextclade CLI.

This should facilitate testing of custom datasets in Nextclade Web.

[02b19a2] chore: enable CORS for local dataset server
[064393e] docs: document dataset-server URL param
[ed4b775] feat(web): reduce default number of threads to speedup initialization

Although "thread" (actually webworker) initialization is supposed to be concurrent, there seems to be a clear trend of increased initialization time as the number of threads increases. This is likely due to data transfer overhead between workers.

Here I reduce the number of threads that are initialized with default settings.

But this setting can always be altered in the "Settings" dialog.

[c4e7736] Merge pull request #1020 from nextstrain/feat/web-dataset-server-url-param
[7890180] Merge branch 'master' into fix/tree-json-parsing-recursion-limit
[ecdff00] Merge pull request #1021 from nextstrain/feat/web-decrease-num-threads
[d677b95] docs: add info about dataset-server URL param to dev guide [skip ci]
[e77141e] Merge remote-tracking branch 'origin/master' into fix/tree-json-parsing-recursion-limit
[[62008ff](https://github.com/nextstrain/nextclade/commit/62008ff877b6378e5e8a5ca06200d2044e9...

05 Oct 14:48

nextstrain-bot

2.7.0

66c6dd2

2.7.0

Nextclade CLI 2.7.0, Nextclade Web 2.7.0 (2022-10-05)

Hide custom clade columns

We added the ability to mark certain custom clade columns as hidden. Hidden columns are not shown in Nextclade Web. This prepares the web application for the upcoming reorganization of clade columns. It should not affect current users.

Remove unused fields from output files, add custom phenotype key list

We removed extra repetitive fields related to custom phenotype columns (e.g. "Immune escape" and "ACE-2 binding") from JSON and NDJSON output files. We also added keys for custom phenotype columns to the header section of output JSON, for symmetry with custom clade columns. These changes should not affect most users.

Commit history

[21c5e5e] chore: apply automatic clippy lint fixes
[d71864c] chore: lint
[de4079a] chore: format
[296e9f8] Merge pull request #1003 from nextstrain/chore/lint
[362f280] chore(ci): only release cli from release-cli branch
[8337dc0] fix: link to release dataset changlog

Currently release users see master dataset changelog. Here I change the link such that they see dataset changelog on release branch, i.e. only released dataset changes.

[95bda16] Merge pull request #1004 from nextstrain/fix/web-dataset-changelog-link
[ccb35e9] feat(web): format tooltip text

Format text of custom phenotype columns. This text comes from a JSON string, so formatting capabilities are very limited. Here I split the incoming string on \n and create paragraphs (<p>) from each fragment.

[6f48b17] Merge pull request #1005 from nextstrain/feat/web-format-tooltip-text
[f7d3ae8] chore: release web v2.6.1
[7f6cd6b] feat: allow hiding custom clade columns
[8e7eb64] fix: show custom clade columns by default

Let's flip the boolean, so that its natural default is false

[54117cf] fix: remove unused data from outputs

Removes results[].phenotypeValues.nameFriendly and results[].phenotypeValues.description from output JSON and NDJSON. These fields were not intended to be used, are repeated and increase file size needlessly.

[ee0c659] fix: add descriptions for phenotype attributes to outputs

Adds descriptions of phenotype attributes to .phenotypeAttrKeys of JSON output. This is for symmetry with the similarly implemented .cladeNodeAttrKeys attributes.

[f145b45] Merge pull request #1009 from nextstrain/fix/remove-unused-output-data
[c8fe93e] Merge pull request #1010 from nextstrain/fix/add-phenotype-descriptions-to-outputs
[d30d608] fix: reset node attr state correctly across runs
[ade06e2] Merge pull request #1006 from nextstrain/feat/allow-hiding-custom-clade-columns

feat: allow hiding custom clade columns

[1c3377b] chore: speedup wasm dev build

Here, for the wasm dev build, I replace wasm-pack, which is nice but hides lots of useful stuff behind its defaults, and call cargo build, wasm-bindgen and wasm-opt explicitly. Notably, wasm-pack does not allow custom cargo profiles (rustwasm/wasm-pack#1111).

I also add the new "opt-dev" cargo profile, which is something in the middle between release and dev - a balance between rebuild speed and runtime performance. This should save lots of time during day-to-day work.

The production version still uses wasm-pack and a more tried solution.

The dev version now requires cargo wasm toolchain, wasm-bindgen and wasm-opt (from binaryen project) installed manually. Previously wasm-pack would handle that. The dev docker image was updated accordingly.

[ae49768] fix: adjust import

For the changes in 1c3377b
(there no longer is package.json, so import of the directory does not work)

[4f5db09] fix(web): crash when exporting files in web

Followup of #1006

The PR #1006 introduced a bug: it filters away hidden columns from recoil state, thus making them unavailable for export functions. The mismatch in expected columns caused crash on Rust side.

Here I modify the filtering logic, moving it to the actual rendering site, instead of filtering globally. This way, hidden columns are not rendered, but are still available for export.

[f0975a2] chore: remove unused script [skip ci]
[19b1932] chore: document dev scripts [skip ci]
[70122ce] Merge pull request #1011 from nextstrain/fix/web-crash-on-export
[66c6dd2] chore: release web and cli 2.7.0

27 Sep 07:31

nextstrain-bot

2.6.0

7fc59a1

2.6.0

Nextclade Web 2.6.0, Nextclade CLI 2.6.0 (2022-09-27)

New metrics: Immune escape and ACE-2 binding

We added software support for the new custom metrics in Nextclade CLI and Nextclade Web. The dataset "sars-cov-2-21L" with the data required for these metrics to appear will be released in the coming days. Stay tuned.

The `dataset-name` URL parameter is now properly applied in Nextclade Web, even when `input-fasta` is not provided

Previously the dataset-name URL parameter was ignored unless input-fasta was also set. Now dataset-name makes Nextclade Web preselect the requested dataset, regardless of whether the input-fasta URL parameter is also provided. This makes it possible to create URLs which preconfigure Nextclade Web with a certain dataset and dataset customizations, with the intent of providing the FASTA input and starting the run manually.

Better error handling

We improved error handling, such that some of the errors in Nextclade Web now have better error messages. Some of the errors that previously caused a hard crash in Nextclade CLI are now handled more gracefully and with better error messages.

Internal changes

We upgraded Rust to 1.63.0 and Nextclade CLI and Nextalign CLI are now using std::thread::scope for better multithreading support.

Commit history

[b2ba122] chore: remove deprecation notices from v1
[3a374c2] refactor: move types.ts file

It is lonely in the old deprecated algorithms/ directory, let's put it at the root.
Remaining changes are just import path change (automated).

[0583e91] refactor: format
[4a17d36] Merge pull request #981 from nextstrain/refactor/cleanup
[ab12755] fix(web): correct text instructions for fasta text fields
[129f549] Merge pull request #983 from nextstrain/fix/text-field-text
[7ea8414] fix: consider different file encodings

This attempts to mitigate crash in web when the fasta file is not UTF-8 or ASCII encoded.

Here I am trying to deduce a list of encodings the file resembles and then attempting to decode the contents of the file from these encodings to a JS string, instead of simply using the FileReader's .toString() function.

[992fc57] refactor: lint
[6f4b5b4] feat: try a more accurate chardet library
[b0f7bf2] feat: rephrase error message
[18d8716] Merge pull request #987 from nextstrain/fix/web-file-encoding
[f6bdaae] refactor: lint, fix typings
[86e2ac1] Merge pull request #988 from nextstrain/refactor/lint
[e8cce45] feat: calculate escape
[b5c6b29] feat: add escape to csv and tsv outputs
[add2a85] feat(web): add escape column
[551e4e2] feat: hide 'Escape' column when there aren't any values
[e4c02b0] refactor: lint
[8694a52] feat: avoid sending filenames to plausible
[f5cbfa2] feat: improve error page and popup
[b2f3b41] refactor: lint
[9650e27] Merge pull request #992 from nextstrain/feat/web-improve-error-handling
[ccbcacf] feat: extract translation strings
[ff265c1] Merge branch 'feat/extract-translation-strings'
[628666a] feat: setup automatic translation with AWS Translate
[d4e9bde] Merge pull request #993 from nextstrain/feat/autotranslate
[a674e35] feat: apply automatic translations
[b7a3ab1] feat(web): add more languages
[14b98fc] feat(web): translate more strings
[31bc2ef] refactor: correct compiler error
[b495e7b] feat(web): persist selected locale
[e9f63e1] fix(web): sort locales
[f90856b] Merge pull request #994 from nextstrain/feat/translate
[09ec70c] chore(deps): bump auspice in /packages_rs/nextclade-web

Bumps auspice from 2.37.3 to 2.38.0.

updated-dependencies:

dependency-name: auspice
dependency-type: direct:production
update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] [email protected]

[d3b35c3] Merge pull request #996 from nextstrain/dependabot/npm_and_yarn/packages_rs/nextclade-web/auspice-2.38.0
[cc79e84] chore: release web v2.5.1
[bf21f97] Merge remote-tracking branch 'origin/master' into feat/escape
[44b1f9e] fix: correct escape algorithm
[125ac16] fix(web): ensure sorting results table by escape works
[ce1693a] feat: add text to escape help tooltip
[3a502a9] Add clarification on when escape scores are unreliable
[b00e6c3] feat(web): make escape column tooltip wider
[9450041] feat: allow optional name different from gene name for escape values
[ac1b14a] feat(web): add tooltips for escape column
[3cd1596] feat: skip escape calculation for outgroup samples
[e499dc7] fix(web): sorting by escape
[2320822] feat: list ignored clades in the virus json
[e06e142] feat: allow for escape coefs per posiition only and per posiition per aa
[dfbf64b] refactor: rename identifiers to clarify intent
[61e4fd7] refactor: lint
[6d28d49] feat: render phenotype values in separate columns

-...

31 Aug 05:48

nextstrain-bot

2.5.0

6a1b391

2.5.0

Nextclade CLI 2.5.0, Nextclade Web 2.5.0 (2022-08-31)

Feature (CLI, Web): Coverage analysis

Nextclade now emits a "coverage" metric, which shows the proportion of nucleotides in the alignment range that are non-N and non-ambiguous, relative to the length of the reference sequence:

coverage = ((alignment_end - alignment_start) - total_missing - total_non_acgtns) / ref_len;

The metric is displayed as a percentage in the "Cov." column of Nextclade Web, and emitted into JSON, NDJSON, CSV and TSV outputs of Nextclade CLI and Web in the "coverage" field or column.
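The formula above translates directly into Python. The function below is a sketch that reuses the variable names from the formula; the input numbers in the example are made up.

```python
def coverage(
    alignment_start: int,
    alignment_end: int,
    total_missing: int,
    total_non_acgtns: int,
    ref_len: int,
) -> float:
    """Fraction of the reference covered by non-N, non-ambiguous nucleotides."""
    covered = (alignment_end - alignment_start) - total_missing - total_non_acgtns
    return covered / ref_len


# Made-up example: full-length alignment of a 1000 nt reference with 100 Ns.
print(f"{coverage(0, 1000, 100, 0, 1000):.1%}")
# 90.0%
```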

Feature (Web): Display machine-readable dataset names

The dataset selector on the main page of Nextclade Web now additionally shows the machine-readable dataset name. This can help advanced Web users to put the correct dataset name into the URL parameters, and CLI users to find the correct dataset name for downloads.

Feature (Web): Compact results table

We made some of the columns in the results table of Nextclade Web narrower to make the user experience a little better on laptops. When possible, for an optimal experience, we still recommend using 1080p displays or larger.

Fix (Web): Crashes when using filtering panel

Users reported intermittent crashes of Nextclade Web when entering values in the filtering panel on the results page. This has been fixed now. If you still have problems, please submit an issue in our GitHub repository.

Commit history

[b9a9a4b] feat(web): show dataset machine-readable name in dataset selector
[2de9310] Merge remote-tracking branch 'origin/master' into feat/web-dataset-machine-readable-name
[4bad667] fix: correctly reset duplicate sequence name map on new runs

Currently duplicate sequence name data persists across unrelated runs of Nextclade Web, this causes incorrect reporting of duplicates.

This PR ensures that duplicate sequence name data does not persist across runs.

[4f23954] Merge pull request #950 from nextstrain/fix/web-dup-names-reset
[d4430e9] docs: add changelog for web 2.4.0
[4435faf] chore: release web v2.4.1
[1cd8df0] feat: calculate coverage qc metric
[4c2e5ed] feat: add coverage qc metric to csv and tsv outputs
[03cd444] feat(web): add coverage metric to web UI
[e8b59f1] feat(web): correct and restyle download links on main page

The currently deployed version links to non-existent v1 files.

I thought that removing extra explanations (found in the docs) and making buttons more prominent will make it easier for users to discover these links.

[fab66aa] feat: add text suggested by @emmahodcroft
[c16836e] doc: update about text in web app
[6aef071] feat: add text suggested by @rneher
[418de94] Merge pull request #960 from nextstrain/docs/update-about
[81b06c6] Merge pull request #958 from nextstrain/feat/download-links
[7309d2c] Make coverage qc more sensitive
[a11499c] fix: wrong comment character
[caf3c9a] feat: remove indels from coverage calculation

I'm not sure how to handle Ns and ambiguous characters that appear in indels. The formula as is assumes missing and non_acgtns don't count bases in indels.

[92b0c26] feat: remove coverage qc metric, move it into analysis results
[2412f63] feat: remove qc metric from web app, add dedicated column
[ff9767b] feat: add coverage column
[44e90db] feat: add ref length and total covered nucs into file outputs
[b503c1b] docs: fix links to nextalign downloads
[6b71d9e] docs: fix md syntax of a header
[18aa4e3] Merge pull request #964 from nextstrain/docs/fixes
[a834ce1] chore: add 'sphinx linkcheck' makefile target
[139c4a6] fix: doc links
[fd63ac8] Merge pull request #965 from nextstrain/fix_doc_links
[984efc8] feat: add event analytics
[98c9076] Merge pull request #967 from nextstrain/feat/events
[c51184e] chore: release web v2.4.2
[739e5ed] Move coverage column after Ns
[e11a073] Squeezing columns a bit
[1735cfd] chore: precompress web root

AWS Cloudfront does not compress all files and we'd like to compress everything, including files larger than 10M and wasm for example. Here I compress files in advance with gzip and brotli and add a Lambda@Edge function which rewrites origin paths, depending on accept-encoding header.

[f6d167f] chore: fix path
[a61c72f] chore(ci): add missing deps
[5992d33] chore(ci): ensure previously uploaded files are not erased
[c8356ab] chore: update security headers
[5c8c8cf] chore: fix typescript error
[16c741f] chore: install missing dependency
[5530c69] chore: update awscli
[ae0e6e5] chore; set correct content type and encoding when deploying
[c3f5591] fix: make filtering code safer

Related to: #961

This removes casts in amino acid and nucleotide filtering code which have a potential to cause crashes in certain cases. I made undefined-comparison more lax, such that now it also checks for null and added default values to anything that can be nil.

Even though I cannot reproduce the issue, this hopefully should fix it.

[d173af2] fix: swap array and filters

Lodash intersectionWith() expects filtered array to be the first argument

[349a9e7] chore(infra): fix .gz and .br rewrites
[b7e1e28] chore(infra): redirect from results and tree pages to main page [skip ci]

They are not real pages, ...

Contributors

rneher and emmahodcroft

02 Aug 03:27

nextstrain-bot

2.4.0

1f9b80d

2.4.0

Nextclade CLI 2.4.0, Nextclade Web 2.4.0 (2022-08-02)

Fix (Web): use indices to identify sequences uniquely in Nextclade Web

Previously, Nextclade used sequence names to identify sequences. However, sequence names have proven to be unreliable: they are often duplicated. This caused various problems where results with the same names could have been overwritten.

Since this version, Nextclade Web is using sequence indices (order of sequences in the input file or files), to tell the sequences apart, uniquely. This should ensure correct handling of duplicate names. This change only affects results table in the Web application. CLI is not affected.

Feature (Web): warn about duplicate sequence names

Nextclade Web now reports duplicate sequence names. Duplicate sequence names often confuse bioinformatics tools, databases and bioinformaticians themselves, so we are trying to encourage the community to be more thoughtful about naming of their samples.

When duplicate names are detected during analysis in Nextclade Web, the "Sequence name" column of the results table now displays a yellow "duplicates" warning icon, and its tooltip contains a list of indices of sequences (serial numbers of the sequences in the input fasta file or files) having the same name.

Note that Nextclade compares only names, not sequence data themselves.
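The same kind of check is easy to reproduce on your own data before submitting it. The Python sketch below groups sequence indices by name and reports names that occur more than once; it assumes zero-based indices, and the function name is hypothetical.

```python
from collections import defaultdict


def find_duplicate_names(names: list[str]) -> dict[str, list[int]]:
    """Map each duplicated name to the indices of all sequences carrying it."""
    indices_by_name: dict[str, list[int]] = defaultdict(list)
    for index, name in enumerate(names):
        indices_by_name[name].append(index)
    return {name: idx for name, idx in indices_by_name.items() if len(idx) > 1}


print(find_duplicate_names(["a", "b", "a", "c", "b", "a"]))
# {'a': [0, 2, 5], 'b': [1, 4]}
```

Like the warning described above, this compares names only, not the sequence data.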

Feature (CLI): add "download dataset and run" shortcut

In this version we added the --dataset-name (-d) argument to the run command, which allows downloading a dataset with default parameters and running with it immediately, all in one command.

For example, this command:

nextclade run --output-all=out --dataset-name=sars-cov-2 sequences.fasta

or the same, but shorter:

nextclade run -O out -d sars-cov-2 sequences.fasta

will download the latest default SARS-CoV-2 dataset into memory and run the analysis with these dataset files. This is a convenience shortcut for the usual combination of nextclade dataset get + nextclade run. The dataset is not persisted on disk and is downloaded again on every run.

Feature (Web): Upgrade Auspice from version 2.37.2 to 2.37.3

This release includes a routine upgrade of the Auspice tree view. You can read the changelog in the Auspice GitHub repository.

Commit history

[5da4fe5] chore(deps): bump auspice in /packages_rs/nextclade-web

Bumps auspice from 2.37.2 to 2.37.3.

updated-dependencies:

dependency-name: auspice
dependency-type: direct:production
update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] [email protected]

[5a9c699] feat: assert mutation is inside the gene
[22e4e2d] feat: add "download dataset and run" shortcut"

This adds --dataset-name (-d) to run command, which allows to download a dataset with default parameters and run with it immediately, in one command.

For example this command.

nextclade run --output-all=out --dataset-name=sars-cov-2 sequences.fasta

will download the latest default sars-cov-2 into memory and will run analysis with these dataset files.

This is a convenience shortcut for the usual combination of dataset get + run. The dataset is not persisted on disk and downloaded on every run.

[e155b94] fix: use indices to identify sequences uniquely

Nextclade is using sequence names to uniquely identify sequences. However, sequence names come from user inputs and cannot be trusted to be unique. Nor does there seem to be a consensus on uniqueness of sequence names in the bioinformatics community as a whole.

This causes various problems where sequence names are used as identifiers, for example when there are multiple sequences with the same name. In particular, analysis results are effectively stored in an associative container, where the sequence name acts as a key. This leads to newer results overwriting older results as they arrive during analysis. Additionally, some of the HTML id properties used sequence names to add uniqueness. This was leading to incorrect HTML being produced, with multiple elements having the same id property.

In this PR I:

change internal storage to use sequence indices in the input file(s) as keys
add sequence index into HTML ids
display "Sequence index" in places where only sequence name was displayed previously

This should ensure correct handling of duplicated names.

This affects only the web application. In the algorithmic and CLI parts, sequence names are not used: results are stored in the form of an array, and no HTML is involved.
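
The overwriting problem described in this commit message can be reproduced in a few lines. This is a minimal sketch with made-up records and hypothetical function names, not the web application's actual storage code: keying results by name loses an entry, while keying by position in the input keeps all of them.

```python
# Three input records; the first and third share a name.
sequences = [
    ("sample-A", "ACGT"),
    ("sample-B", "ACGA"),
    ("sample-A", "TTGT"),
]


def store_by_name(records):
    """Name as key: a later record silently replaces an earlier one."""
    results = {}
    for name, seq in records:
        results[name] = seq
    return results


def store_by_index(records):
    """Position in the input as key: every record keeps its own slot."""
    return {index: (name, seq) for index, (name, seq) in enumerate(records)}


if __name__ == "__main__":
    print(len(store_by_name(sequences)))   # one result is lost
    print(len(store_by_index(sequences)))  # all results are kept
```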

[c43367e] Merge remote-tracking branch 'origin/master' into fix/web-unique-seq-ids
[e744ca6] chore: release web v2.3.1
[e2f37d7] chore: fix CHANGELOG link to PR instead of issue
[2a88f30] Merge pull request #938 from nextstrain/fix/crash-gene-overflow
[c92683a] Merge pull request #939 from nextstrain/feat/dataset-get-and-run
[8a69ac7] feat: remove sequence index from tooltips
[ec2104a] Merge pull request #946 from nextstrain/fix/web-unique-seq-ids
[92a7234] feat(web): report duplicate sequence names

This adds little yellow icons in the "Sequence name" column when a sequence has the same name as other sequences in the same run. Indices of these sequences are additionally listed in the tooltip.
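
The information shown in that tooltip amounts to grouping sequence indices by name. A rough sketch of that grouping follows, assuming only a list of names in input order; the function name is hypothetical and the web application's real implementation may differ.

```python
from collections import defaultdict


def find_duplicate_names(names):
    """Map each repeated name to the indices of all sequences carrying it."""
    indices_by_name = defaultdict(list)
    for index, name in enumerate(names):
        indices_by_name[name].append(index)
    # Keep only names that occur more than once.
    return {
        name: indices
        for name, indices in indices_by_name.items()
        if len(indices) > 1
    }


if __name__ == "__main__":
    print(find_duplicate_names(["a", "b", "a", "c", "b", "a"]))
```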

[a02b60c] docs: add changelog for 2.4.0
[a2c37ad] Merge pull request #937 from nextstrain/dependabot/npm_and_yarn/packages_rs/nextclade-web/auspice-2.37.3

chore(deps): bump auspice from 2.37.2 to 2.37.3 in /packages_rs/nextclade-web

[5cb0f15] Merge pull request #948 from nextstrain/feat/web-report-dup-seq-names

feat(web): report duplicate sequence names

[d11f14f] docs: extend changelog for 2.4.0
[1f9b80d] chore: release cli 2.4.0 and web v2.4.0

26 Jul 23:23

nextstrain-bot

2.3.1

e737f19

2.3.1

Nextclade CLI 2.3.1, Nextclade Web 2.3.1

Fix #947: In datasets where a gene starts right at the beginning of the reference sequence, Nextclade versions 2.0.0 through 2.3.0 crashed due to an underflow. This is now fixed. The only Nextclade-provided dataset affected by this bug is Influenza Yamagata HA. That dataset had a further bug in the tree, so a corresponding dataset bug fix release is now available. (report: @mcroxen)
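
The note does not spell out the arithmetic involved, so the following is only an illustration of how this class of bug can arise, not Nextclade's actual code. It assumes a hypothetical "context window" that starts a few positions before a mutation, and it simulates unsigned 64-bit subtraction in Python, where integers do not underflow on their own. In a language with unsigned integers, such a subtraction either aborts or wraps around to a huge value, depending on the language and build settings.

```python
U64_MODULUS = 2**64  # simulate an unsigned 64-bit integer


def context_start_wrapping(position, context):
    """Unsigned subtraction that wraps around instead of going negative."""
    return (position - context) % U64_MODULUS


def context_start_saturating(position, context):
    """Clamp at zero so positions near the start stay in range."""
    return max(position - context, 0)


if __name__ == "__main__":
    # A position 1 base from the start, with a context of 3 bases.
    print(context_start_wrapping(1, 3))    # far beyond any real sequence length
    print(context_start_saturating(1, 3))  # stays at the first position
```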

Commit history

[36e2d8f] chore: release web v2.3.0
[329b5e3] Check for auspice updates daily

Uses dependabot to check for auspice updates.

[f07a807] Delete Update-Auspice.yml

This is not needed with the new .github/dependabot.yml.

[1395627] Merge pull request #936 from victorlin/master
[f4c1aa3] docs: adjust cli docs to nextclade v2
[781e998] Merge pull request #941 from nextstrain/docs/v2
[572915b] docs: Install nextstrain.sphinx.theme extension

The new extension has sphinx_copybutton pre-installed and configured. This change enables it here.
This also makes it easier to configure other extensions across multiple docs projects.

[23010b7] Bump nextstrain-sphinx-theme to >=2022.5

This is when the extension started being actively configured.

[4fd46af] Merge pull request #943 from nextstrain/victorlin/docs-install-nextstrain-extension

docs: Install nextstrain.sphinx.theme extension

[11f6fcb] chore: add tmp data to gitignore
[2d8ab9d] refactor: lint
[2f9d677] Merge pull request #945 from nextstrain/lint
[316862e] fix: underflow when mutation in gene close to start

When the first gene is very close to the start of the sequence

[921a11b] Merge pull request #947 from nextstrain/fix/underflow-gene-context

fix: underflow when mutation in gene close to start

[5cc22de] chore: CHANGELOG for 2.3.1
[e737f19] chore: release cli 2.3.1

Contributors

mcroxen

12 Jul 12:36

nextstrain-bot

2.3.0

4b70821

2.3.0

Nextclade CLI 2.3.0, Nextclade Web 2.3.0

This release brings back entries for failed sequences into output files.

It was reported by @tseemann (#921) that in Nextclade v2, CSV and TSV rows were not written for failed sequences, while in v1 they were. This was unintended.

In this release:

CSV, TSV and NDJSON rows for failed entries are now also written (only the seqName and errors columns are populated). Note that it is important to check the errors column and disregard the other columns if there are errors. For example, in case of an error, the substitutions column will be empty, but this does not mean that the failed sequence has no substitutions.
JSON output now has a separate errors field at the root of the object, with all failed entries.
NDJSON rows are also written for failed entries. They only contain the index, seqName and errors fields.
New columns are written into CSV and TSV outputs: warnings and failedGenes, which include any warnings emitted for a sequence as well as a list of genes that failed translation. Now all columns of the "errors.csv" file are also in the CSV and TSV results files.
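
The advice to check the errors column first can be followed with a short script. This is a minimal sketch using an in-memory TSV: the column names seqName, substitutions and errors are taken from the notes above, while the sample rows, the error text and the function name are made up for illustration.

```python
import csv
import io

# Illustrative TSV content; real output files contain many more columns.
EXAMPLE_TSV = (
    "seqName\tsubstitutions\terrors\n"
    "seq1\tC241T,A23403G\t\n"
    "seq2\t\thypothetical error message\n"
)


def split_results(tsv_text):
    """Separate usable rows from failed ones based on the errors column."""
    ok, failed = [], []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["errors"]:
            # Other columns are not meaningful for failed sequences.
            failed.append((row["seqName"], row["errors"]))
        else:
            ok.append(row)
    return ok, failed


if __name__ == "__main__":
    ok, failed = split_results(EXAMPLE_TSV)
    print([row["seqName"] for row in ok])
    print(failed)
```

Filtering before any downstream statistics avoids counting the empty substitutions column of a failed sequence as "no substitutions".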

We improved the warning that users of unsupported browsers (mostly Safari) receive when they browse to Nextclade Web.

Further changes only relevant to those building Nextclade themselves:

@xzhub contributed a PR (#930) to improve customization of the dataset server URL: when DATA_FULL_DOMAIN starts with /, the HTTP origin is automatically prepended to it to make an absolute URL.
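
The rule described here is small enough to sketch. The function below is a hypothetical Python rendering of the behaviour, not the code from the PR (which lives in the web application); the function name and the example.org URLs are placeholders.

```python
def resolve_data_domain(data_full_domain, origin):
    """Turn a value starting with "/" into an absolute URL on the given origin."""
    if data_full_domain.startswith("/"):
        # Relative to the site root: put the page's origin in front.
        return origin.rstrip("/") + data_full_domain
    # Anything else is assumed to be an absolute URL already.
    return data_full_domain


if __name__ == "__main__":
    print(resolve_data_domain("/datasets", "https://example.org"))
    print(resolve_data_domain("https://data.example.org", "https://example.org"))
```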

Commit history

[0e7971e] test(cli): scaffold functional cli tests with cram
[606fad8] Merge pull request #920 from nextstrain/test/cli-func
[72695f6] Update CHANGELOG.md
[bdfd9cf] chore(ci): fix gnu linux build

Apparently Debian 7 has no pip3 package. So let's install it using get-pip script

[6bf4e03] docs(dev): add a note about CORS in dev guide
[52d0f50] fix(cli): write failed sequence results (errors) to csv and tsv outputs

It was reported by @tseemann that in Nextclade v2, CSV and TSV rows are not written for failed sequences, while in v1 they were. This was unintended.

Here I bring back writing CSV and TSV rows for failed sequences (only the "errors" column is populated).

[f9280c4] feat(cli): add 'warnings' and 'failedGenes' columns to results csv/tsv

This adds new columns to main nextclade results CSV and TSV outputs: warnings and failedGenes, which include any warnings emitted for a sequence as well as a list of genes that failed translation.

Now all columns of the "errors.csv" file are also in the CSV and TSV results files.

[2fdf81a] docs: fix typos

Noted in an email to our support address (ticket 510).

[ca1c74d] Merge pull request #924 from nextstrain/docs/fix-typos

docs: fix typos

[773b1e6] chore: remove unnecessary variable
[7951530] Merge remote-tracking branch 'origin/master' into fix/cli-csv-output-errors
[b0d41e6] fix(cli): write failed sequences to json and ndjson outputs
[c1f9a72] Revert "chore: upgrade Cargo.toml's"

This reverts commit 2290adc.

[55f3342] Revert "chore: update dependencies, rust to 1.61"

This reverts commit 46a8a53.

[381c3ab] Merge pull request #925 from nextstrain/chore/revert-dep-upgrade
[5a5a060] Merge remote-tracking branch 'origin/master' into fix/cli-csv-output-errors
[b33d343] fix(web): add failed sequences to downloaded outputs in the web app
[01f5076] fix(web): typo in download dialog
[7062ced] fix(web): enable download button even if all sequences failed

Currently the button stays disabled when a run ends with no successful results. However, this prevents downloading errors.csv and other files containing errors, which might still be useful.

[e55f31b] Merge pull request #926 from nextstrain/fix/web-typo
[da679ac] feat(web): improve unsupported browser warning

This makes the warning appear in a modal, so that it is harder to miss. It also adds links to the official websites of Chrome and Firefox.

[ab00f35] fix: add failed sequences to insertions.csv of nextalign and nextclade
[f4d4480] docs: document outputs on failures
[fe9bb44] fix: make error entries in JSON and NDJSON outputs similar to normal
[cd9ed49] Merge pull request #922 from nextstrain/fix/cli-csv-output-errors
[88238a8] docs: document limitations of JSON outputs

Adds a few warnings explaining how JSON outputs can cause increased memory consumption.

⚠️ For CLI users: Note that due to technical limitations of the JSON format, it cannot be streamed entry by entry, i.e. all entries need to be accumulated in memory before the output is written to the file. If the JSON results output or tree output is requested (through the --output-json, --output-tree or --output-all arguments) for large input data, it can cause very high memory consumption, disk swapping, decreased performance and crashes. Consider removing these outputs for large input data, running on a machine with more RAM, or processing data in smaller chunks.
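
By contrast, a line-oriented format such as the NDJSON output mentioned in these notes can be consumed one entry at a time. The sketch below reads an in-memory example; the field names index, seqName and errors come from the release notes, while the sample values, the list type of errors and the function names are assumptions.

```python
import io
import json

# Illustrative NDJSON content: one JSON object per line.
EXAMPLE_NDJSON = (
    '{"index": 0, "seqName": "seq1", "errors": []}\n'
    '{"index": 1, "seqName": "seq2", "errors": ["hypothetical error message"]}\n'
)


def iter_entries(stream):
    """Yield one parsed entry per non-empty line, without loading the whole file."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def failed_indices(stream):
    """Collect indices of entries that report at least one error."""
    return [entry["index"] for entry in iter_entries(stream) if entry.get("errors")]


if __name__ == "__main__":
    print(failed_indices(io.StringIO(EXAMPLE_NDJSON)))
```

With a real file, passing the open file object instead of io.StringIO keeps memory use roughly proportional to one entry rather than to the whole result set.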

[5c73286] docs: document output compression
[cb569e5] docs: document how failed sequences are reflected in output files
[8986a95] Merge pull request #929 from nextstrain/docs/outputs
[4f56bd9] Let user use relative url for nextclade datasets server

When DATA_FULL_DOMAIN starts with '/', the HTTP origin is automatically added in front of it to make an absolute URL.

[4ab84c8] Replace var with let
[ee12aab] Merge pull request #930 from xzhub/master
[0200b6e] Merge branch 'fix/web-enable-dl-btn'
[34fc40a] Merge branch 'feat/web-improved-unsupported-browser-warning'
[bbaff7b] feat(cli): add a note about CSV and TSV delimiters
[356c100] Merge pull request #933 from nextstrain/feat/cli-help-csv-delimiters

feat(cli): add a note about CSV and TSV delimiters

[7c38af8] docs: update changelog [skip ci]
[4b70821] chore: release cli 2.3.0

Contributors

tseemann and xzhub
