Releases: nextstrain/nextclade
2.10.0
Nextclade Web 2.10.0, Nextclade CLI 2.10.0 (2023-01-24)
Add motifs search
Nextclade datasets can now be configured to search for motifs in the translated sequences, given a regular expression.
At the same time, we released new versions of the following Influenza datasets, which use this feature to detect glycosylation motifs:
- Influenza A H1N1pdm HA (flu_h1n1pdm_ha), with reference MW626062
- Influenza A H3N2 HA (flu_h3n2_ha), with reference EPI1857216
If you run the analysis with the latest version of these datasets, you can find the results in the `glycosylation` column or field of the output files, or in the "Glyc." column in Nextclade Web.
If you want to configure your own datasets for motifs search, see the example configuration in the `aaMotifs` property of `virus_properties.json` in these datasets.
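To illustrate the idea, here is a minimal Python sketch of a regex-based motif search over a translated sequence. The pattern `N[^P][ST]` is the commonly cited N-linked glycosylation sequon and is an assumption here: the actual patterns and output format are defined by each dataset's `aaMotifs` configuration and may differ. The function name `find_motifs` and the example peptide are hypothetical.

```python
import re

# Assumed pattern: the N-X-S/T sequon, where X is any residue except proline.
# Wrapping it in a lookahead lets the search report overlapping matches too.
GLYCOSYLATION_MOTIF = re.compile(r"(?=(N[^P][ST]))")


def find_motifs(peptide: str, pattern: re.Pattern = GLYCOSYLATION_MOTIF) -> list[tuple[int, str]]:
    """Return (0-based position, matched residues) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(peptide)]


if __name__ == "__main__":
    # Hypothetical peptide fragment, not taken from any dataset.
    for position, residues in find_motifs("MKNGSANPTNNTS"):
        print(position, residues)
```

Note that `NPT` in the example is skipped because proline is excluded from the middle position.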
Allow choosing columns written into CSV and TSV outputs
You can now select a subset of columns to be included in the CSV and TSV output files of Nextclade Web (available in the "Download" dialog) and Nextclade CLI (available with `--output-csv` and `--output-tsv`). You can choose either individual columns or categories of related columns.
In Nextclade Web, in the "Download" dialog, click "Configure columns", then check or uncheck columns or categories you want to keep. Note that this configuration persists across different Nextclade runs.
In Nextclade CLI, use the `--output-columns-selection` flag. This flag accepts a comma-separated list of column names and/or column category names. Individual columns and categories can be mixed together. You can find a list of column names in the full output file. The following categories are currently available: `all`, `general`, `ref-muts`, `priv-muts`, `errs-warns`, `qc`, `primers`, `dynamic`. Another way to obtain both lists is to add a non-existent or misspelled name to the list: the error message will then display all possible columns and categories.
Note that because of this feature the order of columns might be different compared to previous versions of Nextclade.
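Because the column order may change, downstream scripts are safer when they look up columns by header name rather than by position. Here is a minimal Python sketch of that approach. The helper `read_columns` and the example rows are hypothetical, and the `clade` column name is assumed for illustration.

```python
import csv
import io


def read_columns(tsv_text: str, wanted: list[str]) -> list[dict[str, str]]:
    """Read a TSV and keep only the wanted columns, looked up by header name."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [c for c in wanted if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in the file: {missing}")
    return [{c: row[c] for c in wanted} for row in reader]


# Made-up two-row output; the same code works whatever the column order is.
EXAMPLE_TSV = "index\tseqName\tclade\n0\tsample-A\t21L\n1\tsample-B\t22B\n"

if __name__ == "__main__":
    print(read_columns(EXAMPLE_TSV, ["clade", "seqName"]))
```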
Add URL parameter for running analysis of example sequences
You can now launch the analysis of example sequences (as provided by the dataset) in Nextclade Web by using the special keyword `example` in the `input-fasta` URL parameter. For example, navigating to this URL will run the analysis of example SARS-CoV-2 sequences (same as choosing "SARS-CoV-2" and then clicking "Load example" in the UI):
https://clades.nextstrain.org/?dataset-name=sars-cov-2&input-fasta=example
This could be useful, for example, for testing new datasets:
https://clades.nextstrain.org/?dataset-url=http://example.com/my-dataset-dir&input-fasta=example
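If you generate such links from a script, the following Python sketch shows one way to assemble them. The helper `example_run_url` is hypothetical; only the parameter names `dataset-name`, `dataset-url` and `input-fasta` and the keyword `example` come from the notes above.

```python
from urllib.parse import urlencode

NEXTCLADE_WEB = "https://clades.nextstrain.org/"


def example_run_url(params: dict[str, str]) -> str:
    """Build a Nextclade Web URL which runs the dataset's example sequences."""
    query = {**params, "input-fasta": "example"}
    # Keep ':' and '/' readable so that a nested dataset URL is not percent-encoded.
    return f"{NEXTCLADE_WEB}?{urlencode(query, safe=':/')}"


if __name__ == "__main__":
    print(example_run_url({"dataset-name": "sars-cov-2"}))
    print(example_run_url({"dataset-url": "http://example.com/my-dataset-dir"}))
```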
Add `index` column to CSV and TSV outputs
The `index` field is already present in other output formats. In this version, CSV and TSV output files gain an `index` column as well, which contains the index (an integer signifying location) of the corresponding record in the input FASTA file or files. Note that this is not the same as the row index, because CSV/TSV rows can be emitted in an unspecified order in Nextclade CLI (this can be changed with the `--in-order` flag, which is set by default in Nextclade Web).
Note that sequence names (`seqName` column) are not guaranteed to be unique (and in practice are very often not unique). Indices are therefore the only way to reliably link inputs and outputs together.
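As a rough illustration of linking outputs back to inputs, the Python sketch below maps each output row to its input record through the `index` column, even when names repeat and rows arrive out of order. The helper `link_by_index`, the `clade` column and the example data are made up for this sketch.

```python
import csv
import io


def link_by_index(input_names: list[str], tsv_text: str) -> dict[int, tuple[str, dict[str, str]]]:
    """Map each input position to (input name, output row) using the `index` column."""
    linked = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        position = int(row["index"])
        linked[position] = (input_names[position], row)
    return linked


# Hypothetical inputs with a duplicated name, and output rows in arbitrary order.
INPUT_NAMES = ["s1", "s1", "s2"]
OUTPUT_TSV = "index\tseqName\tclade\n2\ts2\tC\n0\ts1\tA\n1\ts1\tB\n"

if __name__ == "__main__":
    for position, (name, row) in sorted(link_by_index(INPUT_NAMES, OUTPUT_TSV).items()):
        print(position, name, row["clade"])
```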
2.9.1
Nextclade Web 2.9.1, Nextclade CLI 2.9.1 (2022-12-09)
Set default weights in "private mutations" QC check to 1
This fixes a bug where the QC score was 0 (good) when the following QC fields were missing from `qc.json`:
.privateMutations.weightLabeledSubstitutions
.privateMutations.weightReversionSubstitutions
.privateMutations.weightUnlabeledSubstitutions
In this case Nextclade assumed a value of 0, which always led to a QC score of 0. Not all datasets were adjusted for the new `qc.json` format in time, and some had these fields missing, notably the flu datasets. So these datasets were erroneously showing a perfect QC score for the "private mutations" rule.
In this version we set these weights to 1.0 if they are missing, which fixes the incorrect QC scores. Some of the sequences will now correctly show worse QC scores.
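The defaulting behavior can be sketched in Python as follows. This is an illustration only, not Nextclade's implementation: the function `private_mutation_weights` is hypothetical, and how the weights enter the score is not shown. The field names are the ones listed above.

```python
DEFAULT_WEIGHT = 1.0

WEIGHT_FIELDS = (
    "weightLabeledSubstitutions",
    "weightReversionSubstitutions",
    "weightUnlabeledSubstitutions",
)


def private_mutation_weights(qc_config: dict) -> dict[str, float]:
    """Read the three weights, falling back to 1.0 for any that are missing."""
    private = qc_config.get("privateMutations", {})
    return {field: float(private.get(field, DEFAULT_WEIGHT)) for field in WEIGHT_FIELDS}


if __name__ == "__main__":
    # A config which sets only one of the three weights.
    print(private_mutation_weights({"privateMutations": {"weightLabeledSubstitutions": 4}}))
```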
Fix dataset selector in Nextclade Web when there are datasets with the same name, but different reference sequences
The dataset selector on the main page of Nextclade Web did not allow selecting datasets with the same name, but different reference sequences. This did not affect users so far, but we are about to release new Influenza datasets, which were affected. In this version we resolve the problem by keeping track of datasets not just by name, but by a combination of all attribute values (the `.attributes[]` entries in the datasets index JSON file).
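The idea of keying datasets by all attribute values can be sketched in Python like this. The attribute layout (a flat name-to-value mapping), the helper `dataset_key` and the reference `REF-B` are assumptions for illustration; the real index format may be shaped differently.

```python
def dataset_key(attributes: dict[str, str]) -> tuple[tuple[str, str], ...]:
    """Build a hashable key from all attribute values, not from the name alone."""
    return tuple(sorted(attributes.items()))


# Two hypothetical datasets which share a name but differ in reference.
DATASET_A = {"name": "flu_h3n2_ha", "reference": "EPI1857216"}
DATASET_B = {"name": "flu_h3n2_ha", "reference": "REF-B"}

if __name__ == "__main__":
    by_key = {dataset_key(d): d for d in (DATASET_A, DATASET_B)}
    print(len(by_key))  # both datasets remain distinguishable
```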
Ensure non-default references in "dataset list" command of Nextclade CLI are shown
This introduces the special value `all` for the `--reference` argument of the `nextclade dataset list` command, which is now the default. When it is in force, datasets with all reference sequences are included in the displayed list. This resolves the problem where non-default references were not shown in the list.
Internal changes
- We are now submitting PRs to bioconda automatically, which should reduce the delay of updates there
Instructions
📥 Nextclade CLI & Nextalign CLI can be downloaded from the links in the "Assets" section just below. There click "Show all" to show more options. Note the difference between "nextalign" and "nextclade" files.
🌐 Nextclade Web is available at https://clades.nextstrain.org
🐋 Docker images are available at DockerHub
📚 To understand how it all works, make sure to read the Documentation
2.9.0
Nextclade Web 2.9.0, Nextclade CLI 2.9.0 (2022-12-06)
Increase requirements for supported Linux distributions for GNU flavor of Nextclade CLI
Due to a malfunction of the package repositories of Debian 7, we had to switch the automated builds of the "gnu" flavor of Nextclade CLI from Debian 7 to CentOS 7. This increases the minimum required version of glibc to 2.17. The list of Linux distributions we tested the new version of Nextclade on is here. For users of older Linux distributions (with glibc < 2.17) we suggest using the "musl" flavor of Nextclade CLI, which does not depend on glibc, but might be substantially slower. Users of Nextclade CLI on macOS and Windows and users of Nextclade Web are not affected.
Add gene length validation in GFF3 parser
Nextclade will now check if genes have length divisible by 3 in gene maps and will fail with an error if it's not the case.
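The kind of check described here can be sketched in Python as below. This is not the parser's actual code: the function `validate_gene_length` and its error message are hypothetical, and it assumes GFF3-style 1-based inclusive coordinates.

```python
def validate_gene_length(name: str, start: int, end: int) -> None:
    """Raise ValueError unless the gene length is divisible by 3.

    `start` and `end` are GFF3-style 1-based inclusive coordinates.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"gene '{name}' has length {length}, which is not divisible by 3")


if __name__ == "__main__":
    validate_gene_length("HA", 1, 1701)  # 1701 = 567 codons, passes silently
```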
Fix translated (internationalized) strings in Nextclade Web
We fixed missing spaces between words in some of the languages and fixed some of the translations.
Internal changes
- build Linux binaries on CentOS 7
- migrate CI to GitHub Actions
- upgrade Rust to 1.65.0
2.8.0
Nextclade Web 2.8.0, Nextclade CLI 2.8.0 (2022-10-20)
Community datasets in Nextclade Web
This release adds support for fetching custom datasets from a remote location. This can be used for testing datasets introducing support for new pathogens, as well as for sharing these datasets with the community.
For that, we added the `dataset-url` URL query parameter, where you can specify either a direct URL to the directory of your custom dataset:
https://clades.nextstrain.org?dataset-url=http://example.com/path/to/dataset-dir
or a URL to a GitHub repository:
https://clades.nextstrain.org?dataset-url=https://github.com/my-name/my-repo/tree/my-branch/path/to/dataset-dir
or a special shortcut to a GitHub repository:
https://clades.nextstrain.org?dataset-url=github:my-name/my-repo@my-branch@/path/to/dataset-dir
If a branch name is not specified, the default branch name is queried from GitHub. If a path is omitted, then the files are fetched from the root of the repository.
When the `dataset-url` parameter is specified, a single custom dataset is loaded from the provided address instead of the list of default datasets. Note that this address should be publicly accessible and have CORS enabled. GitHub public repositories already comply with these requirements, so if you are using a GitHub URL or a shortcut, then no additional action is needed.
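To make the shortcut format concrete, here is a small Python sketch which splits a `github:` shortcut into its parts. It is a guess at the structure based only on the example above, not Nextclade Web's parser. The names `GithubShortcut` and `parse_github_shortcut` are hypothetical, and the exact spelling of shortcuts with an omitted branch or path is an assumption.

```python
from typing import NamedTuple, Optional


class GithubShortcut(NamedTuple):
    owner: str
    repo: str
    branch: Optional[str]  # None: the default branch would have to be queried
    path: str              # "/" stands for the repository root


def parse_github_shortcut(url: str) -> GithubShortcut:
    """Split 'github:owner/repo@branch@/path' into its components."""
    prefix = "github:"
    if not url.startswith(prefix):
        raise ValueError(f"not a GitHub shortcut: {url!r}")
    repo_part, _, rest = url[len(prefix):].partition("@")
    branch, _, path = rest.partition("@")
    owner, sep, repo = repo_part.partition("/")
    if not (owner and sep and repo):
        raise ValueError(f"expected 'github:owner/repo', got {url!r}")
    return GithubShortcut(owner, repo, branch or None, path or "/")


if __name__ == "__main__":
    print(parse_github_shortcut("github:my-name/my-repo@my-branch@/path/to/dataset-dir"))
```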
For more information, refer to:
Compression of all input and output files in Nextclade CLI and Nextalign CLI
Previously, only FASTA files could be compressed and decompressed on the fly. Now Nextclade CLI and Nextalign CLI can read all input and write all output files in compressed formats. Simply add one of the supported file extensions: "gz", "bz2", "xz", "zstd", and the files will be compressed or decompressed transparently.
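If you post-process these files in your own scripts, the same extension-based convention is easy to mirror. The Python sketch below is an illustration, not Nextclade code: the helper `open_text` is hypothetical, and it covers only "gz", "bz2" and "xz", because reading "zstd" typically needs a third-party package.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Map of file extension to the matching opener from the standard library.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_text(path, mode: str = "r"):
    """Open a text file, compressing or decompressing based on its extension."""
    opener = OPENERS.get(Path(path).suffix, open)
    return opener(path, mode + "t", encoding="utf-8")


if __name__ == "__main__":
    with open_text("example.tsv.gz", "w") as f:
        f.write("index\tseqName\n")
    with open_text("example.tsv.gz") as f:
        print(f.read())
```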
Decrease default number of threads in Nextclade Web
Some users have observed long startup times of the analysis in Nextclade Web. In this release we decreased the default number of processing threads from 8 to 3, such that startup time is now a little faster.
If you want to speed up the analysis of large batches of sequences, at the expense of a longer startup time, you can tune the number of threads in the "Settings" dialog.
Improve readability of text fragments in Nextclade Web
We've made text paragraphs on main page and some other places a little prettier and hopefully more readable.
Fix crash when reading large, highly-nested tree files
We improved handling of Auspice JSON format, such that it no longer crashes when large trees and trees with large number of deep branches are provided.
Commit history
- [7438805] chore(deps): bump auspice in /packages_rs/nextclade-web
  Bumps auspice from 2.38.0 to 2.39.0.
  updated-dependencies:
  - dependency-name: auspice
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] [email protected]
  Update clades.svg
- [77bcc03] Merge pull request #1012 from nextstrain/dependabot/npm_and_yarn/packages_rs/nextclade-web/auspice-2.39.0
- [e6aab13] chore: release web v2.7.1
- [3de3205] chore: disable core dumps when running dev docker container [skip ci]
- [6cb95a4] refactor: minify clades.svg
- [02ea73a] feat(web): prettify markdown rendering, changelog window text
  Applies small visual fixes and tweaks to make markdown text a little more readable.
  Highlights headings in the changelog and fixes a few visual bugs in code snippets.
  Removes unnecessary and unused styles.
- [63f6d21] Merge pull request #1016 from nextstrain/refactor/web-minify-clades-svg
- [82b93f2] Merge pull request #1017 from nextstrain/feat/web-prettier-markdown
- [2539c25] fix: ensure parsing of large, highly-nested tree JSONs
  Resolves #1018
  When tree JSON is sufficiently nested, `serde_json` fails to parse it into the `AuspiceTree` struct with:
  recursion limit exceeded at line <line> column <column>
  This problem has been reported before in `serde_json` in serde-rs/json#334, and the solution allowing to remove the recursion limit was implemented in the form of `Deserializer::disable_recursion_limit` traded off to the increased risk of overflowing the call stack.
  So here I implemented the suggested solution, by calling the `disable_recursion_limit()` on deserializer. I am also mitigating the risk of stack overflow by using growing stack adapter from `serde_stacker`.
  This allows to successfully parse the large tree from #1018, although probably also makes the tree parsing a bit slower. But it is not critical for overall runtime, as it is a one-off operation.
- [7c437b0] chore: remove unrelated changes [skip ci]
- [6ed348b] refactor: deduplicate json parsing code
- [33fd032] feat(web): dynamically set dataset server root using a URL param
  Allows to dynamically set dataset server root URL in Nextclade Web on runtime.
  It is sufficient to add URL parameter `dataset-server` with the value containing URL to the root of the dataset file server (containing `index_v2.json`), and Nextclade Web will fetch dataset index and all dataset files from this server instead of the default:
  ?dataset-server=http://example.com
  The localhost URLs should also work:
  ?dataset-server=http://localhost:27722
  This is equivalent to `--server` flag in Nextclade CLI.
  This should facilitate testing of custom datasets in Nextclade Web.
- [02b19a2] chore: enable CORS for local dataset server
- [064393e] docs: document dataset-server URL param
- [ed4b775] feat(web): reduce default number of threads to speedup initialization
  Despite "thread" (actually webworker) initialization is supposed to be concurrent, there seem to be a clear trend of increased initialization time as number of threads increased. This is likely due to data transfer overhead between workers.
  Here I reduce number of threads that can be initialized with default settings.
  But these setting can always be altered in the "Settings" dialog.
- [c4e7736] Merge pull request #1020 from nextstrain/feat/web-dataset-server-url-param
- [7890180] Merge branch 'master' into fix/tree-json-parsing-recursion-limit
- [ecdff00] Merge pull request #1021 from nextstrain/feat/web-decrease-num-threads
- [d677b95] docs: add info about dataset-server URL param to dev guide [skip ci]
- [e77141e] Merge remote-tracking branch 'origin/master' into fix/tree-json-parsing-recursion-limit
- [[62008ff](https://github.com/nextstrain/nextclade/commit/62008ff877b6378e5e8a5ca06200d2044e9...
2.7.0
Nextclade CLI 2.7.0, Nextclade Web 2.7.0 (2022-10-05)
Hide custom clade columns
We added the ability to mark certain custom clade columns as hidden. Hidden columns are not shown in Nextclade Web. This prepares the web application for the upcoming reorganization of clade columns. It should not affect current users.
Remove unused fields from output files, add custom phenotype key list
We removed extra repetitive fields related to custom phenotype columns (e.g. "Immune escape" and "ACE-2 binding") from JSON and NDJSON output files. We also added keys for custom phenotype columns to the header section of output JSON, for symmetry with custom clade columns. These changes should not affect most users.
Commit history
- [21c5e5e] chore: apply automatic clippy lint fixes
- [d71864c] chore: lint
- [de4079a] chore: format
- [296e9f8] Merge pull request #1003 from nextstrain/chore/lint
- [362f280] chore(ci): only release cli from release-cli branch
- [8337dc0] fix: link to release dataset changlog
  Currently release users see master dataset changelog. Here I change the link such that they see dataset changelog on release branch, i.e. only released dataset changes.
- [95bda16] Merge pull request #1004 from nextstrain/fix/web-dataset-changelog-link
- [ccb35e9] feat(web): format tooltip text
  Format text of custom phenotype columns. This text comes from a JSON string, so formatting capabilities are very limited. Here I split the incoming string on `\n` and create paragraphs (`<p>`) from each fragment.
- [6f48b17] Merge pull request #1005 from nextstrain/feat/web-format-tooltip-text
- [f7d3ae8] chore: release web v2.6.1
- [7f6cd6b] feat: allow hiding custom clade columns
- [8e7eb64] fix: show custom clade columns by default
  Let's flip the boolean, so that it's natural default is false
- [54117cf] fix: remove unused data from outputs
  Removes `results[].phenotypeValues.nameFriendly` and `results[].phenotypeValues.description` from output JSON and NDJSON. These fields were not intended to be used, are repeated and increase file size needlessly.
- [ee0c659] fix: add descriptions for phenotype attributes to outputs
  Adds descriptions of phenotype attributes to `.phenotypeAttrKeys` of JSON output. This is for symmetry with the similarly implemented `.cladeNodeAttrKeys` attributes.
- [f145b45] Merge pull request #1009 from nextstrain/fix/remove-unused-output-data
- [c8fe93e] Merge pull request #1010 from nextstrain/fix/add-phenotype-descriptions-to-outputs
- [d30d608] fix: reset node attr state correctly across runs
- [ade06e2] Merge pull request #1006 from nextstrain/feat/allow-hiding-custom-clade-columns
  feat: allow hiding custom clade columns
- [1c3377b] chore: speedup wasm dev build
  Here for wasm dev build I replace the `wasm-pack` which is nice, but behind its defaults it hides lots of useful stuff, and I call `cargo build`, `wasm-bindgen` and `wasm-opt` explicitly. Notably `wasm-pack` does not allow custom cargo profiles (rustwasm/wasm-pack#1111).
  I also add the new "opt-dev" cargo profile, which is something in the middle between `release` and `dev` - a balance between rebuild speed and runtime performance. This should save lots of time during day-to-day work.
  The production version still uses `wasm-pack` and a more tried solution.
  The dev version now requires cargo wasm toolchain, `wasm-bindgen` and `wasm-opt` (from `binaryen` project) installed manually. Previously `wasm-pack` would handle that. The dev docker image was updated accordingly.
- [ae49768] fix: adjust import
  For the changes in 1c3377b (there no longer is `package.json`, so import of the directory does not work)
- [4f5db09] fix(web): crash when exporting files in web
  Followup of #1006
  The PR #1006 introduced a bug: it filters away hidden columns from recoil state, thus making them unavailable for export functions. The mismatch in expected columns caused crash on Rust side.
  Here I modify the filtering logic, moving it to the actual rendering site, instead of filtering globally. This way, hidden columns are not rendered, but are still be available for export.
2.6.0
Nextclade Web 2.6.0, Nextclade CLI 2.6.0 (2022-09-27)
New metrics: Immune escape and ACE-2 binding
We added software support for the new custom metrics in Nextclade CLI and Nextclade Web. The dataset "sars-cov-2-21L" with the data required for these metrics to appear will be released in the coming days. Stay tuned.
The `dataset-name` URL parameter is now properly applied in Nextclade Web, even when `input-fasta` is not provided
Previously, the `dataset-name` URL parameter was ignored unless `input-fasta` was also set. Now `dataset-name` makes Nextclade Web preselect the requested dataset, regardless of whether the `input-fasta` URL parameter is also provided. This makes it possible to create URLs which preconfigure Nextclade Web with a certain dataset and dataset customizations, with the intent to provide the FASTA and run the analysis manually.
Better error handling
We improved error handling, such that some of the errors in Nextclade Web now have better error messages. Some of the errors that previously caused a hard crash in Nextclade CLI are now handled more gracefully and with better error messages.
Internal changes
We upgraded Rust to 1.63.0, and Nextclade CLI and Nextalign CLI are now using `std::thread::scope` for better multithreading support.
Commit history
It is lonely in the old deprecated `algorithms/` directory, let's put it at the root.
Remaining changes are just import path change (automated).
- [0583e91] refactor: format
- [4a17d36] Merge pull request #981 from nextstrain/refactor/cleanup
- [ab12755] fix(web): correct text instructions for fasta text fields
- [129f549] Merge pull request #983 from nextstrain/fix/text-field-text
- [7ea8414] fix: consider different file encodings
  This attempts to mitigate crash in web when the fasta file is not UTF-8 or ASCII encoded.
  Here I am trying to deduce a list of encodings the file resembles and then attempting to decode the contents of the file from these encodings to a JS string, instead of simply using the `FileReader`'s `.toString()` function.
- [992fc57] refactor: lint
- [6f4b5b4] feat: try a more accurate chardet library
- [b0f7bf2] feat: rephrase error message
- [18d8716] Merge pull request #987 from nextstrain/fix/web-file-encoding
- [f6bdaae] refactor: lint, fix typings
- [86e2ac1] Merge pull request #988 from nextstrain/refactor/lint
- [e8cce45] feat: calculate escape
- [b5c6b29] feat: add escape to csv and tsv outputs
- [add2a85] feat(web): add escape column
- [551e4e2] feat: hide 'Escape' column when there aren't any values
- [e4c02b0] refactor: lint
- [8694a52] feat: avoid sending filenames to plausible
- [f5cbfa2] feat: improve error page and popup
- [b2f3b41] refactor: lint
- [9650e27] Merge pull request #992 from nextstrain/feat/web-improve-error-handling
- [ccbcacf] feat: extract translation strings
- [ff265c1] Merge branch 'feat/extract-translation-strings'
- [628666a] feat: setup automatic translation with AWS Translate
- [d4e9bde] Merge pull request #993 from nextstrain/feat/autotranslate
- [a674e35] feat: apply automatic translations
- [b7a3ab1] feat(web): add more languages
- [14b98fc] feat(web): translate more strings
- [31bc2ef] refactor: correct compiler error
- [b495e7b] feat(web): persist selected locale
- [e9f63e1] fix(web): sort locales
- [f90856b] Merge pull request #994 from nextstrain/feat/translate
- [09ec70c] chore(deps): bump auspice in /packages_rs/nextclade-web
  Bumps auspice from 2.37.3 to 2.38.0.
  updated-dependencies:
  - dependency-name: auspice
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] [email protected]
- [d3b35c3] Merge pull request #996 from nextstrain/dependabot/npm_and_yarn/packages_rs/nextclade-web/auspice-2.38.0
- [cc79e84] chore: release web v2.5.1
- [bf21f97] Merge remote-tracking branch 'origin/master' into feat/escape
- [44b1f9e] fix: correct escape algorithm
- [125ac16] fix(web): ensure sorting results table by escape works
- [ce1693a] feat: add text to escape help tooltip
- [3a502a9] Add clarification on when escape scores are unreliable
- [b00e6c3] feat(web): make escape column tooltip wider
- [9450041] feat: allow optional name different from gene name for escape values
- [ac1b14a] feat(web): add tooltips for escape column
- [3cd1596] feat: skip escape calculation for outgroup samples
- [e499dc7] fix(web): sorting by escape
- [2320822] feat: list ignored clades in the virus json
- [e06e142] feat: allow for escape coefs per posiition only and per posiition per aa
- [dfbf64b] refactor: rename identifiers to clarify intent
- [61e4fd7] refactor: lint
- [6d28d49] feat: render phenotype values in separate columns
-...
2.5.0
Nextclade CLI 2.5.0, Nextclade Web 2.5.0 (2022-08-31)
Feature (CLI, Web): Coverage analysis
Nextclade now emits "coverage" metric which shows the portion of nucleotides in the alignment range being non-N and non-ambiguous, compared to the length of the reference sequence:
coverage = ((alignment_end - alignment_start) - total_missing - total_non_acgtns) / ref_len;
The metric is displayed as a percentage in the "Cov." column of Nextclade Web, and emitted into JSON, NDJSON, CSV and TSV outputs of Nextclade CLI and Web in the "coverage" field or column.
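The formula above translates directly into code. Here is a minimal Python sketch; the function name `coverage` and the example numbers are illustrative, while the argument names follow the formula.

```python
def coverage(alignment_start: int, alignment_end: int, total_missing: int,
             total_non_acgtns: int, ref_len: int) -> float:
    """Fraction of the reference covered by non-N, non-ambiguous nucleotides."""
    covered = (alignment_end - alignment_start) - total_missing - total_non_acgtns
    return covered / ref_len


if __name__ == "__main__":
    # Made-up numbers: a 100-nucleotide reference, fully aligned,
    # with 10 missing (N) and 5 ambiguous nucleotides.
    print(f"{coverage(0, 100, 10, 5, 100):.0%}")
```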
Feature (Web): Display machine-readable dataset names
Dataset selector on the main page of Nextclade Web now additionally shows machine-readable dataset name. This can help advanced Web users to put correct dataset name into the URL parameters, and CLI users to find the correct dataset name for downloads.
Feature (Web): Compact results table
We made some of the columns in the results table of Nextclade Web narrower to make the user experience a little better on laptops. When possible, for an optimal experience, we still recommend using 1080p displays or larger.
Fix (Web): Crashes when using filtering panel
Users reported intermittent crashes of Nextclade Web when entering values in the filtering panel on the results page. This has now been fixed. If you still have problems, please submit an issue in our GitHub repository.
Commit history
- [b9a9a4b] feat(web): show dataset machine-readable name in dataset selector
- [2de9310] Merge remote-tracking branch 'origin/master' into feat/web-dataset-machine-readable-name
- [4bad667] fix: correctly reset duplicate sequence name map on new runs
  Currently duplicate sequence name data persists across unrelated runs of Nextclade Web, this causes incorrect reporting of duplicates.
  This PR ensures that duplicate sequence name data does not persist across runs.
- [4f23954] Merge pull request #950 from nextstrain/fix/web-dup-names-reset
- [d4430e9] docs: add changelog for web 2.4.0
- [4435faf] chore: release web v2.4.1
- [1cd8df0] feat: calculate coverage qc metric
- [4c2e5ed] feat: add coverage qc metric to csv and tsv outputs
- [03cd444] feat(web): add coverage metric to web UI
- [e8b59f1] feat(web): correct and restyle download links on main page
  The currently deployed version links to non-existent v1 files.
  I thought that removing extra explanations (found in the docs) and making buttons more prominent will make it easier for users to discover these links.
- [fab66aa] feat: add text suggested by @emmahodcroft
- [c16836e] doc: update about text in web app
- [418de94] Merge pull request #960 from nextstrain/docs/update-about
- [81b06c6] Merge pull request #958 from nextstrain/feat/download-links
- [7309d2c] Make coverage qc more sensitive
- [a11499c] fix: wrong comment character
- [caf3c9a] feat: remove indels from coverage calculation
  I'm not sure how handle Ns and ambiguous characters that appear in indels
  The formula as is assumes `missing` and `non_acgtns` don't count bases in indels
- [92b0c26] feat: remove coverage qc metric, move it into analysis results
- [2412f63] feat: remove qc metric from web app, add dedicated column
- [ff9767b] feat: add coverage column
- [44e90db] feat: add ref length and total covered nucs into file outputs
- [b503c1b] docs: fix links to nextalign downloads
- [6b71d9e] docs: fix md syntax of a header
- [18aa4e3] Merge pull request #964 from nextstrain/docs/fixes
- [a834ce1] chore: add 'sphinx linkcheck' makefile target
- [139c4a6] fix: doc links
- [fd63ac8] Merge pull request #965 from nextstrain/fix_doc_links
- [984efc8] feat: add event analytics
- [98c9076] Merge pull request #967 from nextstrain/feat/events
- [c51184e] chore: release web v2.4.2
- [739e5ed] Move coverage column after Ns
- [e11a073] Squeezing columns a bit
- [1735cfd] chore: precompress web root
  AWS Cloudfront does not compress all files and we'd like to compress everything, including files larger than 10M and wasm for example. Here I compress files in advance with gzip and brotli and add a Lambda@Edge function which rewrites origin paths, depending on `accept-encoding` header.
- [f6d167f] chore: fix path
- [a61c72f] chore(ci): add missing deps
- [5992d33] chore(ci): ensure previously uploaded files are not erased
- [c8356ab] chore: update security headers
- [5c8c8cf] chore: fix typescript error
- [16c741f] chore: install missing dependency
- [5530c69] chore: update awscli
- [ae0e6e5] chore; set correct content type and encoding when deploying
- [c3f5591] fix: make filtering code safer
  Related to: #961
  This removes casts in amino acid and nucleotide filtering code which have a potential to cause crashes in certain cases. I made `undefined`-comparison more lax, such that now it also checks for `null` and added default values to anything that can be nil.
  Even though I cannot reproduce the issue, this hopefully should fix it.
- [d173af2] fix: swap array and filters
  Lodash `intersectionWith()` expects filtered array to be the first argument
- [349a9e7] chore(infra): fix .gz and .br rewrites
- [b7e1e28] chore(infra): redirect from results and tree pages to main page [skip ci]
  They are not real pages, ...
2.4.0
Nextclade CLI 2.4.0, Nextclade Web 2.4.0 (2022-08-02)
Fix (Web): use indices to identify sequences uniquely in Nextclade Web
Previously, Nextclade used sequence names to identify sequences. However, sequence names have proven to be unreliable: they are often duplicated. This caused various problems where results with the same names could be overwritten.
Since this version, Nextclade Web is using sequence indices (order of sequences in the input file or files), to tell the sequences apart, uniquely. This should ensure correct handling of duplicate names. This change only affects results table in the Web application. CLI is not affected.
Feature (Web): warn about duplicate sequence names
Nextclade Web now reports duplicate sequence names. Duplicate sequence names often confuse bioinformatics tools, databases and bioinformaticians themselves, so we are trying to encourage the community to be more thoughtful about naming of their samples.
When duplicate names are detected during analysis in Nextclade Web, the "Sequence name" column of the results table now displays a yellow "duplicates" warning icon, and its tooltip contains a list of indices of sequences (serial numbers of the sequences in the input fasta file or files) having the same name.
Note that Nextclade compares only names, not sequence data themselves.
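A duplicate-name report of this kind can be sketched in a few lines of Python. This is an illustration rather than the web application's code: the function `find_duplicate_names` is hypothetical and, like the feature described above, it compares names only.

```python
from collections import defaultdict


def find_duplicate_names(names: list[str]) -> dict[str, list[int]]:
    """Map every repeated sequence name to the indices where it occurs."""
    indices = defaultdict(list)
    for index, name in enumerate(names):
        indices[name].append(index)
    return {name: idx for name, idx in indices.items() if len(idx) > 1}


if __name__ == "__main__":
    # Hypothetical sequence names in input order.
    print(find_duplicate_names(["a", "b", "a", "c", "b", "a"]))
```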
Feature (CLI): add "download dataset and run" shortcut
In this version we added the `--dataset-name` (`-d`) argument to the `run` command, which downloads a dataset with default parameters and runs with it immediately, all in one command.
For example, this command:
nextclade run --output-all=out --dataset-name=sars-cov-2 sequences.fasta
or, equivalently but shorter:
nextclade run -O out -d sars-cov-2 sequences.fasta
will download the latest default SARS-CoV-2 dataset into memory and run the analysis with these dataset files. This is a convenience shortcut for the usual combination of `nextclade dataset get` + `nextclade run`. The dataset is not persisted on disk and is downloaded on every run.
Feature (Web): Upgrade Auspice from version 2.37.2 to 2.37.3
This release includes a routine upgrade of the Auspice tree view. You can read the changelog in the Auspice GitHub repository.
Commit history
- [
5da4fe5
] chore(deps): bump auspice in /packages_rs/nextclade-web
Bumps auspice from 2.37.2 to 2.37.3.
updated-dependencies:
- dependency-name: auspice
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] [email protected]
-
[
5a9c699
] feat: assert mutation is inside the gene -
[
22e4e2d
] feat: add "download dataset and run" shortcut"
This adds --dataset-name
(-d
) to run
command, which allows to download a dataset with default parameters and run with it immediately, in one command.
For example this command.
nextclade run --output-all=out --dataset-name=sars-cov-2 sequences.fasta
will download the latest default sars-cov-2 into memory and will run analysis with these dataset files.
This is a convenience shortcut for the usual combination of dataset get
+ run
. The dataset is not persisted on disk and downloaded on every run.
- [
e155b94
] fix: use indices to identify sequences uniquely
Nextclade uses sequence names to uniquely identify sequences. However, sequence names come from user inputs and cannot be trusted to be unique. Nor does there seem to be a consensus on the uniqueness of sequence names in the bioinformatics community as a whole.
This causes various problems wherever sequence names are used as identifiers and multiple sequences share the same name. In particular, analysis results are effectively stored in an associative container, where the sequence name acts as the key. This leads to newer results overwriting older results as they arrive during analysis. Additionally, some of the HTML id properties used sequence names to add uniqueness. This led to incorrect HTML being produced, with multiple elements having the same id property.
In this PR I:
- change internal storage to use sequence indices in the input file(s) as keys
- add sequence index into HTML ids
- display "Sequence index" in places where only sequence name was displayed previously
This should ensure correct handling of duplicated names.
This affects only the web application. In the algorithmic and CLI parts, sequence names are not used - results are stored in the form of an array, and no HTML is involved.
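The overwriting problem described in this commit can be illustrated with a minimal Python sketch. This is not the web application's actual code; the `Result` type, its fields, and the clade values are hypothetical placeholders chosen only to show why keying by name loses entries while keying by input index does not.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """Hypothetical analysis result, for illustration only."""
    index: int     # position of the sequence in the input file(s)
    seq_name: str  # user-supplied name, may be duplicated
    clade: str     # placeholder payload


def store_by_name(results):
    # Later results silently overwrite earlier ones that share a name.
    return {r.seq_name: r for r in results}


def store_by_index(results):
    # Indices are unique by construction, so nothing is overwritten.
    return {r.index: r for r in results}


results = [
    Result(0, "sample", "20A"),
    Result(1, "sample", "21J"),  # same name as the first entry
    Result(2, "other", "20B"),
]

by_name = store_by_name(results)
by_index = store_by_index(results)
print(len(by_name), len(by_index))
```

With three inputs, the name-keyed store ends up with two entries and the index-keyed store with three.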
- [c43367e] Merge remote-tracking branch 'origin/master' into fix/web-unique-seq-ids
- [e744ca6] chore: release web v2.3.1
- [e2f37d7] chore: fix CHANGELOG link to PR instead of issue
- [2a88f30] Merge pull request #938 from nextstrain/fix/crash-gene-overflow
- [c92683a] Merge pull request #939 from nextstrain/feat/dataset-get-and-run
- [8a69ac7] feat: remove sequence index from tooltips
- [ec2104a] Merge pull request #946 from nextstrain/fix/web-unique-seq-ids
- [92a7234] feat(web): report duplicate sequence names
This adds little yellow icons in the "Sequence name" column when a sequence has the same name as other sequences in the same run. Indices of these sequences are additionally listed in the tooltip.
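A minimal sketch of how duplicate names and the indices that share them could be collected, assuming only that sequences are identified by their position in the input. The function name `find_duplicate_names` is hypothetical and is not taken from the Nextclade code base.

```python
from collections import defaultdict


def find_duplicate_names(seq_names):
    """Map each duplicated name to the list of indices where it occurs."""
    indices_by_name = defaultdict(list)
    for index, name in enumerate(seq_names):
        indices_by_name[name].append(index)
    # Keep only names that appear more than once.
    return {
        name: indices
        for name, indices in indices_by_name.items()
        if len(indices) > 1
    }


duplicates = find_duplicate_names(["a", "b", "a", "c", "b", "a"])
print(duplicates)
```

A mapping like this is enough to decide which rows get a warning icon and which indices to list in the tooltip.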
- [a02b60c] docs: add changelog for 2.4.0
- [a2c37ad] Merge pull request #937 from nextstrain/dependabot/npm_and_yarn/packages_rs/nextclade-web/auspice-2.37.3
chore(deps): bump auspice from 2.37.2 to 2.37.3 in /packages_rs/nextclade-web
feat(web): report duplicate sequence names
2.3.1
Nextclade CLI 2.3.1, Nextclade Web 2.3.1
- Fix #947: In datasets where a gene starts right at the beginning of the reference sequence, Nextclade versions 2.0.0 through 2.3.0 crash due to underflow. This is now fixed. The only Nextclade-provided dataset affected by this bug is Influenza Yamagata HA. That dataset had a further bug in the tree, so a corresponding dataset bug fix release is now available. (report: @mcroxen)
Commit history
Uses dependabot to check for auspice updates.
- [f07a807] Delete Update-Auspice.yml
This is not needed with the new .github/dependabot.yml.
- [f4c1aa3] docs: adjust cli docs to nextclade v2
- [572915b] docs: Install nextstrain.sphinx.theme extension
The new extension has sphinx_copybutton pre-installed and configured. This change enables it here.
This also makes it easier to configure other extensions across multiple docs projects.
- [23010b7] Bump nextstrain-sphinx-theme to >=2022.5
This is when the extension started being actively configured.
docs: Install nextstrain.sphinx.theme extension
- [11f6fcb] chore: add tmp data to gitignore
- [2d8ab9d] refactor: lint
- [316862e] fix: underflow when mutation in gene close to start
When the first gene is very close to the start of the sequence
fix: underflow when mutation in gene close to start
2.3.0
Nextclade CLI 2.3.0, Nextclade Web 2.3.0
This release brings back entries for failed sequences into output files.
It was reported by @tseemann (#921) that in Nextclade v2, CSV and TSV rows are not written for failed sequences, whereas in v1 they were. This was unintended.
In this release:
- CSV, TSV, NDJSON rows for failed entries are now also written (only seqName and errors columns are populated). Note that it is important to check the errors column and disregard other columns if there are errors. For example, in case of an error, the substitutions column will be empty, but this does not mean that the failed sequence has no substitutions.
- JSON output now has a separate errors field at the root of the object, with all failed entries
- NDJSON rows are also written for failed entries. They only contain index, seqName and errors fields.
- new columns are written into CSV and TSV outputs: warnings and failedGenes, which include any warnings emitted for a sequence as well as a list of genes that failed translation. Now all columns of the "errors.csv" file are also in the CSV and TSV results files
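The advice to check the errors column before trusting any other column can be sketched as follows. This is only an illustration for TSV output: the column names seqName, substitutions and errors come from the notes above, while the sample rows, the error text, and the helper name `split_results` are made up.

```python
import csv
import io


def split_results(tsv_text):
    """Separate rows of a results TSV into successful and failed entries."""
    ok, failed = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # A non-empty errors column means the other columns carry no
        # meaningful data for this sequence and should be disregarded.
        if (row.get("errors") or "").strip():
            failed.append(row)
        else:
            ok.append(row)
    return ok, failed


# Made-up sample data, not real Nextclade output.
SAMPLE_TSV = (
    "seqName\tsubstitutions\terrors\n"
    "seq1\tC241T,A23403G\t\n"
    "seq2\t\texample error message (made up)\n"
)

ok_rows, failed_rows = split_results(SAMPLE_TSV)
print([r["seqName"] for r in ok_rows], [r["seqName"] for r in failed_rows])
```

In this sample, the empty substitutions value for seq2 reflects a failed analysis, not a sequence without substitutions.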
We improved the warning that users of unsupported browsers (mostly Safari) receive when they browse to Nextclade web.
Further changes only relevant to those building Nextclade themselves:
- @xzhub contributed a PR (#930) to improve customization of the dataset server URL: when DATA_FULL_DOMAIN starts with /, the HTTP Origin is automatically added in front of it to make an absolute URL
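The rule described above can be sketched in a few lines. The real logic lives in the web application, so this Python version is only an illustration; the function name `resolve_data_domain` and the example origin are hypothetical.

```python
def resolve_data_domain(data_full_domain, origin):
    """Turn a relative dataset server path into an absolute URL.

    If the configured value starts with "/", the page origin is put in
    front of it. Any other value is assumed to be absolute already and
    is returned unchanged.
    """
    if data_full_domain.startswith("/"):
        return origin.rstrip("/") + data_full_domain
    return data_full_domain


print(resolve_data_domain("/datasets", "https://example.org"))
print(resolve_data_domain("https://data.example.org", "https://example.org"))
```

This lets a single build be served from different hosts without rebuilding for each dataset server address.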
Commit history
- [0e7971e] test(cli): scaffold functional cli tests with cram
- [606fad8] Merge pull request #920 from nextstrain/test/cli-func
- [72695f6] Update CHANGELOG.md
- [bdfd9cf] chore(ci): fix gnu linux build
Apparently Debian 7 has no pip3 package. So let's install it using get-pip script
- [6bf4e03] docs(dev): add a note about CORS in dev guide
- [52d0f50] fix(cli): write failed sequence results (errors) to csv and tsv outputs
It was reported by @tseemann that in Nextclade v2, csv and tsv outputs are not written for failed sequences, whereas in v1 they were. This was unintended.
Here I bring back writing CSV and TSV rows for failed sequences (only the "errors" column is populated).
- [f9280c4] feat(cli): add 'warnings' and 'failedGenes' columns to results csv/tsv
This adds new columns to the main nextclade results CSV and TSV outputs: warnings and failedGenes, which include any warnings emitted for a sequence as well as a list of genes that failed translation.
Now all columns of the "errors.csv" file are also in the CSV and TSV results files.
- [2fdf81a] docs: fix typos
Noted in an email to our support address (ticket 510).
docs: fix typos
- [773b1e6] chore: remove unnecessary variable
- [7951530] Merge remote-tracking branch 'origin/master' into fix/cli-csv-output-errors
- [b0d41e6] fix(cli): write failed sequences to json and ndjson outputs
- [c1f9a72] Revert "chore: upgrade Cargo.toml's"
This reverts commit 2290adc.
- [55f3342] Revert "chore: update dependencies, rust to 1.61"
This reverts commit 46a8a53.
- [381c3ab] Merge pull request #925 from nextstrain/chore/revert-dep-upgrade
- [5a5a060] Merge remote-tracking branch 'origin/master' into fix/cli-csv-output-errors
- [b33d343] fix(web): add failed sequences to downloaded outputs in the web app
- [01f5076] fix(web): typo in download dialog
- [7062ced] fix(web): enable download button even if all sequences faield
Currently the button stays disabled when a run ends with no successful results. However, this prevents downloading errors.csv and other files containing errors, which might still be useful.
- [e55f31b] Merge pull request #926 from nextstrain/fix/web-typo
- [da679ac] feat(web): improve unsupported browser warning
This makes the warning appear in a modal, so that it's harder to miss. It also adds links to the official websites of Chrome and Firefox.
- [ab00f35] fix: add failed sequences to insertions.csv of nextalign and nextclade
- [f4d4480] docs: document outputs on failures
- [fe9bb44] fix: make error entries in JSON and NDJSON outputs similar to normal
- [cd9ed49] Merge pull request #922 from nextstrain/fix/cli-csv-output-errors
- [88238a8] docs: document limitations of JSON outputs
Adds a few warnings explaining how JSON outputs can cause increased memory consumption.
⚠️ For CLI users: Note that due to technical limitations of the JSON format, it cannot be streamed entry by entry, i.e. all entries need to be accumulated in memory before the output is written to the file. If the JSON results output or tree output is requested (through the --output-json, --output-tree or --output-all arguments), large input data can cause very high memory consumption, disk swapping, decreased performance and crashes. Consider removing these outputs for large input data, running on a machine with more RAM, or processing data in smaller chunks.
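For contrast, a line-delimited format such as NDJSON can be handled one entry at a time. The sketch below shows the consumer side under stated assumptions: the field names index, seqName and errors are taken from the release notes, but the exact shape of the errors value (a list here) and the sample rows are assumptions, and the helper names are hypothetical.

```python
import io
import json


def iter_ndjson(lines):
    """Yield one parsed entry per non-empty line, without loading the whole file."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_failed(lines):
    # Only one entry is held in memory at any time.
    return sum(1 for entry in iter_ndjson(lines) if entry.get("errors"))


# Made-up sample rows, not real Nextclade output.
sample = io.StringIO(
    '{"index": 0, "seqName": "seq1", "errors": []}\n'
    '{"index": 1, "seqName": "seq2", "errors": ["example error (made up)"]}\n'
)
print(count_failed(sample))
```

Any file-like object opened in text mode can be passed in place of the `io.StringIO` sample.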
- [5c73286] docs: document output compression
- [cb569e5] docs: document how failed sequences are reflected in output files
- [8986a95] Merge pull request #929 from nextstrain/docs/outputs
- [4f56bd9] Let user use relative url for nextclade datasets server
when the DATA_FULL_DOMAIN is started with '/', the HTTP Origin will
be added automatically in front of it to make an absolute url
- [4ab84c8] Replace var with let
- [0200b6e] Merge branch 'fix/web-enable-dl-btn'
- [34fc40a] Merge branch 'feat/web-improved-unsupported-browser-warning'
- [bbaff7b] feat(cli): add a note about CSV and TSV delimiters
- [356c100] Merge pull request #933 from nextstrain/feat/cli-help-csv-delimiters
feat(cli): add a note about CSV and TSV delimiters