feat: mutations relative to arbitrary node #1454
Merged
This extends the concept of private mutations (mutations relative to the parent node on the ref tree) to mutations relative to an arbitrary node of interest. The ref nodes of interest are described by the user in `.meta.extensions.nextclade.reference_nodes` of the input Auspice JSON. The description can also contain constraints: a node can be restricted to match only query samples belonging to a certain clade or lineage. Private mutations functionality is unchanged; this is only an addition, though the implementation algorithm is largely reused. In this commit only nuc mutations are added.
Similarly to b537132, add relative amino acid mutations
This passes the data that is now required to output relative mutations from JS through to WASM.
ivan-aksamentov commented on May 14, 2024 (twice)
ivan-aksamentov added two commits to nextstrain/nextclade_data that referenced this pull request on May 14, 2024
ivan-aksamentov added a commit that referenced this pull request on Jun 6, 2024
In preparation for #1454. Among other things, in order to render private and relative aa mutations, we need to group them similarly to how absolute aa mutations are grouped. This involves finding adjacent mutations and the nuc context for these mutations. Sadly, the grouping code is quite complex and not immediately reusable (it performs aa mutation search and grouping at the same time), so let's do some refactoring. Let's start by factoring out structs and functions that will remain unchanged, to clear up some space for the action, minimize diffs and reduce scrolling.
This should reduce nesting and allow fallible operations inside the map.
After refactoring, aa_changes_group() is generic enough to be reused for private aa mutations too. This is a first working sketch. The inputs are likely wrong, and fixes will follow.
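Once grouping is separated from the mutation search, one grouping routine can serve absolute, private and relative aa changes alike. A minimal Python sketch of the grouping step, assuming a group is a run of changes on consecutive codons of the same CDS; `AaChange` and `group_adjacent` are hypothetical names, not Nextclade's actual types, and the nuc context computation is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AaChange:
    # Hypothetical minimal record, not Nextclade's actual type
    cds: str      # CDS name, e.g. "S"
    codon: int    # 0-based codon index within the CDS
    ref_aa: str   # amino acid in the sequence we compare against
    qry_aa: str   # amino acid in the query


def group_adjacent(changes: List[AaChange]) -> List[List[AaChange]]:
    """Split changes into runs lying on consecutive codons of the same CDS.

    The function does not care what the changes are relative to (reference,
    parent node or another node), which is what makes it reusable."""
    groups: List[List[AaChange]] = []
    for change in sorted(changes, key=lambda c: (c.cds, c.codon)):
        prev = groups[-1][-1] if groups else None
        if prev is not None and prev.cds == change.cds and change.codon == prev.codon + 1:
            groups[-1].append(change)  # extend the current run
        else:
            groups.append([change])    # start a new run
    return groups
```

Because the input is just a list of changes, the same call works whether the list came from comparing against the reference, the parent node or a node of interest.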
This adds a search for the ancestor node with the same clade as the query sample. Let's call that node the "clade founder". Similarly, this adds the same search for custom clade-like attributes. After the node is found, we compute nuc and aa mutations relative to it. This avoids explicit, repetitive ref node configuration in the ref tree. The idea is similar to the nearest node/private mutations and can be considered a continuation of it: we start searching from the nearest node and ascend the tree towards the root until we find the node that is closest to the root and has the same clade or clade-like attribute.
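The founder search can be sketched as a walk up parent links. This Python sketch assumes each node carries a flat map of attribute values and that the founder is the topmost node of the unbroken run of ancestors sharing the query's value (equivalent to "closest to the root" when a clade is contiguous along the path); `Node` and `find_founder` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    name: str
    attrs: Dict[str, str]              # hypothetical flat attribute map, e.g. {"clade": "21L"}
    parent: Optional["Node"] = None    # None for the root


def find_founder(nearest: Node, attr: str, value: str) -> Optional[Node]:
    """Ascend from the nearest node and return the topmost consecutive ancestor
    whose `attr` equals `value`, or None if the nearest node itself differs.

    Assumption: the search stops at the first ancestor with a different value."""
    if nearest.attrs.get(attr) != value:
        return None
    founder = nearest
    node = nearest.parent
    while node is not None and node.attrs.get(attr) == value:
        founder = node      # still inside the same clade, keep climbing
        node = node.parent
    return founder
```

Passing a clade-like attribute name instead of `"clade"` as `attr` gives the same search for custom attributes.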
If `skipAsReference` is set on a clade-like node attribute description (an entry in the Auspice JSON `.meta.extensions.nextclade.clade_node_attrs[]` array), then this attribute participates neither in the clade founder node search nor in mutation calling relative to these nodes.
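A minimal sketch of how the attribute descriptions could be filtered before the founder search. It assumes each entry of `clade_node_attrs` is a dict with a `name` key and an optional boolean `skipAsReference`, and that the built-in `"clade"` always participates; `founder_search_attrs` is a hypothetical helper.

```python
from typing import Any, Dict, List


def founder_search_attrs(clade_node_attrs: List[Dict[str, Any]]) -> List[str]:
    """Return the attribute keys that take part in clade founder search.

    Assumption: the built-in "clade" always participates; a clade-like
    attribute is dropped when its description sets skipAsReference to true."""
    keys = ["clade"]
    for desc in clade_node_attrs:
        if not desc.get("skipAsReference", False):
            keys.append(desc["name"])
    return keys
```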
Resolves #991 #1237 #1142
This extends the concept of private mutations (mutations relative to the parent node on the ref tree) to a more general concept of mutations relative to an arbitrary node of interest.
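The generalization can be illustrated with substitutions only: if both the query and the node of interest are described by their substitutions relative to the reference, the query's mutations relative to that node are the positions where the two alleles differ. With the parent node as the node of interest, this yields private mutations. A minimal Python sketch under that assumption; `mutations_relative_to_node` is a hypothetical name, and the sketch ignores deletions, missing or ambiguous data and unsequenced regions, which a real implementation would need to handle.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: 0-based position -> nucleotide, relative to the reference
Subs = Dict[int, str]


def mutations_relative_to_node(ref: str, qry_subs: Subs, node_subs: Subs) -> List[Tuple[int, str, str]]:
    """Return (pos, node_nuc, qry_nuc) wherever the query differs from the node.

    A position absent from a substitution map carries the reference allele, so
    node mutations the query lacks show up as reversions to the reference."""
    result = []
    for pos in sorted(set(qry_subs) | set(node_subs)):
        node_nuc = node_subs.get(pos, ref[pos])
        qry_nuc = qry_subs.get(pos, ref[pos])
        if node_nuc != qry_nuc:
            result.append((pos, node_nuc, qry_nuc))
    return result
```

For example, with reference `ACGTACGT`, node substitutions `{1: "T", 4: "G"}` and query substitutions `{1: "T", 6: "A"}`, the shared `C→T` at position 1 disappears, position 4 becomes a reversion `G→A` and position 6 remains a new mutation `G→A`.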
The ref nodes of interest are described by the user in `.meta.extensions.nextclade.reference_nodes` of the input Auspice JSON. The description can also contain constraints: a node can be restricted to match only query samples belonging to a certain clade or lineage. Private mutations functionality is unchanged. New functionality, inputs and outputs are added on top, though the implementation algorithm is largely reused.
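A sketch of how such constraints could be checked for a query. The shape of `include` used here (a map from `"clade"` or a clade-like attribute name to a list of accepted values, with all entries required to match) is an assumption for illustration, and both functions are hypothetical, not Nextclade's API. A node without `include` matches every query.

```python
from typing import Any, Dict, List


def node_matches_query(ref_node: Dict[str, Any], qry_attrs: Dict[str, str]) -> bool:
    """Check the `include` constraints of one reference node entry.

    Assumed shape: {"clade": ["21L", ...], "<attr name>": [...]}; every key must
    match (logical AND). A missing `include` means no constraints."""
    include = ref_node.get("include")
    if include is None:
        return True
    return all(qry_attrs.get(key) in allowed for key, allowed in include.items())


def applicable_nodes(ref_nodes: List[Dict[str, Any]], qry_attrs: Dict[str, str]) -> List[str]:
    """Names of the reference nodes whose constraints accept this query."""
    return [n["name"] for n in ref_nodes if node_matches_query(n, qry_attrs)]
```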
Test
PR in data for testing: nextstrain/nextclade_data#198 (branch with the same name):
Work items
For consideration:
Inputs
Example configuration object. Put it into `.meta` of the Auspice JSON (such that it becomes `.meta.extensions.nextclade.reference_nodes`).
The `name` field should match the `name` field of one of the nodes on the tree.
The `displayName` and `description` fields are optional arbitrary strings used for display purposes.
The `include` field should be an object whose keys are `name`s from `.meta.extensions.nextclade.clade_node_attrs` (for clade-like attributes) or the string `"clade"` (for the built-in clades).
(for the built-in clades).If the
include
field is not present, then no constraints applied (all query sequences are considered).Outputs
Output JSON and NDJSON
Example fragment of an output JSON entry (an entry in the `.results[]` array; mutation lists are truncated for demonstration purposes).
Output TSV and CSV
New columns are added. Currently, 4 columns are added for each additional node:
where `{node_name}` is the `display_name` if present, or `name` if not. Mutations are in the same format as in the other columns containing mutations.
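The `{node_name}` substitution can be sketched as follows. The column templates passed in are placeholders, since the actual column names are not listed here; `node_label` and `expand_columns` are hypothetical helpers.

```python
from typing import List, Optional


def node_label(name: str, display_name: Optional[str] = None) -> str:
    """Value used for {node_name}: the display name if present (non-empty), else the node name."""
    return display_name if display_name else name


def expand_columns(templates: List[str], name: str, display_name: Optional[str] = None) -> List[str]:
    """Fill the {node_name} placeholder in each per-node column template (templates are hypothetical)."""
    label = node_label(name, display_name)
    return [t.format(node_name=label) for t in templates]
```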
Visualization in Nextclade Web
A new "Relative to" dropdown in the table header lets the user select "Reference", "Parent" (for private mutations, i.e. relative to the parent node on the tree) or any of the nodes of interest defined in the tree. Sequence views then switch to showing mutations relative to the selected node.
The "Mut" column now shows mutations (the count, and the full list in the tooltip) only relative to the currently selected node.