feat: mutations relative to arbitrary node (extended) #1492
When new nodes are placed onto a ref node, sometimes we create an intermediate new node as a copy of the existing ref node. They are all named the same - `{node_key}_internal`. This is insufficient to provide uniqueness when multiple new nodes are placed onto the same ref node. To address that, here I change names to also include query name and index.
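A minimal sketch of why the new naming restores uniqueness. The helper names `internal_node_name` and `internal_node_name_old`, the `usize` node key, and the underscore-joined format are hypothetical; the exact format used in this commit is not shown here.

```rust
/// Hypothetical helper: name of the intermediate node created when a query is
/// placed onto the ref node `node_key`. Separator and field order are assumptions.
pub fn internal_node_name(node_key: usize, qry_name: &str, qry_index: usize) -> String {
  format!("{node_key}_internal_{qry_name}_{qry_index}")
}

/// Old scheme, for comparison: identical for every query placed on the same ref node.
pub fn internal_node_name_old(node_key: usize) -> String {
  format!("{node_key}_internal")
}
```

With the old scheme, two queries placed on ref node 42 both produce `42_internal`; including the query name and index makes the two names differ.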
…om permissions-policy

Having both `feature-policy` and `permissions-policy` causes a "yellow error" error/warning (warnor? errning?) because `feature-policy` is [deprecated](https://scotthelme.co.uk/goodbye-feature-policy-and-hello-permissions-policy/) (as if no one were still using old browsers that don't know about `permissions-policy`). Same for the `interest-cohort` entry in `permissions-policy`, which requires a feature flag enabled in Chrome to be functional:

> Error with Permissions-Policy header: Origin trial controlled feature not enabled: 'interest-cohort'.

Let's remove the outdated `feature-policy` header and remove the `interest-cohort` entry from `permissions-policy`. Still an "A" score on both:

- https://securityheaders.com/?q=clades.nextstrain.org&followRedirects=on
- https://observatory.mozilla.org/analyze/clades.nextstrain.org

even though we allow `unsafe-eval` in order to be able to run wasm. Security headers are a bit of a confusing mess nowadays (to put it lightly), which is always good for an additional feeling of security, of course! I am not even sure why I am doing this anymore. To get an "A" score, I guess.
The bug is the inverted `.filter()` which filtered out all non-empty entries
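The bug class is easy to reproduce. The snippet below is a hypothetical Rust illustration (the affected code itself is not shown in this conversation, and the function names are made up): negating the emptiness check flips which entries survive.

```rust
/// Buggy: the predicate is inverted, so only empty entries survive
/// and every non-empty entry is filtered out.
pub fn keep_entries_buggy(entries: &[&str]) -> Vec<String> {
  entries.iter().filter(|e| e.is_empty()).map(|e| e.to_string()).collect()
}

/// Fixed: keep only the non-empty entries.
pub fn keep_entries_fixed(entries: &[&str]) -> Vec<String> {
  entries.iter().filter(|e| !e.is_empty()).map(|e| e.to_string()).collect()
}
```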
Bumps [auspice](https://github.com/nextstrain/auspice) from 2.54.3 to 2.55.0. - [Release notes](https://github.com/nextstrain/auspice/releases) - [Changelog](https://github.com/nextstrain/auspice/blob/master/CHANGELOG.md) - [Commits](nextstrain/auspice@v2.54.3...v2.55.0) --- updated-dependencies: - dependency-name: auspice dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]>
…ages/nextclade-web/auspice-2.55.0
After discussion with Richard, I added a "hardcoded" (built-in) search for clade founder nodes and calling of mutations relative to these nodes.
This avoids explicit, repetitive ref node configuration in the ref tree. This search is the same as manually adding a criterion like this for every clade:
The idea is similar to nearest node/private mutations and can be considered a continuation of it: we find the nearest node in order to do tree placement and to call private mutations, and here we continue searching from the nearest node, ascending the tree towards the root, until we find the node which is closest to the root and which has the same clade or clade-like attribute as the query. In TSV this information is emitted into columns:
P.S. This is a prototype; the names are placeholders. The names and everything else are up for discussion, of course.
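The statement that the built-in search equals one criterion per clade can be sketched as a generator. The types below (`SearchAlgo`, `FounderCriterion`, `clade_founder_criteria`) are hypothetical, simplified mirrors of the `search[].criteria[]` schema from the PR description, covering only `qry.clade`, `node.clade` and `node.searchAlgo`; clade-like attributes are left out.

```rust
/// Hypothetical mirror of `node.searchAlgo`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchAlgo {
  Full,
  AncestorEarliest,
  AncestorLatest,
}

/// Hypothetical, reduced mirror of one `search[].criteria[]` entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FounderCriterion {
  pub qry_clade: Vec<String>,
  pub node_clade: Vec<String>,
  pub search_algo: SearchAlgo,
}

/// One criterion per clade: queries of clade C are matched against the
/// ancestor of clade C that is closest to the root (the clade founder).
pub fn clade_founder_criteria(clades: &[&str]) -> Vec<FounderCriterion> {
  clades
    .iter()
    .map(|clade| FounderCriterion {
      qry_clade: vec![clade.to_string()],
      node_clade: vec![clade.to_string()],
      search_algo: SearchAlgo::AncestorEarliest,
    })
    .collect()
}
```

The built-in search makes this list implicit, so adding a clade to the tree no longer requires editing the search descriptions.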
```rust
/// Starting from a given node, traverse the graph backwards (against the direction of edges) until reaching the root,
/// and return the predicate's result for the last ancestor (the one closest to the root) for which it returned `Some`, if any
pub fn graph_find_backwards_last<N, E, D, F, R>(
  graph: &Graph<N, E, D>,
  start: GraphNodeKey,
  mut predicate: F,
) -> Result<Option<R>, Report>
where
  N: GraphNode,
  E: GraphEdge,
  F: FnMut(&Node<N>) -> Option<R>,
{
  let mut current = graph
    .get_node(start)
    .wrap_err("In graph_find_backwards_last(): When retrieving starting node")?;

  let mut found = None;

  loop {
    let edge_keys = current.inbound();
    if edge_keys.is_empty() {
      // No inbound edges: we have reached the root
      break;
    }

    let edge = take_exactly_one(edge_keys)
      .wrap_err("In graph_find_backwards_last(): multiple parent nodes are not currently supported")?;

    let parent_key = graph.get_edge(*edge)?.source();
    let parent = graph
      .get_node(parent_key)
      .wrap_err("In graph_find_backwards_last(): When retrieving parent node")?;

    // Overwrite on every match, so that the match closest to the root wins
    if let Some(result) = predicate(parent) {
      found = Some(result);
    }

    current = parent;
  }

  Ok(found)
}
```
This is the code which performs the "ancestor-earliest" kind of search in the graph.
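To contrast `ancestor-earliest` with `ancestor-latest`, here is a self-contained sketch over a toy tree stored as parent pointers. `Tree` and its methods are hypothetical stand-ins for the `Graph` type above (at most one parent per node). Like `graph_find_backwards_last()`, they test only the ancestors of the start node, not the start node itself.

```rust
/// Toy tree: `parent[i]` is the parent of node `i`; `None` marks the root.
pub struct Tree {
  pub parent: Vec<Option<usize>>,
  pub clade: Vec<&'static str>,
}

impl Tree {
  /// Ancestors of `start`, ordered from the nearest one to the root.
  fn ancestors(&self, start: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut current = start;
    while let Some(p) = self.parent[current] {
      out.push(p);
      current = p;
    }
    out
  }

  /// `ancestor-earliest`: the matching ancestor closest to the root.
  /// Same effect as overwriting `found` on every match while walking rootwards.
  pub fn ancestor_earliest(&self, start: usize, pred: impl Fn(usize) -> bool) -> Option<usize> {
    self.ancestors(start).into_iter().filter(|&n| pred(n)).last()
  }

  /// `ancestor-latest`: the first (nearest) matching ancestor.
  pub fn ancestor_latest(&self, start: usize, pred: impl Fn(usize) -> bool) -> Option<usize> {
    self.ancestors(start).into_iter().find(|&n| pred(n))
  }
}
```

For a chain `0 (A) <- 1 (B) <- 2 (B) <- 3 (B)` and the predicate "clade is B", starting at node 3 the earliest search returns node 1 (the clade founder) and the latest search returns node 2.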
If `skipAsReference` is set on a clade-like node attribute description (an entry in the Auspice JSON `.meta.extensions.nextclade.clade_node_attrs[]` array), then this attribute participates in neither clade founder node search nor mutation calling relative to these nodes.
Clade-like attributes can now be conditionally excluded from founder node search. The datasets on the sibling branch in the data repo are configured like so:
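A minimal sketch of the exclusion rule, assuming a hypothetical Rust mirror of a `clade_node_attrs[]` entry reduced to a `name` and an optional `skipAsReference` flag, where an absent flag is treated as `false`. The real description struct has more fields.

```rust
/// Hypothetical, reduced mirror of one `.meta.extensions.nextclade.clade_node_attrs[]` entry.
pub struct CladeNodeAttrDesc {
  pub name: String,
  /// `skipAsReference` in JSON; `None` when the field is absent.
  pub skip_as_reference: Option<bool>,
}

/// Names of the clade-like attributes that take part in founder node search
/// (and therefore in mutation calling relative to founder nodes).
pub fn founder_search_attrs(descs: &[CladeNodeAttrDesc]) -> Vec<&str> {
  descs
    .iter()
    .filter(|d| !d.skip_as_reference.unwrap_or(false))
    .map(|d| d.name.as_str())
    .collect()
}
```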
@ivan-aksamentov This has gotten really fancy quickly, so I haven't been able to test out the various combinations of input definitions available. I did check the subclade-specific amino acid substitutions for recent H3N2 HA sequences using this branch's deployment of the web UI. I compared these subclade-specific annotations to the "derived haplotypes" we currently produce for seasonal flu trees and everything matched the way I expected. This is exactly the main use I have for this kind of relative mutations functionality, so I'd be super happy to have this merged. Also, thank you for getting this alternate "hardcoded" functionality implemented so quickly!
Adds a prototype script that produces derived haplotype strings per record from a given Nextclade annotations file with columns for clade and mutations relative to each clade. The derived haplotypes produced with this script could eventually replace the haplotypes we build from the mutation-annotated trees and allow us to calculate haplotype frequencies from all available data instead of a subset of data used to build a tree. Related to #130 Depends on nextstrain/nextclade#1492
@huddlej Thanks for testing!
I now see here that it's … Let me know if you have any further suggestions. Richard decided to test this branch a little bit more, so there's still time.
That's right! The …
ef5ca4a into feat/mutations-relative-to-node
Extension of #1454 (based on top of its branch)
Resolves #991 #1237 #1142
This PR includes the following changes:
- `relativeMutations['{name}'].nodeName` column, to identify the matched node name for each row (it can now vary across samples)

Here I mostly describe changes to the node search. The mutation calling code is largely the same as in #1454, and it runs for each node that has been found.
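For illustration, a hypothetical helper that builds per-search column names following the `relativeMutations['{name}']` pattern quoted in this PR. Only the `.nodeName` suffix is confirmed here; any other field name passed in is up to the caller.

```rust
/// Hypothetical helper: builds a CSV/TSV column name such as
/// `relativeMutations['clade_founder'].nodeName` from a search name and a field.
pub fn relative_mutations_column(search_name: &str, field: &str) -> String {
  format!("relativeMutations['{search_name}'].{field}")
}
```

Because the column is keyed by the search name rather than by a node name, one column can report a different matched node for each sample.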
Inputs
The extensions JSON snippet is to be placed under `.meta.extensions.nextclade.ref_nodes`, as before.

Properties:

- `default`: string, optional. Sets the default search displayed in the web app dropdown. Should correspond to one of the `search[].name` fields or to one of the special values: `__root__` for the reference sequence (default) or `__parent__` for the nearest node (private mutations).
- `search`: array of objects, optional. Each object describes one search. Each search corresponds to an entry in the "Relative to" dropdown in the web app and to a set of CSV/TSV columns `relativeMutations['searchName']`. Note that these names no longer need to correspond to node names.
  - `search[].name`: required. Unique identifier of the search entry.
  - `search[].displayName`, `search[].description`: optional. Friendly name and description to be displayed in the UI (dropdown).
  - `search[].criteria`: array of objects, optional. One or multiple search criteria. Criteria should be described such that, during a search run, only one criterion matches a given pair of query and node. If there are multiple matches, then one (unspecified) match is taken and a warning is emitted.
    - `search[].criteria[].qry`: object, optional, describing properties of query samples to select for this search.
      - `search[].criteria[].qry.clade`: array of strings, optional. Query clades to consider for this search. At least one match is necessary for a sample to match.
      - `search[].criteria[].qry.cladeNodeAttrs`: optional mapping from the name of a clade-like attribute to a list of searched values for this attribute. At least one match is necessary for a sample to match.
    - `search[].criteria[].node`: object, optional, describing properties of the ref node to search for, as well as the search algorithm. All of the properties should match.
      - `search[].criteria[].node.name`: array of strings, optional. Searched node names. At least one match is necessary for a node to match.
      - `search[].criteria[].node.clade`: array of strings, optional. Searched node clades. At least one match is necessary for a node to match.
      - `search[].criteria[].node.cladeNodeAttrs`: optional mapping from the name of a clade-like attribute to a list of searched values for this attribute. At least one match is necessary for a node to match.
      - `search[].criteria[].node.searchAlgo`: string, optional. Search algorithm to use:
        - `full` (default): simple loop over all nodes until the first match is found.
        - `ancestor-earliest`: start with the current sample and traverse the graph against edge directions, looking for matching nodes, until the root node is reached. The result is the last encountered matching node (the one closest to the root).
        - `ancestor-latest`: start with the current sample and traverse the graph against edge directions, looking for matching nodes. The first match is the result.

Examples
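A minimal sketch of the criteria matching rules listed under Inputs, under these assumptions: an absent (`None`) list imposes no constraint, a present list needs at least one value in common, `cladeNodeAttrs` is omitted for brevity, and when several criteria match, the first one is taken and a warning is printed to stderr (the PR only says that an unspecified match is taken). `Criterion`, `allowed` and `select_criterion` are hypothetical names.

```rust
/// Hypothetical, reduced mirror of `search[].criteria[]` (no `cladeNodeAttrs`).
#[derive(Default)]
pub struct Criterion {
  pub qry_clade: Option<Vec<String>>,
  pub node_name: Option<Vec<String>>,
  pub node_clade: Option<Vec<String>>,
}

/// `None` means "no constraint"; otherwise the value must be one of the listed values.
fn allowed(list: &Option<Vec<String>>, value: &str) -> bool {
  match list {
    None => true,
    Some(values) => values.iter().any(|v| v == value),
  }
}

impl Criterion {
  /// All present properties must match, for both the query and the node.
  pub fn matches(&self, qry_clade: &str, node_name: &str, node_clade: &str) -> bool {
    allowed(&self.qry_clade, qry_clade)
      && allowed(&self.node_name, node_name)
      && allowed(&self.node_clade, node_clade)
  }
}

/// Index of the criterion to use for a (query, node) pair, warning on ambiguity.
pub fn select_criterion(
  criteria: &[Criterion],
  qry_clade: &str,
  node_name: &str,
  node_clade: &str,
) -> Option<usize> {
  let matching: Vec<usize> = criteria
    .iter()
    .enumerate()
    .filter(|(_, c)| c.matches(qry_clade, node_name, node_clade))
    .map(|(i, _)| i)
    .collect();
  if matching.len() > 1 {
    eprintln!("Warning: {} criteria match; using the first one", matching.len());
  }
  matching.first().copied()
}
```

Picking the first match makes the ambiguous case deterministic in this sketch; the PR leaves the choice unspecified, which is why criteria are meant to be written so that at most one matches.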
The branch with the same name in the data repo: nextstrain/nextclade_data#212. This data can be used by adding the URL param shortcut `?dataset-server=gh` to this PR's deployment of Nextclade Web.

Script to embed configs into trees: `scripts/migrate_006_add_rel_muts.py`. Feel free to add/remove/modify entries on the branch. The changes will be reflected in the linked examples (looks like the script might be breaking clade-like attrs; to be investigated).
Example config snippets
Reproduce the existing functionality of #1454 ("feat: mutations relative to arbitrary node"): search for a specific node by name, among all nodes (not necessarily ancestral), for all query samples.
Reproduce the existing functionality of #1454 ("feat: mutations relative to arbitrary node"): search for a specific node by name, among all nodes (not necessarily ancestral), for query samples matching certain criteria (in this case, the same clade).
This requires some work when preparing the dataset: one needs to find the exact node name for each criterion, and if node names change, adjust the search descriptions accordingly.
New functionality: filter query samples by clade and find the earliest ancestor node with the same clade (note the `"searchAlgo": "ancestor-earliest"`). This is similar to the previous example, but there is no need to look up node names in advance: the ancestral search happens in Nextclade. Desired clades still need to be listed, though.
For this particular use case (finding the clade founder for each clade), we may consider embedding it into Nextclade without the need for search descriptions.
You might come up with more examples and use cases. The query-to-node mapping is now many-to-many, which allows for great flexibility, and the search algorithms allow traversing the tree in different ways.
Possible improvements