feat: mutations relative to arbitrary node #1454

Merged 74 commits into master from feat/mutations-relative-to-node on Jun 30, 2024

Conversation

ivan-aksamentov (Member) commented May 14, 2024

Resolves #991 #1237 #1142

This extends the concept of private mutations (mutations relative to the parent node on the reference tree) to the more general concept of mutations relative to an arbitrary node of interest.

The reference nodes of interest are described by the user in `.meta.extensions.nextclade.reference_nodes` of the input Auspice JSON. Each description can also contain constraints: a node can be restricted so that it applies only to query samples belonging to a certain clade or lineage.
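As a rough illustration of where this configuration lives, the following Python sketch reads the list from an already-parsed Auspice JSON and checks that each entry's `name` exists on the tree. The helper names (`read_reference_nodes`, `collect_node_names`) and the `MISSING_NODE` entry are hypothetical and not part of Nextclade. The only structural assumption is the usual Auspice layout with top-level `meta` and `tree` keys and nested `children`.

```python
def read_reference_nodes(auspice):
    """Return the list at .meta.extensions.nextclade.reference_nodes, or [] if absent."""
    return (
        auspice.get("meta", {})
        .get("extensions", {})
        .get("nextclade", {})
        .get("reference_nodes", [])
    )


def collect_node_names(root):
    """Collect the names of all nodes in an Auspice tree (iterative depth-first walk)."""
    names, stack = set(), [root]
    while stack:
        node = stack.pop()
        names.add(node["name"])
        stack.extend(node.get("children", []))
    return names


# Tiny hand-made input; "MISSING_NODE" is a made-up name used to show a mismatch.
auspice = {
    "meta": {"extensions": {"nextclade": {"reference_nodes": [
        {"name": "NODE_0000659", "displayName": "BA.2.86 (23I)"},
        {"name": "MISSING_NODE"},
    ]}}},
    "tree": {"name": "ROOT", "children": [{"name": "NODE_0000659"}]},
}

known = collect_node_names(auspice["tree"])
missing = [n["name"] for n in read_reference_nodes(auspice) if n["name"] not in known]
print(missing)  # → ['MISSING_NODE']
```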

Private mutations functionality is unchanged. New functionality, inputs and outputs are added on top, though the implementation algorithm is largely reused.

Test

PR in the data repository for testing: nextstrain/nextclade_data#198 (branch with the same name)

Work items

  • read input config from Auspice JSON
  • calculate relative nuc mutations
  • calculate relative aa mutations
  • filter by clade and clade-like attributes
  • output to Nextclade JSON
  • output to Nextclade NDJSON
  • output to TSV and CSV
  • pass required data between js and wasm
  • display in web app

For consideration:

  • ? use regex for matching clade-like values, rather than string equality
  • ? filter by gene

Inputs

Example configuration object. Put it into `.meta` of the Auspice JSON, so that the list ends up at `.meta.extensions.nextclade.reference_nodes`.

{
  "extensions": {
    "nextclade": {
      "reference_nodes": [
        {
          "name": "NODE_0000659",
          "displayName": "BA.2.86 (23I)",
          "description": "Ancestral BA.2.86 sequence"
        },
        {
          "name": "XBB.1.5",
          "displayName": "XBB.1.5 (23A)",
          "description": "Ancestral XBB.1.5 sequence. Vaccine strain 2023/2024",
          "include": {
            "clade": ["23A"]
          }
        },
        {
          "name": "NODE_0000862",
          "displayName": "BA.5 (22B)",
          "description": "Ancestral BA.5 sequence. Vaccine strain 2022/2023",
          "include": {
            "clade": ["22B"]
          }
        }
      ]
    }
  }
}
  • The name field should match the name field of one of the nodes on the tree.

  • The displayName and description are optional arbitrary strings used for display purposes.

  • The include field should be an object, which contains:

    • keys: names from the .meta.extensions.nextclade.clade_node_attrs (for clade-like attributes) or string "clade" (for the built-in clades).
    • values: a list of values of the clade-like attribute or a list of built-in clades. Only query sequences which match these attributes are considered for calculation of mutations relative to that node.

    If the include field is not present, then no constraints are applied (all query sequences are considered).
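The matching rule described above could be sketched as follows. This is a minimal illustration, not the Nextclade implementation: `sample_matches` and `sample_attrs` are hypothetical names, values are compared by plain string equality, and the sketch assumes that when `include` lists several keys, all of them must match (the description does not say how multiple keys combine).

```python
def sample_matches(ref_node, sample_attrs):
    """Decide whether a query sample qualifies for a given reference node.

    ref_node: one entry of the reference_nodes list.
    sample_attrs: hypothetical mapping of attribute name to the value assigned
        to the query sample, e.g. {"clade": "23A"}.
    """
    include = ref_node.get("include")
    if not include:
        # No include field: no constraints, every query sample is considered.
        return True
    # Assumption: every listed key must match one of its allowed values.
    return all(sample_attrs.get(key) in allowed for key, allowed in include.items())


xbb = {"name": "XBB.1.5", "include": {"clade": ["23A"]}}
ba286 = {"name": "NODE_0000659"}

print(sample_matches(xbb, {"clade": "23A"}))    # → True
print(sample_matches(xbb, {"clade": "22B"}))    # → False
print(sample_matches(ba286, {"clade": "22B"}))  # → True
```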

Outputs

Output JSON and NDJSON

Example fragment of an output JSON entry (an entry in the `.results[]` array). Mutation lists are truncated for demonstration purposes.

{
  "relativeNucMutations": [
    {
      "refNode": {
        "name": "NODE_0000659",
        "displayName": "BA.2.86 (23I)",
        "description": "Ancestral BA.2.86 sequence"
      },
      "muts": {
        "privateSubstitutions": [
          {"pos": 404, "refNuc": "A", "qryNuc": "G"},
          {"pos": 896, "refNuc": "A", "qryNuc": "C"}
        ],
        "privateDeletions": [],
        "reversionSubstitutions": [
          {"pos": 896, "refNuc": "A", "qryNuc": "C"},
          {"pos": 3430, "refNuc": "T", "qryNuc": "G"}
        ],
        "labeledSubstitutions": [
          {
            "substitution": {"pos": 404, "refNuc": "A", "qryNuc": "G"},
            "labels": ["23A", "23D", "23F", "23B", "22F", "23E", "23H", "23G"]
          },
          {
            "substitution": {"pos": 2333, "refNuc": "C", "qryNuc": "T"},
            "labels": ["23F", "23H"]
          }
        ],
        "unlabeledSubstitutions": [
          {"pos": 4089, "refNuc": "C", "qryNuc": "T"},
          {"pos": 11344, "refNuc": "C", "qryNuc": "T"}
        ],
        "totalPrivateSubstitutions": 75,
        "totalPrivateDeletions": 0,
        "totalReversionSubstitutions": 37,
        "totalLabeledSubstitutions": 34,
        "totalUnlabeledSubstitutions": 4
      }
    }
  ],
  "relativeAaMutations": [
    {
      "refNode": {
        "name": "NODE_0000659",
        "displayName": "BA.2.86 (23I)",
        "description": "Ancestral BA.2.86 sequence"
      },
      "muts": {
        "E": {
          "privateSubstitutions": [
            {"cdsName": "E", "pos": 10, "refAa": "T", "qryAa": "A"}
          ],
          "privateDeletions": [],
          "reversionSubstitutions": [],
          "totalPrivateSubstitutions": 1,
          "totalPrivateDeletions": 0,
          "totalReversionSubstitutions": 0
        },
        "S": {
          "privateSubstitutions": [
            {"cdsName": "S", "pos": 20, "refAa": "T", "qryAa": "R"},
            {"cdsName": "S", "pos": 26, "refAa": "-", "qryAa": "S"}
          ],
          "privateDeletions": [
            {"cdsName": "S", "pos": 23, "refAa": "S"},
            {"cdsName": "S", "pos": 143, "refAa": "Y"}
          ],
          "reversionSubstitutions": [
            {"cdsName": "S", "pos": 20, "refAa": "T", "qryAa": "R"},
            {"cdsName": "S", "pos": 49, "refAa": "L", "qryAa": "S"}
          ],
          "totalPrivateSubstitutions": 39,
          "totalPrivateDeletions": 2,
          "totalReversionSubstitutions": 25
        }
      }
    }
  ]
}
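A consumer of this output might summarize the counts per reference node. The sketch below assumes only the field names visible in the fragment above. The function names are hypothetical, and the input is a cut-down copy of the example rather than real Nextclade output.

```python
def node_label(ref_node):
    """Prefer the optional displayName, fall back to the node name."""
    return ref_node.get("displayName") or ref_node["name"]


def summarize_relative_nuc(result):
    """Map reference node label to the number of private nuc substitutions."""
    return {
        node_label(entry["refNode"]): entry["muts"]["totalPrivateSubstitutions"]
        for entry in result.get("relativeNucMutations", [])
    }


def summarize_relative_aa(result):
    """Map reference node label to private aa substitutions summed over all CDSes."""
    return {
        node_label(entry["refNode"]): sum(
            cds["totalPrivateSubstitutions"] for cds in entry["muts"].values()
        )
        for entry in result.get("relativeAaMutations", [])
    }


# Cut-down version of the fragment above (totals only).
ref = {"name": "NODE_0000659", "displayName": "BA.2.86 (23I)"}
result = {
    "relativeNucMutations": [{"refNode": ref, "muts": {"totalPrivateSubstitutions": 75}}],
    "relativeAaMutations": [{"refNode": ref, "muts": {
        "E": {"totalPrivateSubstitutions": 1},
        "S": {"totalPrivateSubstitutions": 39},
    }}],
}

print(summarize_relative_nuc(result))  # → {'BA.2.86 (23I)': 75}
print(summarize_relative_aa(result))   # → {'BA.2.86 (23I)': 40}
```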

Output TSV and CSV

New columns are added. Currently, 4 columns are added for each additional node:

relativeMutations['{node_name}'].substitutions
relativeMutations['{node_name}'].deletions
relativeMutations['{node_name}'].aaSubstitutions
relativeMutations['{node_name}'].aaDeletions

where {node_name} is the node's displayName if present, or its name if not.

Mutations are in the same format as in the other columns containing mutations.
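To locate these columns programmatically, their names can be derived from a reference node entry. This is a small sketch under the naming rule stated above. `relative_mutation_columns` is a hypothetical helper, and it assumes the display name is taken from the `displayName` field as in the configuration example.

```python
def relative_mutation_columns(ref_node):
    """Build the four TSV/CSV column names added for one reference node."""
    node_name = ref_node.get("displayName") or ref_node["name"]
    suffixes = ["substitutions", "deletions", "aaSubstitutions", "aaDeletions"]
    return [f"relativeMutations['{node_name}'].{suffix}" for suffix in suffixes]


for column in relative_mutation_columns(
    {"name": "NODE_0000659", "displayName": "BA.2.86 (23I)"}
):
    print(column)
```

The resulting strings can be used as keys when reading the table with, for example, `csv.DictReader` from the standard library.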

Visualization in Nextclade Web

  • A new "Relative to" dropdown in the table heading lets the user select "Reference", "Parent" (for private mutations, i.e. relative to the parent node on the tree), or any of the nodes of interest defined in the tree. Sequence views then switch to showing mutations relative to the selected node.

  • The "Mut" column now shows mutations (the count, plus the full list in the tooltip) only relative to the currently selected node.


This extends the concept of private mutations (relative to the parent node on the reference tree) to mutations relative to an arbitrary node of interest.

The reference nodes of interest are described by the user in `.meta.extensions.nextclade.reference_nodes` of the input Auspice JSON. Each description can also contain constraints: a node can be restricted so that it applies only to query samples belonging to a certain clade or lineage.

Private mutations functionality is unchanged; this is only an addition, though the implementation algorithm is largely reused.

In this commit, only nuc mutations are added.
Similarly to b537132, add relative amino acid mutations.
This just passes from js to wasm the data that is now required to output relative mutations.

ivan-aksamentov added a commit that referenced this pull request Jun 6, 2024
In preparation for #1454

Among other things, in order to render private and relative aa mutations we need to group them similarly to how absolute aa mutations are grouped. This involves finding adjacent mutations and nuc context for these mutations.
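For readers unfamiliar with the grouping step, here is a deliberately simplified sketch of what "finding adjacent mutations" can mean. It is not the Nextclade grouping code: `group_adjacent` is a hypothetical helper, "adjacent" is assumed to mean consecutive codon positions within the same CDS, and the nucleotide context is left out entirely.

```python
def group_adjacent(muts):
    """Group aa mutations of the same CDS that sit at consecutive positions."""
    ordered = sorted(muts, key=lambda m: (m["cdsName"], m["pos"]))
    groups = []
    for mut in ordered:
        last = groups[-1][-1] if groups else None
        if (
            last is not None
            and last["cdsName"] == mut["cdsName"]
            and mut["pos"] - last["pos"] == 1
        ):
            # Directly follows the previous mutation: extend the current group.
            groups[-1].append(mut)
        else:
            # Different CDS or a gap in positions: start a new group.
            groups.append([mut])
    return groups


# Made-up positions chosen to show one run and two singletons.
muts = [
    {"cdsName": "S", "pos": 23},
    {"cdsName": "S", "pos": 20},
    {"cdsName": "E", "pos": 10},
    {"cdsName": "S", "pos": 21},
]
print([[(m["cdsName"], m["pos"]) for m in g] for g in group_adjacent(muts)])
# → [[('E', 10)], [('S', 20), ('S', 21)], [('S', 23)]]
```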

Sadly, the code for grouping is quite complex and is not immediately reusable (it combines aa mutation search and grouping at the same time), so let's do some refactoring.

Let's start by factoring out the structs and functions that will remain unchanged, to clear up some space for the action, to minimize diffs and to reduce scrolling.
This should allow us to reduce nesting and to allow fallible operations in the map
After refactoring, aa_changes_group() is generic enough to be able to reuse it for private aa mutations too.

This is a first working sketch. The inputs are likely wrong and fixes will follow.
This adds a search for the ancestor node with the same clade as the query sample. Let's call that the "clade founder" node. Similarly, this adds the same search for custom clade-like attributes.

After the node is found, we compute nuc and aa mutations relative to it.

This avoids explicit, repetitive reference node configuration in the reference tree.

The idea is similar to the nearest node search and private mutations, and can be considered a continuation of it: we start searching from the nearest node and ascend the tree towards the root until we find the node that is closest to the root and has the same clade or clade-like attribute.
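The ascent can be sketched like this. It is a minimal illustration under stated assumptions, not the actual implementation: the tree is modelled as a plain dict with hypothetical `parent` links, `find_clade_founder` is a made-up name, and the sketch assumes the walk stops at the first ancestor whose attribute value differs from that of the nearest node.

```python
def find_clade_founder(nodes, nearest, attr="clade"):
    """Walk from the nearest node towards the root while the attribute is unchanged.

    nodes: hypothetical mapping of node name to {"parent": name or None, attr: value}.
    Returns the name of the topmost node in the unbroken run of ancestors
    sharing the nearest node's attribute value.
    """
    target = nodes[nearest][attr]
    founder = nearest
    current = nodes[nearest]["parent"]
    while current is not None and nodes[current][attr] == target:
        founder = current
        current = nodes[current]["parent"]
    return founder


# Made-up four-node chain: root (22B) -> a (23A) -> b (23A) -> c (23A)
nodes = {
    "root": {"parent": None, "clade": "22B"},
    "a": {"parent": "root", "clade": "23A"},
    "b": {"parent": "a", "clade": "23A"},
    "c": {"parent": "b", "clade": "23A"},
}
print(find_clade_founder(nodes, "c"))  # → a
```

The same function covers clade-like attributes by passing a different `attr`, provided the nodes carry that key.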
If `skipAsReference` is set on a clade-like node attribute description (an entry in the Auspice JSON `.meta.extensions.nextclade.clade_node_attrs[]` array), then this attribute does not participate in the clade founder node search, nor in mutation calling relative to these nodes.
ivan-aksamentov merged commit e097715 into master on Jun 30, 2024
19 of 20 checks passed
ivan-aksamentov deleted the feat/mutations-relative-to-node branch on June 30, 2024 15:16
Development

Successfully merging this pull request may close these issues.

ENH: Relative sequence view that shows only shows private mutations
3 participants