Skip to content

Simulation

Art Poon edited this page Apr 10, 2018 · 10 revisions

Summary

Simulation for treeswithintrees (twt) occurs in two stages unless the user provides a transmission (host) tree as an input. The first stage generates the transmission (outer) tree, and the second stage generates the pathogen (inner) tree given the outer tree.

First, twt simulates the transmission (outer) tree given the numbers of sampled and unsampled infected hosts and uninfected hosts. Accounting for the unsampled and uninfected hosts is necessary to determine the rate that sampled infections converge to transmission events going back in time. Note that the transmission tree may contain singleton nodes where a branch of the transmission history switches from a sampled host to an unsampled source host.

Second, twt fixes the transmission tree and simulates the pathogen tree within it, with user-specified rates of migration and coalescence and transmission bottleneck sizes. Note that since we have fixed the transmission tree, it is no longer necessary to track unsampled and uninfected hosts.

In general, simulations proceed back in time and consist of fixed and random events. The timing of random events are determined by the rate parameters associated with each potential event, and we assume exponential waiting times (constant rates) to the next random event. Fixed events are predetermined and override random events -- in other words, if the waiting time to any random event is further back in time than the next fixed event, then we resolve that fixed event and restart the simulation from that time point.

Simulating population dynamics

Before we are able to simulate the "outer" (transmission) tree, we need to know how the sizes of the susceptible and infected populations for each CompartmentType change over time. These numbers need to be specified by the user for time zero (present time) as parameters of the simulation model. The user also provides the transmission rates (beta) between CompartmentTypes. In order to simulate the trajectories of population sizes back in time, we need to draw the waiting times to transmission events until we reach the first infection.

  1. Initialize S_i and I_i for each CompartmentType i.
  2. Calculate population-level transmission rates for every pair of CompartmentTypes i and j as \beta * (S_j+1) * I_i) where i is the source and j is the recipient CompartmentType index.
  3. Draw the waiting time to the next transmission event where the rate is the sum of all the above rates.
  4. Determine the CompartmentTypes between which the transmission occurs as the proportion of that rate to the total rate.
  5. Record the time of this event, the source and recipient.
  6. Increment S_j and decrement I_i.
  7. Update time.
  8. If there are still two or more infected Compartments, go to step (2) -- note it may be less computationally expensive to just update the transmission rates, since we know which i and j CompartmentTypes were affected by the last transmission event (we can reuse most of the rates when there are many Types).
  9. Otherwise stop.

This algorithm assumes that all individuals are immortal (no mortality) and a constant population (no migration from an external population). We can relax these assumptions: for example, we eventually want to allow the user to specify a CompartmentType-specific mortality rate distribution, so that the time to death can be drawn for a potential source individual.

Simulating the outer tree

The outer tree is a transmission tree that relates Compartments with sampled infections. From the previous step, we have already drawn transmission event times and directions from the probability distributions defined by the model. The first step is to randomly assign the sampled Compartments to the transmission events (sampling events of the given Type uniformly without replacement). When there are unsampled infected individuals for a given CompartmentType, there will be more transmission events than sampled Compartments of that Type.

Next, the assigned transmission events need to be resolved into a transmission tree. For a given event, the source CompartmentType has already been determined at the previous stage. We assume that every member of that source CompartmentType is equally likely to be the source individual, for both unsampled and sampled members of that Type. If the source is unsampled, then we note this as a singleton node in the transmission tree, potentially with a change in Type. That source Compartment is upgraded from being "unsampled" to "sampled" and it gets assigned a transmission event and tracked like any of the original sampled Compartments. If the source is sampled, then we have a bifurcating node in the transmission tree. This proceeds until there is only one sampled Compartment (the root).

In future work, we are going to allow pathogen lineages to vary in transmission rates through the LineageType object, which will introduce a fourth term into calculating the net transmission rate between source and recipient CompartmentTypes.

Simulating the inner tree

At this stage, we have fixed the outer tree. The fixed events for simulating the inner tree are the sampling times of pathogen lineages within the sampled hosts, and the transmission events. The sampling of a host is defined by the sampling of at least one lineage within that host. There may be more than one pathogen lineage sampled from a host, and they may be sampled at different points in time. In other words, the earlier sampling of a lineage is a fixed event that eventually occurs as we follow a sampled host back in time. For all sampled lineages, we track those that have been sampled at time t and those that have yet to be sampled as separate lists.

The random events for simulating the inner tree are:

  • The coalescence of two Lineages in the same host. Lineages in different hosts cannot coalesce. The Lineages that coalesce are no longer active (i.e., "not extant") --- they cannot coalesce with other Lineages, migrate to another host, or be involved in a transmission bottleneck. Instead, they are both replaced by one ancestral Lineage that assumes an "extant" status. This ancestral lineage is also considered to be "sampled", even though it has not been directly observed.
  • The migration of a Lineage to another host. Migrations occur at an intrinsic rate that is defined by the source and destination CompartmentTypes and is set by the user, and is proportional to the pathogen population size in the source Compartment. For now, we have to make the simplifying assumption that this population size is the same as that determines the coalescence rate (i.e., the effective population size).
  • The transmission bottleneck. We assume that the bottleneck is a random sample of the Lineages that have not yet coalesced by the time we reach the transmission event for that host. It is more complicated than this because the bottleneck is determining how many viruses are transmitted from the source to the recipient host going forward in time, whereas we are tracking the provenance of sampled Lineages.

Inputs

Input Class Value Required
Sample times Lineage Numeric Yes
Sample locations Lineage Compartment Yes
Number of unsampled lineages CompartmentType Numeric No?
Number of uninfected members CompartmentType Numeric No?

Summary

The simulation comprises fixed and random events. Initially, the fixed events are the sampling times and locations of pathogen lineages, which must be provided by the user. If the user does not supply a transmission tree, then the tree needs to be generated by sampling transmission events. Rates of transmission between compartments of different types is determined by the total numbers of susceptible and (sampled/unsampled) infected members of the respective types, and the per-contact rate.

  • Sampling lineages from a host/compartment
  • A transmission event between hosts in the case of an input user tree.

Random events may consist of:

  • The coalescence of lineages within a host/compartment
  • A transmission event between hosts/compartments
  • A migration event between hosts/compartments

Pseudocode

  1. Generate CompartmentType, Compartment and Lineage objects from user inputs (YAML and/or Newick tree).
  2. Initialize the list of fixed events.
  • Add lineage sampling events from Lineage objects.
  • If the user input includes a tree (host tree) then add transmission events, and go to step X.
  • If the user input does not include a host tree, then transmission events must be simulated first before coalescence (and migration).
  1. Simulate transmission events and fix them to the timeline of lineage sampled events. For each CompartmentType:
  • Enumerate active compartments, including Unsampled Hosts (U) at time t=0.
  • Enumerate active lineages of Infected (I), pairs of active lineages within hosts at time t=0.
  • Enumerate number of Susceptibles (S) at time t=0.
  1. Begin main loop
  • Compute total event rate from the sum of all I and U compartments with the current number of S compartments
  • Sample waiting time from exponential distribution of the negative total event rate, and update time t.
  • Sample event type
  • Update infection time or source of sampled compartments, and update the number of S, I, and U.
  • Stop if I = 1, else continue loop.
Clone this wiki locally