Managing Infected as a Few Queues instead of Many Vectors

Queues vs Vectors?

Here we do an experiment to compare benefits of treating the infected cohort as a few queues instead of a lot of vectors. Potentially, under some circumstances, this could lead to a lot fewer touches and operations.

The Idea

Most of the time, even though we're deliberately modeling agents, most of our agents are not really interesting as agents. They're interesting as cohorts. We've already noted that for EULAs. And even for unborns, we don't yet care about their "agency". And in fact during the exposure (E=Incubators) state, and for runtime-EULAs (Recovereds) we also usually don't care about anything more than their total. And maybe even for Infecteds*. Now of course the thing we do care about for Es and Is is their timers because we need to update their timers and process the agents when the timers expire. In my mind, these are queues of agents. The thing about queues, from a datatype point of view, is that once you've enqueued an entity, you don't care about it until you dequeue it. It's really the Susceptibles that we care about as individuals.

Is there a way to "de-agency" the Us, Es, Is and Rs while preserving agency for the Ss that makes the code run significantly faster?

What if instead of touching all the infecteds each timestep, once we calculate their infectious duration (as an integer), we put all the individuals with the same duration into a bucket and ignore them until that timestep comes around? In other words, if we're at time=100, and N individuals become infected, and we calculate infectious durations for them randomly from 8 to 12 timesteps, we'll have cohorts for them recovering at t=108, 109, 110, 111, and 112. We can put all agent indices for the "will recover at t=108" in some kind of container -- a map to a list of integers comes to mind, which is a dict in Python -- and do literally nothing with them until t=108.

The question then becomes: does the overhead of creating/updating/checking this map of recovery-timesteps to lists of indices use significantly less processor/wall-clock time than the vectorized constant timer decrements and checks we're doing ow? One might hope so.

Note: there are definitely cases like age-based mixing where we do care about Infecteds as individuals, but let's ignore that for now.

Experiment

These diagrams don't help much but here they are:

I created a standalone python program which allocated a contiguous boolean array of 1e6 integers. (Easily switchable to 1e5 or 1e7 or anything else.) Think of this as who is infected.

Each timestep we pick some random fraction (say 1%) of the cohort to infect. Select only already uninfected. For each newly infected, pick a infectious duration as a random draw from a uniform distribution from 5 to 15. (Distribution and ranges don't matter a huge amount probably.)
Then use 3 different techniques to manage these until the timers are expired.
Once expired, set the infected state for the recovereds back to false.
Measure duration.

Note that we're not using any separate population of susceptibles on the input or recovereds on the output. For the purposes of this experiment, it should be sufficient to just thrash on a single array.

Note also that even though we discussed 2 different approaches -- default vector math and the proposed containers -- we have 3 experiments, not 2. This is because we wanted to test a couple of different ways of doing the new proposal:

A map of "day of recovery" to infected indices;
A priority queue.

The latter is potentially a distinction without a difference but it's potentially different at the Python level and one could be simpler to maintain than the other.

Preliminary Results

So far in Python, the heapq priority queue is far worse than the dict, so no further comment will be made on that approach.

The proposed map/dict approach has shown to be consistently but only marginally (~5-10%) faster than the existing vector math approach. This seems surprising.

I'm going to continue this experiment but switch to C/C++ for the implementation and report back. I find it surprising that the low-touch queues of virtual vectors would not be noticeably faster.

C++ Results

In C++, the map solution seems to be 40% faster than the vector math solution.

Python tests: https://github.com/jonathanhhb/laser/blob/better_math/sql_modeling/src/tests/infected_priq.py C++ tests: https://github.com/jonathanhhb/laser/blob/better_math/sql_modeling/src/tests/infected_map.cpp

The names are misleading.

Managing Infected as a Few Queues instead of Many Vectors

Queues vs Vectors?

The Idea

Experiment

Preliminary Results

C++ Results

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally