Infection Spread – Simulation & Visualization
Updated: Feb 11, 2020
In this post, I will cover something a bit different. With the recent outbreak of the 2019 Novel Coronavirus, I became interested in modeling pathogenic spread. More precisely, I wanted to see if I could replicate the behavior of virulent pathogens given fixed initial input conditions (initial number of people sick, infection rate, death rate, incubation period, symptom duration).
I should clarify from the outset that this is meant as a proof of concept. I will follow this post up with one that tries to forecast the number infected, the number dead, and the number cured. I'm interested to see how we can create robust machine learning models and how they depend on the underlying distributions and properties of the data generation process.
A Walk Through One Scenario
Imagine there is an infection from some initial source that infects five people, shown below in a network graph as numbered nodes in red. Nodes in red always represent unresolved infections, that is, cases where the individual has neither died nor been cured by their own immune system's natural process.
Each node is numbered in terms of the order in which someone acquired our simulation pathogen. Every day that passes (past an initial incubation period), there is a certain chance that an individual passes on that infection to another. On day 4 of the simulation I've built, we see the first spread of the infection. In this case, "1" has not infected any others, while "5" has infected three others. Clearly, we're running a simulation with a highly infectious pathogen.
On day 8 of our simulation, we see two deaths ("4" and "2") from the disease as well as one person ("3") who was cured and can no longer die, be infected, or pass on the disease. However, the disease is still incubating in some people and producing symptoms in others (i.e. each red node represents one of those two situations).
At the end of a fortnight, this is what our simulation shows:
Walking Through the Code
Because this is a simulation, we can run it many times and compare the outcomes.
The simulation I'm showing here has these inputs (which could all be changed):
infectivity = 1.2 # on average, one person infects this many people
death_rate = 0.2 # on average, this fraction of infected people die
incubation_period = range(3,7) # 3 to 6 days; no chance of death, no chance of spread
range_symptoms = range(5,10) # 5 to 9 days; actively passing on disease, chance of death
# after incubation_period + range_symptoms, if you don't die, you're cured
days = 14 # number of days to run the simulation
initial_sick = 5
We can break this down bit by bit. First, let's look at a crucial piece of the code: the class "Person". The full code is presented below, but feel free to skim it and use it as a reference after reading the entire post.
The class essentially constructs a person with several variables that are unique to that individual. Each person also has a function that uses those variables to compute whether they die, infect others, etc.
Let's run down some of the attributes:
person.name = the order from 1 in which they acquired the infection. The function that runs the simulation assigns a name and simply increases the name by one each time a new person is infected.
self.infects takes in the infectivity initial condition (in this case 1.2 people on average) and creates a gamma distribution from it so that the infectivity will be the mean of the distribution. Below, we can see the gamma distribution for an average infectivity of 1.2 people. I make the draw an integer in the code so that self.infects will be a predetermined number of people that this particular instance of a person infects.
To be more precise, though, the infectivity initial condition we entered will not truly be the mean of infection spread, as our people will only be able to spread the infection while they are alive. If they die, they may or may not have finished spreading the disease to this predetermined number of people. The code works so that on each day that the person is infected past the incubation period, a random number between 0 and self.infects (inclusive) is chosen as that day's new infections, and self.infects is reduced by every actual infection. So this may look like this for one particular individual:
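As a rough sketch of such a draw, using only Python's standard library: the helper name `draw_infects` and the shape value of 1.0 are assumptions made for illustration, and the project's code may parameterize the gamma distribution differently.

```python
import random

def draw_infects(infectivity, shape=1.0, rng=random):
    """Draw a whole number of people that one person will infect.

    Assumption: shape * scale == infectivity, which is the gamma mean.
    """
    scale = infectivity / shape
    raw = rng.gammavariate(shape, scale)  # continuous draw with mean = infectivity
    return int(raw)                       # truncate to a whole number of people

rng = random.Random(0)
draws = [draw_infects(1.2, rng=rng) for _ in range(10_000)]
print(sum(draws) / len(draws))  # lands below 1.2, because int() rounds down
```

Note that truncating with `int()` pulls the realized average below the nominal infectivity; rounding instead, or choosing a different shape, would change how far it drops.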
self.infects = 2 (drawn from the gamma distribution)
day x --> infects 1 (self.infects now = 1)
day x+1 --> infects 0
day x+2 --> infects 0
day x+3 --> infects 1 (self.infects now = 0, i.e. can't infect any more)
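The day-by-day schedule above could be produced by something like the following sketch. The function name `spread_over_days` is hypothetical, and the uniform daily choice is an assumption; the post's code also adjusts the chance of spreading to the number of symptomatic days, which is not modeled here.

```python
import random

def spread_over_days(infects, n_days, rng=random):
    """Return how many people are infected on each symptomatic day.

    Assumption: each day the person infects a uniformly random number of
    people between 0 and the remaining budget, inclusive.
    """
    remaining = infects
    per_day = []
    for _ in range(n_days):
        new_cases = rng.randint(0, remaining)  # 0 is allowed on any day
        remaining -= new_cases                 # the budget shrinks with each infection
        per_day.append(new_cases)
    return per_day

schedule = spread_over_days(infects=2, n_days=4, rng=random.Random(1))
print(schedule, sum(schedule))
```

The total can never exceed the predetermined budget, and if the person dies before the last day, whatever is left of the budget simply goes unused.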
Why a gamma distribution? If the average is 1.2 infections per person, a lot of people have to infect 0 or 1 people to counterbalance those who infect 2, 3, 4, or even 5, 6 or more people, which it is entirely reasonable to assume some people do (they might go to a conference or interact with a lot of people in their work, for example). The gamma distribution is a great way to represent that.
self.incubation_period randomly chooses from the range of incubation periods given. This could be replaced by a normal distribution or another type of distribution, but for now I just weighted each possible incubation period equally, as I believe it would not drastically alter our simulation. During this period, essentially nothing happens: the person has zero chance of passing on the disease and zero chance of dying. I created this variable to add a bit of variance in the timing of the disease spread and, more importantly, to make it easy to change the code so that the disease can be passed on at any point during the incubation period.
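To see that skew in numbers, here is a small tally of simulated draws. The shape of 1.0 and scale of 1.2 are assumed values chosen only so that the gamma mean is 1.2; the exact shares depend on that choice.

```python
import random
from collections import Counter

rng = random.Random(42)
# Assumed parameterization: shape 1.0, scale 1.2, so the gamma mean is 1.2.
draws = [int(rng.gammavariate(1.0, 1.2)) for _ in range(10_000)]
tally = Counter(draws)

# Print the share of people who infect 0, 1, 2, ... others.
for k in sorted(tally):
    share = tally[k] / len(draws)
    print(f"{k} infections: {share:.1%}")
```

Under these assumptions, most people infect nobody or just one other person, while a thin tail of people infects several.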
self.range_symptoms is very similar to the incubation period: it is a random choice from the possible values in the range of symptoms. This is the number of days during which the person is actively ill. Every day in this range, there is a chance of spreading the disease (which is adjusted to fit this variable; see code) and a chance of death (also adjusted to the number of days in this variable; see code). If the person has not died by the last day in this range, they are cured.
The class's update function essentially runs once for each day that passes. It applies all of the logic described above to the preceding variables and updates several of the class attributes, most notably whether the person is dead, infected, or cured.
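One way the daily chance of death could be adjusted to the number of symptomatic days is to pick a per-day probability that compounds to the overall death rate. This is an assumption about the adjustment, not necessarily the one used in the post's code, and the names `daily_death_chance` and `outcome` are hypothetical.

```python
import random

def daily_death_chance(death_rate, n_symptom_days):
    """Per-day death probability that compounds to death_rate over all days."""
    return 1 - (1 - death_rate) ** (1 / n_symptom_days)

def outcome(death_rate, incubation_period, range_symptoms, rng=random):
    """Return ("dead", day) or ("cured", day) for one infected person."""
    incubation = rng.choice(incubation_period)  # each value weighted equally
    symptom_days = rng.choice(range_symptoms)
    p_day = daily_death_chance(death_rate, symptom_days)
    # Death is only possible on symptomatic days, never during incubation.
    for day in range(incubation + 1, incubation + symptom_days + 1):
        if rng.random() < p_day:
            return "dead", day
    return "cured", incubation + symptom_days  # survived the last symptomatic day

print(outcome(0.2, range(3, 7), range(5, 10), rng=random.Random(0)))
```

With this choice, the chance of surviving all n symptomatic days is (1 - p)^n = 1 - death_rate, so about 20% of infected people die regardless of how long their symptoms last.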
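Putting the attributes together, a stripped-down version of such a class might look like the sketch below. This is not the post's actual Person class: the method name `pass_day`, the gamma shape of 1, the daily death formula, and the order of the death and spread checks are all assumptions.

```python
import random

class Person:
    """Hypothetical sketch of the per-person state described above."""

    def __init__(self, name, infectivity=1.2, death_rate=0.2,
                 incubation_period=range(3, 7), range_symptoms=range(5, 10),
                 rng=random):
        self.rng = rng
        self.name = name
        self.infects = int(rng.gammavariate(1.0, infectivity))  # assumed shape of 1
        self.incubation_period = rng.choice(incubation_period)
        self.range_symptoms = rng.choice(range_symptoms)
        # Assumed adjustment: daily chance that compounds to death_rate overall.
        self.daily_death = 1 - (1 - death_rate) ** (1 / self.range_symptoms)
        self.days_infected = 0
        self.status = "infected"  # later becomes "dead" or "cured"

    def pass_day(self):
        """Advance one day and return how many new people this person infects."""
        if self.status != "infected":
            return 0
        self.days_infected += 1
        if self.days_infected <= self.incubation_period:
            return 0  # incubating: no spread, no death
        if self.rng.random() < self.daily_death:
            self.status = "dead"
            return 0
        new_cases = self.rng.randint(0, self.infects)
        self.infects -= new_cases
        if self.days_infected >= self.incubation_period + self.range_symptoms:
            self.status = "cured"  # survived the last symptomatic day
        return new_cases

person = Person(name=1, rng=random.Random(3))
for day in range(1, 17):
    person.pass_day()
print(person.name, person.status)
```

Since incubation lasts at most 6 days and symptoms at most 9, every person in this sketch is either dead or cured after 15 days.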
The above simulation I showed you in network graphs represents running the simulation once. What happens if you run it 10 times (or however many times you tell it)? I wrote a function to do just this.
If we run our simulation 10 times, we see variation in all of the variables of interest.
Each colored line represents a different simulation run, all with the same initial conditions. So after two weeks, one simulation ended with 8 dead and another with 30 dead. If we ran this for longer, those variations would increase and we would see big differences from randomness despite the same initial conditions. In fact, if you have a very small number of initial infected, they may all die or be cured (just by chance) before spreading it, thereby ending the outbreak. It is quite fun to play around with the various parameters of the simulation!
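A compact, self-contained sketch of repeating the simulation with identical initial conditions follows. The function names `simulate` and `run_many`, the gamma shape of 1, and the daily death formula are assumptions, so the counts it produces need not match the figures quoted above.

```python
import random

def simulate(days=14, initial_sick=5, infectivity=1.2, death_rate=0.2,
             incubation_period=range(3, 7), range_symptoms=range(5, 10), rng=random):
    """Run one outbreak and return final counts of infected, dead and cured."""
    def new_person():
        symptoms = rng.choice(range_symptoms)
        return {"age": 0, "incubation": rng.choice(incubation_period),
                "symptoms": symptoms,
                "budget": int(rng.gammavariate(1.0, infectivity)),  # assumed shape 1
                "p_death": 1 - (1 - death_rate) ** (1 / symptoms),  # assumed adjustment
                "status": "infected"}

    people = [new_person() for _ in range(initial_sick)]
    for _ in range(days):
        new_cases = 0
        for p in people:
            if p["status"] != "infected":
                continue
            p["age"] += 1
            if p["age"] <= p["incubation"]:
                continue  # incubating: no spread, no death
            if rng.random() < p["p_death"]:
                p["status"] = "dead"
                continue
            spread = rng.randint(0, p["budget"])
            p["budget"] -= spread
            new_cases += spread
            if p["age"] >= p["incubation"] + p["symptoms"]:
                p["status"] = "cured"
        # People infected today start their own incubation tomorrow.
        people.extend(new_person() for _ in range(new_cases))

    counts = {"infected": 0, "dead": 0, "cured": 0}
    for p in people:
        counts[p["status"]] += 1
    return counts

def run_many(n_runs=10, seed=0, **params):
    """Repeat the simulation with the same inputs but fresh random draws."""
    rng = random.Random(seed)
    return [simulate(rng=rng, **params) for _ in range(n_runs)]

results = run_many(10)
print([r["dead"] for r in results])
```

Even with identical inputs, the death counts differ from run to run, and fixing the seed makes any one batch of runs reproducible.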
In the next blog post, I will describe the simulation a bit more. Additionally, I will discuss how I use machine learning to try to predict the results of the simulation (modeling will be blind to the data generation conditions), and what insights we can gather about modeling infectious disease patterns.
P.S. Below is the main code for the simulation, if you're curious about it.