From Adam Wagner comes a curious case of colonies:
You are studying a new strain of bacteria, Riddlerium classicum (or R. classicum, as the researchers call it). Each R. classicum bacterium will do one of two things: split into two copies of itself or die. There is an 80 percent chance of the former and a 20 percent chance of the latter.
If you start with a single R. classicum bacterium, what is the probability that it will lead to an everlasting colony (i.e., the colony will theoretically persist for an infinite amount of time)?
Extra credit: Suppose that, instead of 80 percent, each bacterium divides with probability p. Now what’s the probability that a single bacterium will lead to an everlasting colony?
Question 1
Defining the problem
Let’s take the case of one bacteria. Looking at the states that a bacteria can achieve it is clear that we can define a Bernoulli random variable as \[Let\ X\ =\ \{The\ event\ that\ a\ bacteria\ will\ split\ into\ two\} \sim Bernoulli(0.8)\] So first we sample from a Bernoulli variable. And if it comes true then we now have two bacteria.
Now let’s consider the case of two bacteria where we have two trials. Guess a Bernoulli variable won’t cut it now (pun intended). Let’s define a new random variable as \[Let\ Y\ =\ \{The\ number\ of\ bacteria\ that\ will\ split\ into\ two\ from\ n\ bacteria\}\\ Y\ \sim Binomial(n,0.8)\\ where\ n\ is\ the\ number\ of\ bacateria\ at\ a\ point\ in\ time\]
Dummy Simulation
So I thought of doing a simply simulation(Is this a Monte Carlo Simulation? I have no idea.) First I thought of simulating the bacteria growth for time_steps and then I repeat that again and again for evolutions times and from there I find the probability.
To find the probability I had to make an assumption, which was that the bacteria growths with upward trend will keep on going upward, which is satisfied as the probability of all bacteria dieing is quite low (i.e. for the case of let’s say 1000, then \(X\ \sim Bin(1000,0.8)\) and \(Pr(X = 0) = ^{1000}C_{0}(0.8)^{0}\times(0.2)^{1000} \approx 0\))
evolutions <-60time_steps <-60n_bactis <-1file_name_sim <-paste(here::here(),"/blog/data/","riddler_2020_06_12_sim.csv",sep="")if(exists(file_name_sim)){ sim <-read_csv(file_name_sim)}else{ sim <-map_dfr(1:evolutions,function(x){ n_bactis <-1lapply(1:time_steps, function(y){if(n_bactis >= (5*10^8)){ X <- n_bactis n_bactis <-1*10^9 }else{ X <-rbinom(1,n_bactis,0.8) n_bactis <<- X *2 }list(evol = x,step = y,not_dead = X,tot = n_bactis) }) })write_csv(sim,file_name_sim)}good_evols <- sim %>%filter(step == time_steps & tot >0) %>%nrow()prob <- good_evols / evolutionssim %>%ggplot(aes(x=step,y=tot,group=evol))+geom_point(size=0.2) +geom_line(alpha=0.4) +labs(x ="Time Step", y ="Total Bacteria (capped)",title="Bacteria Simulation",subtitle ="what is the probability that a single bacteria will lead to an everlasting colony?",caption ="The total is capped at 10^9 to ensure that the simulation doesnt go beyond the integer limit") +annotate("label",x=(time_steps/2),y=10^9,label =paste("Probability of spreading is =",prob))
The answer to the question is that the probability is approximately 0.75 through simulations
Now let’s try to mathematically derive it.
Question 2
So the second question seems like its just another layer of the earlier simulation to make it adaptable for more probabilities. But in reality what they require is a mathematical proof of the answer.
The relationship between \(p\) and the probability of a single bacteria creating a colony can be captured as follows.
good_evols <- sim_2 %>%filter(step == time_steps & tot >0) %>%count(prob)# prob <- good_evols / evolutionsprob_relation <- good_evols %>%mutate(perc = n / evolutions,prob =round(as.numeric(levels(good_evols$prob))[good_evols$prob],2))prob_relation %>%ggplot(aes(x = prob,y = perc)) +geom_point() +stat_smooth(method="lm") +labs(title ="Relationship between p and probability of a bacteria making a colony",x ="p", y ="Pr(bacteria making a colony)")
`geom_smooth()` using formula = 'y ~ x'
prob_lm <-lm(prob_relation,formula = perc ~ prob)summary(prob_lm)
Call:
lm(formula = perc ~ prob, data = prob_relation)
Residuals:
1 2 3 4 5 6
-0.094048 0.002738 0.112024 0.058810 0.005595 -0.085119
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.8220 0.1648 -4.988 0.007556 **
prob 1.9071 0.2143 8.901 0.000881 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.08963 on 4 degrees of freedom
Multiple R-squared: 0.9519, Adjusted R-squared: 0.9399
F-statistic: 79.22 on 1 and 4 DF, p-value: 0.0008805
So the answer I got from this hack is
Probability of one Bacteria becoming an entire colony =1.9071429*p-0.8220238
This is what I could mathematically derive as much as my tiny brain can allow,
To begin with when \(n = 1\) all bacteria must replicate,
when \(n = 2\) still all bacteria must replicate, otherwise if only one replicates then we are still at 2 bacteria.
when \(n = 4\), 3 or more should replicate,
when \(n = 6\), 4 or more should replicate,
when \(n = 10\), \((\frac{10}{2}) + 1\) or more should replicate, when \(n = N\), \((\frac{N}{2}) + 1\) or more should replicate
Therefore the probability that with \(p\) probability of replicating, a single bacteria can make an entire colony is given by