Bayesian inverse problem practice

STA360 at Duke University

Toy example: logistic growth

E. coli

Code

library(tidyverse)

E. coli grows in a petri dish according to the logistic growth equation,

\[ \frac{dP}{dt} = rP( 1 - \frac{P}{K}) \]

where \(P(t)\) is the population size at time \(t\), \(r\) is the per capita growth rate, and \(K\) is the carrying capacity of the dish.

This equation can be solved analytically (which is what makes this a toy example),

\[ P(t) = \frac{K}{1 + Ae^{-rt}} \]

where \(A = \frac{K - P_0}{P_0}\) and \(P_0\) is the initial population size at time 0.

Data

Bacterial populations are measured in CFU (colony forming units). At time \(t = 0\), 1 colony forming unit was placed into a dish.

populationFunction = function(K, r, t) {
  P0 = 1
  A = (K - P0) / P0
  round(K / (1 + (A * exp(-r * t))))
}

In the past, CFU are counted manually in a small pipetted sample and then the CFU per unit volume are multiplied by the total volume in the dish to estimate the total population size of bacteria at a given time \(t\).

A new machine has been created to replace this mundane task by automating the counting of CFUs in a sample. The machine counts the bacteria in the dish every fifteen minutes since the start of the experiment. Data are provided below.

y = read_csv("https://sta360-fa24.github.io/data/E_coli.csv")
glimpse(y)

Rows: 24
Columns: 2
$ P    <dbl> 2, 3, 5, 8, 13, 22, 37, 64, 103, 186, 318, 542, 843, 1490, 2283, …
$ time <dbl> 15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 2…

Note: time is in minutes.

Model

We know \(P_0 = 1\). Let \(\theta = \{r, K\}\)

We assume the machine has some normal measurement error that is proportional to the size of the population. Thus the data generative model is given by

\[ P(t) |r,K \sim N(f(\theta, t),\sigma^2 f(\theta, t)) \]

where \(f(\theta, t)\) is computed by populationFunction on the previous slide.

Exercise

Your task is to determine the measurment error of the machine \(\sigma^2\).

It is well known that E coli takes about 20 minutes in laboratory conditions to double in size. Given this, we’d expect \(r\) to be about \(\frac{\log(2)}{20}\).

Furthermore, we know the carrying capacity of the petri dish is at least \(10^8\) CFU.

Identify each unknown.
Use the information above to develop reasonable priors over each unknown.
Write pseudo-code of an MCMC sampler to make inference about the parameters.
(optional) Implement your sampler.

Solution

The unknowns are \(r, K, \sigma^2\).

Maybe we think it takes somewhere between 15 minutes and 25 minutes for the population size to double under our lab conditions. Some plausible prior choices are

\[ \begin{aligned} r &\sim \text{Uniform}(\log(2)/25, \log(2)/15)\\ K &\propto 1 \text{ if } K > 10^8\\\ \sigma^2 &\sim \text{inverse-gamma}(1, 1) \end{aligned} \]

Pseudocode:

initialize r, K, sigma^2 
S = large number
for (i in 1:S) {
 - sample r* ~ cnorm(1, r_s, .01, a = log(2)/25, b = log(2)/15)
 - compute MH ratio: [posterior(r*)/posterior(r_s)] * [J(r_s | r*)/J(r* | r_s)]
 - accept/reject
 - sample K* ~ cnorm(1, K_s, 1E5, a = 1E8)
 - compute MH ratio like above and accept/reject
 - sample sigma^2* ~ cnorm(1, sigma^2_s, .1, a = 0)
 - compute MH ratio like above and accept/reject
}