In this blog post we will derive the equations required to fit a Poisson mixture model from scratch and implement the model using Python.
Published: 31 December 2021
Count data are ubiquitous in science. For example, in biology, high-throughput sequencing experiments create huge datasets of gene counts. In this blog post, I will take a closer look at the Expectation Maximisation (EM) algorithm and use it to derive a Poisson mixture model. To get started, however, we will simulate some data from a Poisson mixture using numpy and scipy.
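As a minimal sketch of this simulation step (the cluster centres of roughly 30, 100 and 150 match the description below, while the mixing weights, sample size and seed are illustrative assumptions):

```python
import numpy as np

# Assumed ground-truth parameters: three clusters centred at roughly 30, 100
# and 150; the mixing weights, sample size and seed are illustrative choices.
true_means = np.array([30.0, 100.0, 150.0])
true_weights = np.array([0.4, 0.3, 0.3])
n_samples = 1000

rng = np.random.default_rng(42)

# Draw a latent cluster assignment for every observation, then sample a
# Poisson-distributed count with the corresponding mean.
assignments = rng.choice(len(true_means), size=n_samples, p=true_weights)
X = rng.poisson(true_means[assignments])
```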
To get a better feeling for these data it is instructive to quickly plot a histogram.
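A quick way to do this with matplotlib, assuming X holds the simulated counts from above:

```python
import matplotlib.pyplot as plt

# Histogram of the simulated counts to get a first impression of the data.
plt.hist(X, bins=50, color="steelblue", edgecolor="white")
plt.xlabel("count")
plt.ylabel("frequency")
plt.show()
```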
The plot nicely illustrates the three clusters in the data centered at roughly 30, 100 and 150. Now, imagine that you were given these data and your task was to determine the proportion of data belonging to each of the three clusters and their respective cluster centers. In the following, we will first motivate the use of the EM algorithm, apply it to our problem and finally implement the model in Python.
Defining a Poisson mixture
Let us now formalise the problem. We denote our (synthetic) data by $\mathbf{X} = \{x_1, \ldots, x_N\}$ and let $\boldsymbol{\mu} = \{\mu_1, \ldots, \mu_K\}$ be the vector of cluster means, where $k = 1, \ldots, K$ indexes the clusters. If we are given the cluster $k$ for a data point $x_n$, we can compute its likelihood using
$$p(x_n \mid k; \boldsymbol{\mu}) = \text{Pois}(x_n; \mu_k).$$
A finite mixture of such Poisson distributions can be expressed as
$$p(x_n; \boldsymbol{\mu}, \boldsymbol{\pi}) = \sum_{k=1}^{K} \pi_k \, \text{Pois}(x_n; \mu_k),$$
where $\pi_k$ denotes the mixing proportion of cluster $k$, with $\sum_{k=1}^{K} \pi_k = 1$. The likelihood for the whole dataset $\mathbf{X}$ is then given by
$$p(\mathbf{X}; \boldsymbol{\mu}, \boldsymbol{\pi}) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \, \text{Pois}(x_n; \mu_k). \tag{1}$$
From $(1)$ we can see that it is difficult to optimise the log-likelihood
$$\log p(\mathbf{X}; \boldsymbol{\mu}, \boldsymbol{\pi}) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \text{Pois}(x_n; \mu_k), \tag{2}$$
as this expression involves the log of a sum, making it hard to find closed-form solutions for the parameters of the model.
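To make the quantity in Eq. $(2)$ concrete, it can at least be evaluated numerically on the log scale, for instance via logsumexp. The helper below is only an illustrative sketch (the name log_likelihood is our own choice), assuming the simulated counts X from above:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import poisson

def log_likelihood(X, mu, pi):
    # log p(X; mu, pi) = sum_n log sum_k pi_k Pois(x_n; mu_k),
    # evaluated stably on the log scale via logsumexp.
    log_terms = np.log(pi)[None, :] + poisson.logpmf(X[:, None], mu[None, :])
    return logsumexp(log_terms, axis=1).sum()
```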
The EM algorithm
In the previous section we found that it is infeasible to optimise the marginal log likelihood (Eq. $(2)$). In such cases we can employ the EM algorithm to simplify the optimisation problem. In particular, we will introduce for each data point $x_n$ a corresponding latent variable $\mathbf{z}_n$, and derive the log joint distribution of $x_n$ and $\mathbf{z}_n$, which is easier to optimise than Eq. $(2)$.
Before we derive the equations to infer the parameters $\boldsymbol{\theta} = \{\boldsymbol{\mu}, \boldsymbol{\pi}\}$ of our Poisson mixture, let us quickly recap the EM algorithm (Bishop, 2006).
1. Initialise the parameters of the model $\boldsymbol{\theta}^{\text{old}}$.
2. E-step: Compute the posterior distribution of the latent variables $p(z_{nk} \mid x_n; \boldsymbol{\theta}^{\text{old}})$ using the current parameter estimates.
3. M-step: Determine the parameter updates $\boldsymbol{\theta}^{\text{new}}$ by maximising the expected joint log likelihood under the posterior determined in the E-step.
4. Check for convergence by means of the log likelihood or the parameter values. If the convergence criterion is not satisfied, update $\boldsymbol{\theta}^{\text{old}} \leftarrow \boldsymbol{\theta}^{\text{new}}$ and return to step 2. A minimal sketch of this generic loop is given below.
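To illustrate the structure of the algorithm, here is a minimal, model-agnostic sketch of the loop; e_step, m_step and log_likelihood are hypothetical placeholders for the Poisson-specific functions derived in the following sections:

```python
import numpy as np

# Generic EM loop: the model-specific pieces are passed in as functions.
def expectation_maximisation(X, theta, e_step, m_step, log_likelihood,
                             max_iter=100, tol=1e-6):
    prev_ll = -np.inf
    for _ in range(max_iter):
        posterior = e_step(X, theta)     # E-step: posterior over latent variables
        theta = m_step(X, posterior)     # M-step: updated parameter estimates
        ll = log_likelihood(X, theta)
        if np.abs(ll - prev_ll) < tol:   # convergence check on the log likelihood
            break
        prev_ll = ll
    return theta
```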
The joint log likelihood
The first step is to determine the joint log likelihood of the model. In finite mixture models, $\mathbf{z}_n = (z_{n1}, \ldots, z_{nK})^T$ is a binary $K$-dimensional vector in which exactly one component equals $1$. Hence, we can write the conditional distribution of $x_n$ given $\mathbf{z}_n$ as
$$p(x_n \mid \mathbf{z}_n; \boldsymbol{\mu}) = \prod_{k=1}^{K} \text{Pois}(x_n; \mu_k)^{z_{nk}}$$
and the prior for $\mathbf{z}_n$ as
$$p(\mathbf{z}_n; \boldsymbol{\pi}) = \prod_{k=1}^{K} \pi_k^{z_{nk}}.$$
The joint log likelihood (for the complete data) is therefore

$$\log p(\mathbf{X}, \mathbf{Z}; \boldsymbol{\mu}, \boldsymbol{\pi}) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left( \log \pi_k + \log \text{Pois}(x_n; \mu_k) \right). \tag{3}$$
Comparing Eq. $(2)$ and $(3)$ reveals that in the latter equation the logarithm distributes over the prior and the conditional distribution. This greatly simplifies the optimisation problem.
Setting up the E-step: The latent posterior distribution
To find the posterior distribution of $z_{nk}$ we make use of Bayes' rule, which yields the responsibilities

$$\gamma(z_{nk}) := p(z_{nk} = 1 \mid x_n; \boldsymbol{\theta}^{\text{old}}) = \frac{\pi_k \, \text{Pois}(x_n; \mu_k)}{\sum_{j=1}^{K} \pi_j \, \text{Pois}(x_n; \mu_j)}. \tag{4}$$
Setting up the M-step: The parameter updates

In the M-step we maximise the expected joint log likelihood under the posterior found in the E-step,

$$\mathbb{E}_{\mathbf{Z}}\left[\log p(\mathbf{X}, \mathbf{Z}; \boldsymbol{\mu}, \boldsymbol{\pi})\right] = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \left( \log \pi_k + \log \text{Pois}(x_n; \mu_k) \right), \tag{5}$$

with respect to the parameters. Setting the derivatives to zero gives the update rules

$$\mu_k^{\text{new}} = \frac{\sum_{n=1}^{N} \gamma(z_{nk}) \, x_n}{\sum_{n=1}^{N} \gamma(z_{nk})}, \qquad \pi_k^{\text{new}} = \frac{1}{N} \sum_{n=1}^{N} \gamma(z_{nk}). \tag{6}$$

Note that we have to use Lagrange multipliers in order to obtain the update rule for $\pi_k$, since the mixture weights are constrained to sum to one.
Implementing the model
After having done the hard math, the actual implementation of the model is straightforward. The PoissonMixture class takes as arguments the number of desired clusters $K$ and allows the initial parameters to be specified in the __init__ method. The implementation of the E-step and M-step directly follows from Eq. $(4)$ and $(6)$. Finally, the PoissonMixture model provides a function to compute the negative log likelihood¹ of the model as well as a fit method that takes the count data as input.
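Below is a minimal sketch of such a class. The class and method names (PoissonMixture, e_step, m_step, nll, fit) follow the description above, whereas details such as the default quantile-based initialisation are assumptions made for illustration.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import poisson


class PoissonMixture:
    """Sketch of a Poisson mixture model fitted with the EM algorithm."""

    def __init__(self, n_components=3, init_means=None, init_weights=None,
                 max_iter=100, tol=1e-6):
        self.n_components = n_components
        self.means_ = init_means
        self.weights_ = init_weights
        self.max_iter = max_iter
        self.tol = tol

    def _initialise(self, X):
        # Assumed default initialisation: spread the means over the data
        # quantiles and start from uniform mixture weights.
        if self.means_ is None:
            self.means_ = np.quantile(X, np.linspace(0.2, 0.8, self.n_components))
        if self.weights_ is None:
            self.weights_ = np.full(self.n_components, 1.0 / self.n_components)

    def e_step(self, X):
        # Eq. (4): responsibilities gamma(z_nk), computed on the log scale.
        log_resp = (np.log(self.weights_)[None, :]
                    + poisson.logpmf(X[:, None], self.means_[None, :]))
        log_resp -= logsumexp(log_resp, axis=1, keepdims=True)
        return np.exp(log_resp)

    def m_step(self, X, resp):
        # Eq. (6): update cluster means and mixture weights.
        nk = resp.sum(axis=0)
        self.means_ = (resp * X[:, None]).sum(axis=0) / nk
        self.weights_ = nk / X.shape[0]

    def nll(self, X, resp):
        # Negative expected joint log likelihood (Eq. (5)), used for monitoring.
        expected = resp * (np.log(self.weights_)[None, :]
                           + poisson.logpmf(X[:, None], self.means_[None, :]))
        return -expected.sum()

    def fit(self, X):
        X = np.asarray(X)
        self._initialise(X)
        prev_nll = np.inf
        for _ in range(self.max_iter):
            resp = self.e_step(X)    # E-step
            self.m_step(X, resp)     # M-step
            current_nll = self.nll(X, resp)
            if np.abs(prev_nll - current_nll) < self.tol:
                break
            prev_nll = current_nll
        return self
```

Computing the responsibilities on the log scale and normalising with logsumexp avoids numerical underflow for large counts.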
To test our model we instantiate it and provide the data via the fit method.
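For example, assuming X contains the simulated counts from above:

```python
# Instantiate the model with three components and fit it to the simulated counts.
pm = PoissonMixture(n_components=3)
pm.fit(X)

print("estimated means:  ", np.round(pm.means_, 2))
print("estimated weights:", np.round(pm.weights_, 2))
```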
We find that the EM algorithm quickly converges to the optimal solutions, although obviously this implementation is for demonstration purposes only.
Conclusion
In this article we have derived the EM algorithm to fit Poisson mixture models. Further extensions of the model could involve the incorporation of a prior for $\boldsymbol{\mu}$ or $\boldsymbol{\pi}$.
¹ Technically, the nll method computes the expected joint log likelihood (Eq. $(5)$).
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.