MLE for Poisson Distribution¶

Poisson Example: In a study, researchers are interested in anayzing co-speech gestures (arm and hand movements accompanying the speech) by Katalan and Korean speakers. The data below comes from study in which participants watched a cartoon and then told a partner (a friend or a professor) about what they had seen. The data is available as a .csv file in the following Github repository.

If you place the word flat at the beginning of data url you can quickly view data as a Flat Data as given here.

Question: Assuming that the number of gestures follows a Poisson distribution with a mean parameter \(\lambda\). Find the maximum likelihood estimate of \(\lambda\) based on this sample.

#turn off warnings
import warnings
warnings.filterwarnings("ignore")

#import required libraries
import pandas as pd
import numpy as np
from scipy.special import factorial
import matplotlib.pyplot as plt

First import the data, check its shape (size), and hava a look at a few first lines. Please note that to read a .csv file from a GitHub repository, first go to the url where the data is available, click on the data set, then a preview window will open, and click on the Raw data button, copy thins url and paste it into pd.read_csv.

gesture_data = pd.read_csv("https://raw.githubusercontent.com/bodowinter/poisson_tutorial/main/dyads.csv")

#size of the data frame
gesture_data.shape

(54, 6)

#let's see the first 10 rows
gesture_data.head(n = 10) 

	ID	context	dur	language	gender	gestures
0	Catalan_1	friend	137	Catalan	M	61
1	Catalan_1	prof	136	Catalan	M	78
2	Catalan_2	friend	90	Catalan	F	31
3	Catalan_2	prof	107	Catalan	F	40
4	Catalan_3	friend	181	Catalan	M	81
5	Catalan_3	prof	165	Catalan	M	49
6	Catalan_4	friend	130	Catalan	M	32
7	Catalan_4	prof	115	Catalan	M	49
8	Catalan_5	friend	122	Catalan	F	39
9	Catalan_5	prof	125	Catalan	F	32

Get the gestures column only and change its name.

#get only the gestures column
data = gesture_data.gestures

Check the mean.

np.mean(data)

49.44444444444444

#creater a grid vector for lambda
lambd = np.arange(1, 1000, 1)
#print(lambd)

Define a function which calculates the MLE of the data under a Poisson distribution. This function requires two parameters: data and a vector of possible values for \(\lambda\) parameter. Then, the function calculates the log-likelihood of the data for each \(\lambda\) value, and plot log-likelihood values versus \(\lambda\) values. The \(\hat \lambda\) where the curve the attains a maximum can considered as the maximum likelihood estimate for \(\lambda\).

def poisson_loglik_plot(data, lambd):
    
    #fixed values
    n = np.size(data)
    sum_x = np.sum(data)
    sum_in_x_factorial = np.sum(np.log(factorial(data)))
    
    #define log_likelihood
    log_lik = -lambd*n + sum_x * np.log(lambd)-sum_in_x_factorial
    
    # plot log-lik versus lambda values
    fig, ax = plt.subplots(figsize = (6, 6))

    plt.plot(lambd, log_lik)  
    plt.axvline(x=lambd[np.argmax(log_lik)], color='r') #find the lambda where log-lik is a maximum.
    ax.text(49, -48000, '49', color='red', ha='center', va='top') #not a smart way         
    plt.xlabel(r'$\lambda$')
    plt.ylabel('Log-likelihood value')
    plt.title('Log-likelihood of gestures data under Poisson Distribution')
    plt.legend(loc='upper right')
    plt.show()

poisson_loglik_plot(data = data, lambd = lambd)

No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.

The \(\hat \lambda\) = 49 is the MLE for \(\lambda\).

Some optimzation routines in Python may prefer minimizing an objective function. In that case, we need to define negative log-likelihood. In such a case, the \(\hat \lambda\) where the curve the attains a minimum can considered as the maximum likelihood estimate for \(\lambda\).

def poisson_negloglik_plot(data, lambd):
    
    #fixed values
    n = np.size(data)
    sum_x = np.sum(data)
    sum_in_x_factorial = np.sum(np.log(factorial(data)))
    
    #define log_likelihood
    log_lik = -lambd*n + sum_x * np.log(lambd)-sum_in_x_factorial
    
    neg_log_lik = -log_lik
    
    # plot neg-log-lik versus lambda values
    fig, ax = plt.subplots(figsize = (6, 6))

    plt.plot(lambd, neg_log_lik)  
    plt.axvline(x=lambd[np.argmin(neg_log_lik)], color='r') #find the lambda where neg_log-lik is a minimum.
    ax.text(49, -3000, '49', color='red', ha='center', va='top') #not a smart way         
    plt.xlabel(r'$\lambda$')
    plt.ylabel('Negative log-likelihood value')
    plt.title('Log-likelihood of gestures data under Poisson Distribution')
    plt.legend(loc='upper right')
    plt.show()

poisson_negloglik_plot(data = data, lambd = lambd)

No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.

We can also find the maximum (minimum) point where the log-likelihood (negative log-likelihood) attains through an optimization routine.

First define neg-log-lik as a function.

def poisson_negloglik(lambd):
    
    #data is not an input argument, assuming data object is already available in the session.
    #fixed values
    n = np.size(data)
    sum_x = np.sum(data)
    sum_in_x_factorial = np.sum(np.log(factorial(data)))
    
    #define log_likelihood
    log_lik = -lambd*n + sum_x * np.log(lambd)-sum_in_x_factorial
    
    neg_log_lik = -log_lik
    
    return neg_log_lik

Then use minimize() function fom Scipy.optimize with method='L-BFGS-B' to find the arg min.

from scipy.optimize import minimize
poisson_mle = minimize(poisson_negloglik, np.array([1]), method='L-BFGS-B')
poisson_mle

      fun: 443.5523349036321
 hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>
      jac: array([0.])
  message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     nfev: 34
      nit: 14
     njev: 17
   status: 0
  success: True
        x: array([49.4445145])

Spring 2022

MLE for Poisson Distribution

MLE for Poisson Distribution¶