Module 06: MLE

Concept Source

Scenario

Your store sells a product at an average rate of 3 items per hour. Estimate the average rate over the next 100 hours by finding the maximum likelihood estimate (MLE).

Goals

  • Simulate an appropriate data set
  • Find and plot the Maximum Likelihood Estimate (MLE) for the rate using the data set
  • Test whether the calculated MLE is significantly different from the known average

Data (Poisson)

Simulation

First, set the seed of the random number generator so your results will be consistent each time you run the code. Use rpois() to create the data from the Poisson distribution.

#set the seed of R's random number generator
set.seed(123)

#sample size
n <- 100

#lambda
p <- 3

#simulate and store data
#Ahrens, J. H. and Dieter, U. (1982). 
#Computer generation of Poisson deviates from modified normal distributions. 
#ACM Transactions on Mathematical Software, 8, 163–179.
data <- rpois(n, p)

Frequency

Use table() to create a simple data table so you can explore the output from rpois(). We can see that, in quite a few of the hours in our sample, 2 to 4 items were sold. This aligns with our prediction so far.

#count how often each rate occurred
t<-table(data)
t
## data
##  0  1  2  3  4  5  6  7  8 
##  5 14 24 20 22  9  4  1  1

Exploration

You may wish to look at your data using histograms and barplots. For this, we can use hist() and barplot() which are included in the base graphics capabilities. To use hist(), pass the data to the function. To create a simple bar plot, simply pass the table to barplot() and R will create a plot for you. What change might make the bar plot look more like the classic Poisson distribution? What design changes might you make if you were to improve the bar plot?

Histogram: base R graphics

Notice the default axis labels and scales.

hist(data)

hist(data, freq=FALSE)

Bar Plot: base R graphics

R provides some powerful functions for easy visualization but fewer options can be more limiting. Notice the default axis labels and scales.

#create a bar plot
barplot(t)

Bar Plot: ggplot2

The ggplot2 library is more complex, but allows for more customized plots. Notice the default axis labels and scales.

#first create a data frame using the table
df<-data.frame(t)

#rename the columns
colnames(df)<-c("lambda","frequency")

#call ggplot; use the data frame and tell aes()what to use for X and Y values
#finally, add on a geom_bar() and tell 'stat' transformation to use
#identity because you want bar height to represent the data
ggplot(df, aes(x=lambda,y=frequency)) +
  geom_bar(stat="identity")

Here is a more stylish plot. Notice the use of accent color and a subtle border to increase contrast between the bars and the background. When plotting, think about what story you want to tell your audience? What design choices will most effectively help you tell that story?

#define a color vector to use in the plot
#ggplot needs a list of colors; one for each level of lambda

#we can use rep() to create a vector of the same value
#use the number of different lambdas 0,1,2,...8
#barFill<-rep("#008f8f",length(levels(df$lambda)))

#OR, for multiple colors, just define the vector explicitly.
#we can specify the first three values manually, then use rep() to fill in the remaining values
lightGrey<-"#F0F0F0"
accentColor<-"#FF3300"
barBorder<-"#CCCCCC"
barFill<-c(lightGrey,lightGrey,accentColor,rep(lightGrey,length(levels(df$lambda))-3))
barFillGrey<-c(rep(lightGrey,length(levels(df$lambda))))

ggplot(df, aes(x=lambda,y=frequency,fill=lambda)) +
  geom_bar(stat="identity", color=barBorder)+
  scale_y_continuous(limits=c(0,25), breaks=seq(0,25,by=5))+
  scale_fill_manual(values=barFill)+
  labs(x="Items/hour (lambda)", y="Frequency",title="Frequency is highest when Lambda equals 2", subtitle="We can use a strong, high-contrast accent color to draw the viewers' attention to this detail.", caption="Wow this plot is amazing")+
  theme_minimal()+ 
  theme(legend.position = "none")

Likelihood

The goal of likelihood calculations is to find the parameter value that is most likely to result in the observed data. In this case, because our data are Poisson data, we are looking for the estimated value of lambda (\(\lambda\)) that yielded the observed data set. Now, we know the data were simulated so we can expect the estimate of \(\lambda\) to be close to the true value that we used to create the data set.

Reference 1

Reference 2

Calculations (few lambdas)

For calculations, we will create a data frame of our output data, add a column containing likelihood values, remove duplicates, and then find the value of lambda with the maximum likelihood.

Poisson functions

#create a new column 'likelihood' in the data frame using mutate() from library(dplyr)
#the lambdas are factors so convert them to numeric so dpois() will run properly
#store the results in a new data frame that includes the likelihood values: 'dfL'
#this preserves your original dataframe if you need it for some other reason
#dfL <- df %>% mutate(likelihood = dpois(x = data, lambda = as.numeric(lambda)))

#create a data frame of the n output values 
#we will eventually put likelihood values in here
#use an L to abbreviate 'likelihood'
dfL<- data.frame(data)

#rename column name
#the c() is overkill but we will use it anyway for consistency
colnames(dfL)<-c("lambda")

Use dpois() to find the likelihood for each value of lambda (items/hour) in the data.

#add a column containing likelihood values
dfL <- dfL %>% mutate(likelihood = dpois(x = data, lambda = p))

#check the result
head(dfL)
#extract the lambda values to use later
x <- as.numeric(names(table(dfL$lambda)))

Remove duplicates and filter to reveal the lambda (items/hour) with the highest likelihood.

#there are some duplicated values 
#drop the rows with duplicate values in the likelihood column
#store the result back into the same dataframe name
dfL_noDuplicates<-dfL[!duplicated(dfL$likelihood), ]

#get the Maximum Likelihood Estimate (MLE)
#it is the maximum value from the likelihood column
mle<-dfL_noDuplicates %>% filter(likelihood == max(likelihood))

#print the MLE value
cat("The MLE is: ", mle$lambda)
## The MLE is:  3

Plot (few lambdas)

Create a plot of lambda as the \(x\) value and likelihood as the \(y\) value. We can use a simple ggplot version for this. You’ll need to provide the data frame, specify the x and y, and then tell ggplot to use points.

ggplot(dfL, aes(x=lambda, y=likelihood)) + geom_point()

Calculations (many lambdas)

#create a vector of lambda values
lambdaValues <- seq(0, max(x), by = 0.01)

We can create a function for likelihood and apply it across the lambda values.

#create a likelihood function (NOT LOG-LIKELIHOOD)
likelihood <- function(data, par_p)
{
  #use dpois() 
  LL <- sum(dpois(x = data, lambda = par_p))
  return(LL)
}

#apply the function over the lambda values
likelihoodValues<- sapply(lambdaValues, FUN = likelihood, data = data)

The MLE is in the list so find it using which.max().

#find the the lambda with the highest likelihood
maxLL <- which.max(likelihoodValues)

#get the MLE that matches the lambda
p_MLE <- lambdaValues[maxLL]

#print the MLE
cat("The Maximum Likelihood Estimate (MLE) is ", p_MLE)
## The Maximum Likelihood Estimate (MLE) is  2.51

Plot (many lambdas)

Plot the results to see how the increase in lambdas may have changed the MLE.

#plot lambda vs likelihood
#use a character that shows density (not a line)
plot(lambdaValues, likelihoodValues, type = "p", xlab = "lambda", ylab = "Likelihood", pch=3,xaxt='n')
axis(1, at = seq(0, 8, by = 0.5), las=1)

#add a point to the plot for the MLE
points(p_MLE, max(likelihoodValues), pch = 19, col = "red")

#add a vertical line to the plot for the MLE
abline(v = p_MLE, col = "red")

Log-Likelihood

The log transformation of some data sets is sometimes preferred. Let’s transform our data and find the MLE again using the log-likelihood instead of the likelihood.

Calculations (few lambdas)

First create a function to determine the log-likelihood.

# Remember: log(a*b) = log(a) + log(b)

# The sums of the single-sample log-likelihoods are used instead of multiplying the untransformed likelihood values. 
log_likelihood <- function(data, par_p)
{
  #use dpois() with log=TRUE
  LL <- sum(dpois(x = data, lambda = par_p, log = TRUE))
  return(LL)
}

Next, apply the function to the data.

#apply the function
log_likelihood(data = data,par_p = p)
## [1] -187.9301

Several values of lambda were created when the data were generated. Let’s extract these values so we can use them in plots.

#you could hard code the lambda values
#but that could be trouble if you change any initial parameters later
#x<- 0:8 #hard coded way

#let's pull lambda values from the data instead
x <- as.numeric(names(table(dfL$lambda)))

Let’s now create a vector of many lambda values so we can create a nice plot of lambda (\(x\)) versus log-likelihood (\(y\)).

#create a vector of the known lambdas
lambdaValues <- x

Use the log-likelihood function using sapply() which is kind of like a loop that returns a vector.

#apply the log_likelihood function created earlier 
LogLikelihoodValues <- sapply(lambdaValues, FUN = log_likelihood, data = data)

Find the MLE by finding the value of lambda with the highest log-likelihood.

#find and put line on estimate
#get the maximum value of the resulting vector
maxLL <- which.max(LogLikelihoodValues)

#get the value of the likelihood at that lambda value
p_MLE <- lambdaValues[maxLL]

#print the MLE
cat("The Maximum (Log) Likelihood Estimate (MLE) is ", p_MLE)
## The Maximum (Log) Likelihood Estimate (MLE) is  3

Plot (few lambdas)

Create a plot using base R graphics to see how the likelihood changes as the value for lambda increases.

#draw the plot
plot(lambdaValues, LogLikelihoodValues, type = "l", xlab = "lambda", ylab = "log-Likelihood")
axis(1, at = seq(0, 8, by = 1), las=1)

#add a point to the plot
points(p_MLE, max(LogLikelihoodValues), pch = 19, col = "red")

#add a vertical line to the plot
abline(v = p_MLE, col = "red")

Calculations (many lambdas)

Let’s now create a vector of many lambda values so we can create a nice plot of lambda (\(x\)) versus log-likelihood (\(y\)). Perhaps we can see a better estimate.

lambdaValues <- seq(0, max(x), by = 0.01)

Use the log-likelihood function using sapply() which is kind of like a loop that returns a vector.

#apply the log_likelihood function created earlier 
LogLikelihoodValues <- sapply(lambdaValues, FUN = log_likelihood, data = data)

Find the MLE by finding the value of lambda with the highest log-likelihood.

#find and put line on estimate
#get the maximum value of the resulting vector
maxLL <- which.max(LogLikelihoodValues)

#get the value of the likelihood at that lambda value
p_MLE <- lambdaValues[maxLL]

#print the MLE
cat("The Maximum (Log) Likelihood Estimate (MLE) is ", p_MLE)
## The Maximum (Log) Likelihood Estimate (MLE) is  2.94

Plot (many lambdas)

Create a plot using base R graphics to see how the log-likelihood changes as the value for lambda increases.

#draw the plot
plot(lambdaValues, LogLikelihoodValues, type = "l", xlab = "lambda", ylab = "log-Likelihood")
axis(1, at = seq(0, 8, by = 1), las=1)

#add a point to the plot
points(p_MLE, max(LogLikelihoodValues), pch = 19, col = "red")

#add a vertical line to the plot
abline(v = p_MLE, col = "red")

Interpretation

When examining the log-likelihood, your goal is to identify the highest value as an estimate of the parameter of interest. From our analysis, we see that the highest log-likelihood occurs when \(\lambda = 2.94\) when many values for lambda are evaluated. We can interpret this to mean that a \(\lambda\) value of 2.94 is the most likely value to have produced the observed data. Now, as we have the luxury of knowing exactly how the data were produced, we know the “true” mean is equal to 3. In a real-world context, we would not know this and would therefore rely on the MLE to suggest a most likely value for \(\lambda\).

Likelihood vs Log-Likelihood

The MLE was calculated using both likelihood and log-likelihood methods for both few and many lambda values. In both cases with few lambda values, the MLE was determined to be equal to 3. When many values of lambda were used, the MLE was determined to be 2.51 using the likelihood method and 2.94 using the log-likelihood method. Of these two, the log-likelihood method MLE was closer to the true mean of 3 used to simulate the data set.

Reference 1

Reference 2

Model Comparison

Draw the original bar plot and overlay the theoretical underlying distribution and the distribution based on the (log-likelihood) MLE of 2.94. This will allow you to see the true values versus the estimated values.

#create a table of proportional values
pv <- table(data)/n

xbars <- barplot(pv, xlab = "Lambda", ylab = "Proportion", ylim = c(0, 0.25))
points(xbars, dpois(x,p), type = "b", col = "blue", pch = 19, lty = 2)
points(xbars, dpois(x,p_MLE), type = "b", col = "red", pch = 19, lty = 2)
legend("topright", legend = c("Data","True p","MLE p"), col = c("gray", "blue","red"), pch = c(15, 19, 19))

Here is a version using ggplot2. Notice the upgrades to the axis text and labels. Also see that you can edit the color using an additional alpha channel: #rrggbbAA.

pvDF<-data.frame(pv)
colnames(pvDF)<-c("lambda","frequency")

pvDF$true<-dpois(x,p)
pvDF$pred<-dpois(x,p_MLE)


#Note about group=1: 
#https://stackoverflow.com/questions/27082601/ggplot2-line-chart-gives-geom-path-each-group-consist-of-only-one-observation
ggplot(pvDF,aes(x=lambda,y=frequency, fill=lambda, group = 1))+
  geom_bar(stat="identity", show.legend = FALSE, color=barBorder)+
  scale_fill_manual(values=barFillGrey)+
  theme_minimal()+
  labs(x="Lambda", y="Proportion",caption="Simulated vs Predicted Models")+
  geom_line(aes(y=true, color="True"),size=2)+
  geom_line(aes(y=pred, col="Predicted"),size=0.5)+
  scale_color_manual(values=c("True"="#3300FF55", "Predicted"="#FF3300"))+
  theme(legend.title = element_blank())+
  theme(axis.title=element_text(size=18, colour = "#333333", face = "bold"))+
  theme(text = element_text(size=20), axis.text.x = element_text(angle=0, hjust=1))

Confidence Intervals

Hypothesis Testing

One sample Test

Let’s test our data set to see if the calculated rate is different from what we believe it to be. That is, we think the average rate is 3, but we want to use our data to determine if this is true.

Our null hypothesis states that the rate is equal to 3.

\(H_0: Rate = 3\)

Our alternative hypothesis states that the rate is not equal to 3.

\(H_A: Rate \neq 3\)

We then use the poisson.test() function to evaluate our data and the rate of 3. We will imagine alpha = 0.05. If the resulting p-value is less than our alpha threshold, we will reject the null hypothesis. If the p-value is not less than the alpha value, we will not reject the null hypothesis.

#reference
#https://www.youtube.com/watch?v=vTWti3js8HE

#hypothesis test
#poisson.test(sum(data), length(data), r = lambda)
poisson.test(sum(data), length(data), r = 3)
## 
##  Exact Poisson test
## 
## data:  sum(data) time base: length(data)
## number of events = 294, time base = 100, p-value = 0.7509
## alternative hypothesis: true event rate is not equal to 3
## 95 percent confidence interval:
##  2.613506 3.296004
## sample estimates:
## event rate 
##       2.94

The resulting p-value is 0.7509 which leads us to fail to reject the null hypothesis in favor of the alternative hypothesis. Essentially, the estimated rate, lambda = 2.94, is no different than lambda = 3. There is no evidence the rate is not 3 sales per hour based on our data.

Confidence Interval Notes

A 95% confidence interval (CI) of the mean is a range with an upper and lower number calculated from a sample. Because the true population mean is unknown, this range describes possible values that the mean could be. If multiple samples were drawn from the same population and a 95% CI calculated for each sample, we would expect the population mean to be found within 95% of these CIs. CIs are sensitive to variability in the population (spread of values) and sample size. When used to compare the means of two or more treatment groups, a CI shows the magnitude of a difference between groups. This is helpful in understanding both the statistical significance and the clinical significance of a treatment.

O’Brien SF, Yi QL. How do I interpret a confidence interval? Transfusion. 2016 Jul;56(7):1680-3. doi: 10.1111/trf.13635. Epub 2016 May 17. Review. PubMed PMID: 27184382.

If you want to do calculations manually, remember that \(CI_\lambda\) is calculated as \(\lambda ±1.96*\sqrt{\lambda/n}\). (Reference)

  • Upper Bound: \(2.94+1.96*\sqrt{2.94/100} = 3.27607\)
  • Lower Bound: \(2.94-1.96*\sqrt{2.94/100} = 2.60393\)
  • Summary: \(CI_{95} \approx [2.6,3.3]\)

Using R, we can use poisson.test() to extract the confidence interval with our data.

#calculate and print the rounded 95% confidence interval
ci95 <- poisson.test(sum(data), length(data), r = 3)$conf.int
cat("The 95% confidence interval for lambda = 3 is: ", round(ci95[1],digits=1), "to", round(ci95[2],digits=1))
## The 95% confidence interval for lambda = 3 is:  2.6 to 3.3

Last Questions

Change n at the top of the code and see what happens to the predicted MLE.

  • What happens to the difference between our predicted model and our ‘true’ simulated model if we increase our sample size to \(n = 10,000\)?

  • What happens to the p-value in our significance test?

  • What happens to the confidence interval?