Annotated code for the procedure I will describe in this post can be found on my github.

As a side note, I just want to say how much I love this procedure. It uses parametric bootstrapping to correct Bayesian intervals to achieve frequentist coverage. God bless America.

## Empirical Bayesian Intervals

In my previous post on beta-binomial empirical Bayes analysis, I used the model

$y_i \sim Bin(n_i, \theta_i)$

$\theta_i \sim Beta(\alpha, \beta)$

The empirical Bayes method says to get estimates $\hat{\alpha}$ and $\hat{\beta}$ of the prior parameters - I give a method of moments estimator in the post, but marginal maximum likelihood may also be used - and then calculate posterior distributions for each of the $\theta_i$ using $Beta(\hat{\alpha}, \hat{\beta})$ as the prior, essentially using the data itself to estimate the prior.

The empirical Bayesian estimate is the mean of this posterior distribution. If an interval estimate is desired, a credible interval can be calculated by taking quantiles directly from this posterior - see my post on credible intervals. This is what I will call a "naive" empirical Bayesian interval.

## The Empirical Bayes Problem

The fundamental problem with naive empirical Bayesian intervals is that they often end up too short, inappropriately centered, or both. This is because the uncertainty of the prior parameters themselves has not been accounted for. From the law of total variance, the posterior variance is given by

$Var(\theta_i | y_i) = E_{\alpha, \beta|y_i}[Var(\theta_i|y_i, \alpha, \beta)] + Var_{\alpha, \beta|y_i}[E(\theta_i|y_i, \alpha, \beta)]$

Taking quantiles from the empirical Bayesian posterior estimates the first term, but not the second. For small samples this second term can be significant, and naive empirical Bayesian intervals generally won't achieve nominal coverage (for more information, see Carlin and Louis's book *Bayesian Methods for Data Analysis*).

One way to correct for the uncertainty is to perform a hierarchical Bayesian analysis, but it's not clear what the "correct" hyperpriors should be - and just using noninformative priors doesn't guarantee that you'll get nominal frequentist coverage.

An alternative is to use the bootstrap. Since I'm working in the parametric empirical Bayes case, a parametric bootstrap will be used, though this doesn't necessarily have to be the case. For more information on the technique I will use (and a discussion of how it applies to the normal-normal case), see Laird, N., and Louis, T. (1987), "Empirical Bayes Confidence Intervals Based on Bootstrap Samples," *Journal of the American Statistical Association*, 82(399), 739-750.

I want to emphasize that this is a technique for small samples. For even moderate samples, the uncertainty in the parameters will be small enough that naive empirical Bayesian intervals will have good frequentist properties, and this can be checked by simulation if you desire.

## Parametric Bootstrap

The idea of the parametric bootstrap is that we can account for $Var_{\alpha, \beta|y_i}[E(\theta_i|y_i, \alpha, \beta)]$ by resampling. In a traditional bootstrap, data is sampled with replacement from the original data set. Since this is a parametric bootstrap instead, we will resample by generating observations from the model assuming that our estimates $\hat{\alpha}$ and $\hat{\beta}$ are correct.

- Generate $\theta^*_1, \theta^*_2, ..., \theta^*_k$ from $Beta(\hat{\alpha}, \hat{\beta})$
- Generate $y^*_i$ from $Bin(n_i, \theta^*_i)$ for each $i = 1, 2, ..., k$
- Estimate $\alpha$ and $\beta$ from the bootstrapped $(y^*_i, n_i)$ observations using the same method as you initially used. Call these estimates $\alpha^*_j, \beta^*_j$
- Repeat the three steps above for $j = 1, 2, ..., N$

In this way, we get a set of $N$ bootstrap estimates $(\alpha^*_j, \beta^*_j)$ of the parameters of the underlying beta distribution. The posterior density that accounts for the uncertainty of $\hat{\alpha}$ and $\hat{\beta}$ can then be estimated as

$p^*(\theta_i | y_i, \hat{\alpha}, \hat{\beta}) \approx \dfrac{ \sum_{j = 1}^N p(\theta_i | y_i, \alpha^*_j, \beta^*_j)}{N}$

Essentially, this is just the raw average of the density at each point, averaging over all the bootstrapped parameter values. The corrected 95% empirical Bayesian interval is given by solving

$\displaystyle \int_{-\infty}^{l} p^*(\theta_i | y_i, \hat{\alpha}, \hat{\beta}) \, d\theta_i = 0.025$

$\displaystyle \int_{-\infty}^{u} p^*(\theta_i | y_i, \hat{\alpha}, \hat{\beta}) \, d\theta_i = 0.975$

for lower and upper bounds $l$ and $u$ using numerical techniques.
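The whole procedure can be sketched in code. Below is a minimal Python version (the annotated code on my github is the authoritative version, and is in R); the prior parameters, group count, and trial count here are illustrative placeholders, and the method-of-moments re-estimation inside the loop is one possible choice of estimator.

```python
import numpy as np
from scipy.stats import beta
from scipy.optimize import brentq

rng = np.random.default_rng(1)

def mom_estimate(y, n):
    # Method-of-moments fit assuming a common n: match the sample mean and
    # variance of the proportions y_i / n to their beta-binomial values and
    # solve for M = alpha + beta.
    p = y / n
    mu, s2 = p.mean(), p.var(ddof=1)
    M = n * (mu * (1 - mu) - s2) / (n * s2 - mu * (1 - mu))
    return mu * M, (1 - mu) * M

# Illustrative values -- replace with the estimates from your own data.
alpha_hat, beta_hat = 10.0, 30.0
n, k, N = 45, 18, 2000

# Parametric bootstrap: simulate theta* and y* from the fitted model,
# then re-estimate the prior parameters from each simulated data set.
boot = []
while len(boot) < N:
    theta_star = rng.beta(alpha_hat, beta_hat, size=k)
    y_star = rng.binomial(n, theta_star)
    a_star, b_star = mom_estimate(y_star, n)
    if a_star > 0 and b_star > 0:  # discard degenerate (under-dispersed) draws
        boot.append((a_star, b_star))
boot = np.array(boot)

def corrected_interval(y_i, level=0.95):
    # Average the posterior Beta CDFs over the bootstrapped (alpha*, beta*)
    # pairs, then solve the averaged CDF for its central quantiles.
    def avg_cdf(x):
        return beta.cdf(x, boot[:, 0] + y_i, boot[:, 1] + n - y_i).mean()
    lo = brentq(lambda x: avg_cdf(x) - (1 - level) / 2, 1e-9, 1 - 1e-9)
    hi = brentq(lambda x: avg_cdf(x) - (1 + level) / 2, 1e-9, 1 - 1e-9)
    return lo, hi
```

Note that averaging the CDFs and root-finding is equivalent to integrating the averaged density, since the average of the posterior CDFs is the CDF of the averaged posterior density.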

## Baseball Example

In my previous post, I analyzed the famous Morris baseball data set with respect to loss functions to show why empirical Bayes works. Analyzing it with respect to interval estimation also provides an interesting example.

Using a beta-binomial model with the method of moments estimator I described in the previous post, this data set has parameters $\hat{\alpha} = 97.676$ and $\hat{\beta} = 270.312$. Each player had $n_i = 45$ at-bats, so the posterior distribution for the batting average $\theta_i$ of player $i$ is
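The estimation step can be sketched as follows. This is a Python sketch of one method-of-moments estimator for the beta-binomial with equal $n_i$, which I'm assuming matches the previous post's approach; small discrepancies from the quoted values can come from rounding or the exact variance convention used.

```python
import numpy as np

def beta_mom(y, n):
    # Match the sample mean and variance of the proportions p_i = y_i / n
    # to their beta-binomial expressions, then solve for M = alpha + beta.
    p = np.asarray(y) / n
    mu, s2 = p.mean(), p.var(ddof=1)
    M = n * (mu * (1 - mu) - s2) / (n * s2 - mu * (1 - mu))
    return mu * M, (1 - mu) * M

# Hits in the first 45 at-bats for the 18 players in the Morris data set
y = [18, 17, 16, 15, 14, 14, 13, 12, 11, 11, 10, 10, 10, 10, 10, 9, 8, 7]
alpha_hat, beta_hat = beta_mom(y, 45)
# alpha_hat and beta_hat come out near the values quoted above
```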

$\theta_i | y_i, \hat{\alpha}, \hat{\beta} \sim Beta(y_i + 97.676, 45 - y_i + 270.312)$

Naive intervals can be taken directly as central 95% quantiles from the posterior distribution - again, see my article on Bayesian credible intervals for more explanation on this.
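Given the quoted estimates, a naive interval is just a pair of central quantiles from each player's posterior beta distribution; for instance (a Python sketch, using scipy in place of R's qbeta):

```python
from scipy.stats import beta

alpha_hat, beta_hat = 97.676, 270.312  # method-of-moments estimates from above
n = 45

def naive_interval(y_i, level=0.95):
    # Posterior is Beta(y_i + alpha_hat, n - y_i + beta_hat); the naive
    # interval takes central quantiles directly from it.
    post = beta(y_i + alpha_hat, n - y_i + beta_hat)
    return post.ppf((1 - level) / 2), post.ppf(1 - (1 - level) / 2)

lo, hi = naive_interval(18)  # Clemente's row: roughly (.238, .324)
```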

\begin{array}{l c c c c c c} \hline
\textrm{Player} & y_i & y_i/n_i & \textrm{EB Estimate} & \textrm{Naive Lower} & \theta_i & \textrm{Naive Upper}\\ \hline
\textrm{Clemente} & 18 & .400 & .280 & .238 & .346 & .324 \\
\textrm{F. Robinson} & 17 & .378 & .278 & .236 & .298 & .322 \\
\textrm{F. Howard} & 16 & .356 & .275 & .233 & .276 & .319 \\
\textrm{Johnstone} & 15 & .333 & .273 & .231 & .222 & .317 \\
\textrm{Barry} & 14 & .311 & .270 & .229 & .273 & .314 \\
\textrm{Spencer} & 14 & .311 & .270 & .229 & .270 & .314 \\
\textrm{Kessinger} & 13 & .289 & .268 & .226 & .263 & .312 \\
\textrm{L. Alvarado} & 12 & .267 & .266 & .224 & .210 & .309 \\
\textrm{Santo} & 11 & .244 & .263 & .222 & .269 & .307 \\
\textrm{Swoboda} & 11 & .244 & .263 & .222 & .230 & .307 \\
\textrm{Unser} & 10 & .222 & .261 & .220 & .264 & .304 \\
\textrm{Williams} & 10 & .222 & .261 & .220 & .256 & .304 \\
\textrm{Scott} & 10 & .222 & .261 & .220 & .303 & .304 \\
\textrm{Petrocelli} & 10 & .222 & .261 & .220 & .264 & .304 \\
\textrm{E. Rodriguez} & 10 & .222 & .261 & .220 & .226 & .304 \\
\textrm{Campaneris} & 9 & .200 & .258 & .217 & .285 & .302 \\
\textrm{Munson} & 8 & .178 & .256 & .215 & .316 & .299 \\
\textrm{Alvis} & 7 & .156 & .253 & .213 & .200 & .296 \\ \hline
\end{array}

Thirteen out of the eighteen intervals captured the hitter's true average for the rest of the year, for an observed coverage of 72.22%. Roberto Clemente and Thurman Munson managed to overperform with respect to their intervals, while Jay Johnstone, Luis Alvarado, and Max Alvis underperformed.

The parametric bootstrap procedure fixes this as follows:

- Simulate a set of 18 new $\theta^*_i$ from a $Beta(97.676, 270.312)$ distribution.
- Simulate a set of 18 new $y^*_i$ from a $Bin(45, \theta^*_i)$ distribution
- Estimate $\alpha^*_j$ and $\beta^*_j$ using the same method used on the original data set.
- Repeat the steps above 5,000 times, for $j = 1, 2, ..., 5000$.

The corrected posterior density for each player is then estimated by averaging over the bootstrapped parameter estimates:

$p^*(\theta_i | y_i, 97.676, 270.312) \approx \dfrac{ \sum_{j = 1}^{5000} p(\theta_i | y_i, \alpha^*_j, \beta^*_j)}{5000}$


The effect of this is to create a posterior distribution that's centered around the same empirical Bayesian estimate $\hat{\theta_i}$, but more spread out. This is shown in the naive (solid line) and bootstrapped (dashed line) posterior distributions for $y_i = 10$.

Quantiles taken from this bootstrapped distribution will give wider intervals than the naive empirical Bayesian intervals (though it is possible to come up with "odd" data sets where the bootstrap interval is shorter). The bootstrap interval is given by solving

$\displaystyle \int_{-\infty}^{l} p^*(\theta_i | y_i, 97.676, 270.312) \, d\theta_i = 0.025$

$\displaystyle \int_{-\infty}^{u} p^*(\theta_i | y_i, 97.676, 270.312) \, d\theta_i = 0.975$

for lower and upper bounds $l$ and $u$. In full disclosure, I didn't actually perform the full numerical integration. Instead, I averaged over the `pbeta(x, alpha, beta)` function in R and solved for the values where the averaged CDF equals 0.025 or 0.975.

Doing this, 95% bootstrapped intervals are given by

\begin{array}{l c c c c c c} \hline
\textrm{Player} & y_i & y_i/n_i & \textrm{EB Estimate} & \textrm{Bootstrap Lower} & \theta_i & \textrm{Bootstrap Upper}\\ \hline
\textrm{Clemente} & 18 & .400 & .280 & .231 & .346 & .391 \\
\textrm{F. Robinson} & 17 & .378 & .278 & .227 & .298 & .382 \\
\textrm{F. Howard} & 16 & .356 & .275 & .222 & .276 & .372 \\
\textrm{Johnstone} & 15 & .333 & .273 & .217 & .222 & .363 \\
\textrm{Barry} & 14 & .311 & .270 & .211 & .273 & .355 \\
\textrm{Spencer} & 14 & .311 & .270 & .211 & .270 & .355 \\
\textrm{Kessinger} & 13 & .289 & .268 & .205 & .263 & .346 \\
\textrm{L. Alvarado} & 12 & .267 & .266 & .199 & .210 & .338 \\
\textrm{Santo} & 11 & .244 & .263 & .192 & .269 & .330 \\
\textrm{Swoboda} & 11 & .244 & .263 & .192 & .230 & .330 \\
\textrm{Unser} & 10 & .222 & .261 & .184 & .264 & .323 \\
\textrm{Williams} & 10 & .222 & .261 & .184 & .256 & .323 \\
\textrm{Scott} & 10 & .222 & .261 & .184 & .303 & .323 \\
\textrm{Petrocelli} & 10 & .222 & .261 & .184 & .264 & .323 \\
\textrm{E. Rodriguez} & 10 & .222 & .261 & .184 & .226 & .323 \\
\textrm{Campaneris} & 9 & .200 & .258 & .177 & .285 & .317 \\
\textrm{Munson} & 8 & .178 & .256 & .169 & .316 & .310 \\
\textrm{Alvis} & 7 & .156 & .253 & .161 & .200 & .305 \\ \hline
\end{array}


Only one player out of eighteen is not captured by his interval - Thurman Munson, who managed to hit .178 in his first 45 at-bats and .316 the rest of the season - for an observed coverage of 94.44%.

(As a reminder, annotated code to perform estimation in the beta-binomial model and calculate bootstrapped empirical Bayesian intervals for the beta-binomial model is available on my github.)