## 20 June, 2015

### The Delta Method for a Confidence Interval for Odds

In my previous post, I discussed using Wald theory and maximum likelihood to get a confidence interval for a batting average, $\theta$. What if I want a confidence interval for a function of that parameter instead?

## Odds

Let's say that instead of a batting average $\theta$, I want the odds of getting a hit. To get the odds of a hit, apply the function

$g(\theta) = \dfrac{\theta}{1-\theta}$

So, for example, a batter with a true $\theta = 0.250$ has odds $g(0.250) = 0.250/0.750 = 1/3$; that is, odds of one to three of getting a hit.
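A minimal Python sketch of the odds function. The name `odds` is my own choice for illustration, not taken from the original code. Using `fractions.Fraction` keeps the arithmetic exact, so the 0.250 example comes out as exactly 1/3.

```python
from fractions import Fraction

def odds(theta):
    """Odds of a hit, g(theta) = theta / (1 - theta), for 0 <= theta < 1."""
    if not 0 <= theta < 1:
        raise ValueError("theta must be in [0, 1)")
    return theta / (1 - theta)

print(odds(Fraction(1, 4)))  # exact arithmetic: prints 1/3
print(odds(0.250))           # floating point: about 0.3333
```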

## Delta Method

Suppose we have some estimator $\hat{\theta}$ whose distribution converges to a normal distribution with mean $\theta$ and variance $\sigma^2$; that is,

$\hat{\theta} \rightarrow N(\theta, \sigma^2)$

For example, assuming at-bats are independent and identically distributed, the sample batting average converges to a normal distribution

$\hat{\theta} \rightarrow N\left(\theta, \dfrac{\theta(1-\theta)}{n}\right)$

If that holds, then statistical theory (the delta method) says that for any function $g$ whose first derivative $g'(\theta)$ exists and is nonzero, $g(\hat{\theta})$ has distribution

$g(\hat{\theta}) \rightarrow N(g(\theta), [g'(\theta)]^2 \sigma^2)$
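As a sketch of the general recipe, the helper below (a hypothetical name, `delta_method_variance`) computes $[g'(\theta)]^2 \sigma^2$ for any function `g`. It assumes $g'(\theta)$ exists and is nonzero, and it swaps in a central finite difference for the analytic derivative, so it is a numerical approximation rather than an exact formula.

```python
def delta_method_variance(g, theta, sigma2, h=1e-6):
    """Approximate Var[g(theta_hat)] by [g'(theta)]^2 * sigma2.

    g'(theta) is estimated with a central finite difference of step h,
    so g must be defined on the interval (theta - h, theta + h).
    """
    deriv = (g(theta + h) - g(theta - h)) / (2 * h)
    return deriv ** 2 * sigma2

# Example: odds of a hit with theta = 0.25 and sigma^2 = theta(1 - theta)/n, n = 40
theta, n = 0.25, 40
sigma2 = theta * (1 - theta) / n
var_odds = delta_method_variance(lambda t: t / (1 - t), theta, sigma2)
print(var_odds)
```

Swapping in any other differentiable `g` (a log-odds, for instance) only changes the lambda passed in.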

This gives us a handy way to calculate confidence intervals for functions of parameters, as long as we know the approximate normal distribution of the estimator of the parameter itself.

## Back to Odds of Getting a Hit

If we define the odds function as above, then the first derivative is given by

$g'(\theta) = \dfrac{1}{(1-\theta)^2}$
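This follows from the quotient rule:

$g'(\theta) = \dfrac{(1-\theta)\cdot 1 - \theta \cdot (-1)}{(1-\theta)^2} = \dfrac{1 - \theta + \theta}{(1-\theta)^2} = \dfrac{1}{(1-\theta)^2}$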

and so the distribution of the sample batting odds $g(\hat{\theta})$ converges to a normal distribution with mean $g(\theta)$ and variance

$[g'(\theta)]^2 \sigma^2 = \left[\dfrac{1}{(1-\theta)^2}\right]^2\left[\dfrac{\theta(1-\theta)}{n}\right] = \dfrac{\theta}{n(1-\theta)^3}$
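One way to sanity-check this variance is a small Monte Carlo simulation: generate many seasons of $n$ independent at-bats, compute the sample odds for each, and compare their empirical variance with $\theta / (n(1-\theta)^3)$. The sketch below assumes $\theta = 0.25$ and $n = 400$; the function name, number of replicates, and seed are arbitrary choices for illustration. Since the delta method is a large-sample approximation, you would expect the agreement to be rougher at small $n$.

```python
import random
import statistics

def simulate_odds_variance(theta, n, reps=5000, seed=2015):
    """Monte Carlo variance of the sample odds theta_hat / (1 - theta_hat).

    Each replicate simulates n independent at-bats, each a hit with
    probability theta. Replicates with theta_hat == 1 (odds undefined)
    are skipped.
    """
    rng = random.Random(seed)
    sample_odds = []
    for _ in range(reps):
        hits = sum(rng.random() < theta for _ in range(n))
        theta_hat = hits / n
        if theta_hat < 1:
            sample_odds.append(theta_hat / (1 - theta_hat))
    return statistics.variance(sample_odds)

theta, n = 0.25, 400
delta_var = theta / (n * (1 - theta) ** 3)   # delta-method variance
sim_var = simulate_odds_variance(theta, n)   # simulated variance
print(f"delta method: {delta_var:.6f}  simulation: {sim_var:.6f}")
```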

Plugging the sample batting average $\hat{\theta}$ in for $\theta$, an approximate confidence interval for the odds of a hit is

$\left(\dfrac{\hat{\theta}}{1-\hat{\theta}}\right) \pm z^* \sqrt{\dfrac{\hat{\theta}}{n(1-\hat{\theta})^3}}$

where $z^*$ is the standard normal quantile for the desired confidence level (for example, $z^* = 1.96$ for a 95% interval).
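A minimal sketch of this interval in Python, not the code from my GitHub. The function name `odds_confidence_interval` is hypothetical, and it uses the exact quantile from `statistics.NormalDist` (about 1.95996) rather than the rounded 1.96, which makes no difference at three decimal places here.

```python
import math
from statistics import NormalDist

def odds_confidence_interval(theta_hat, n, level=0.95):
    """Wald/delta-method interval for the odds of a hit.

    Plugs the sample batting average theta_hat into the odds
    theta / (1 - theta) and its delta-method variance
    theta / (n (1 - theta)^3). Requires 0 < theta_hat < 1.
    """
    if not 0 < theta_hat < 1:
        raise ValueError("theta_hat must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for level=0.95
    point = theta_hat / (1 - theta_hat)
    se = math.sqrt(theta_hat / (n * (1 - theta_hat) ** 3))
    return point - z * se, point + z * se

for n in (40, 400):
    low, high = odds_confidence_interval(0.250, n)
    print(f"n = {n}: ({low:.3f}, {high:.3f})")
# prints n = 40: (0.095, 0.572) and n = 400: (0.258, 0.409)
```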

Let's take our batter above, and suppose a sample batting average of $\hat{\theta} = 0.250$ over $n = 40$ at-bats. Then a 95% confidence interval for the odds of getting a hit is

$\left(\dfrac{0.250}{1-0.250}\right) \pm1.96 \sqrt{\dfrac{0.250}{40(1 - 0.250)^3}} = (0.095, 0.572)$

A fairly wide interval, but then again, $n = 40$ at-bats isn't much information to work with. If it were instead $n = 400$ at-bats, the interval would be

$\left(\dfrac{0.250}{1-0.250}\right) \pm1.96 \sqrt{\dfrac{0.250}{400(1 - 0.250)^3}} = (0.258, 0.409)$

That interval is much narrower, and much more usable.
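Because the standard error is proportional to $1/\sqrt{n}$, multiplying the number of at-bats by 10 should shrink the interval by a factor of $\sqrt{10} \approx 3.16$. A quick check, using a hypothetical helper `odds_interval_width` and the rounded $z^* = 1.96$:

```python
import math

def odds_interval_width(theta_hat, n, z=1.96):
    """Full width, 2 * z * SE, of the delta-method interval for the odds."""
    return 2 * z * math.sqrt(theta_hat / (n * (1 - theta_hat) ** 3))

w40 = odds_interval_width(0.250, 40)
w400 = odds_interval_width(0.250, 400)
print(f"width at n=40: {w40:.3f}, at n=400: {w400:.3f}, ratio: {w40 / w400:.3f}")
```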

The code I used to generate these results can be found on my GitHub.