Odds
Let's say that instead of a batting average $\theta$, I want the odds of getting a hit. To get the odds of a hit, apply the function
$g(\theta) = \dfrac{\theta}{1-\theta}$
For example, a batter with a true $\theta = 0.250$ will have odds $g(0.250) = 0.250/0.750 = 1/3$, or odds of one to three (1:3) of getting a hit.
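As a quick illustration, the odds transformation is a one-line function. This is a minimal Python sketch; the function name `odds` is my own choice, not something from the original code. Using exact fractions confirms that a .250 hitter has odds of exactly 1/3.

```python
from fractions import Fraction

def odds(theta):
    """Convert a probability of a hit, theta, into the odds of a hit: theta / (1 - theta)."""
    if not 0 <= theta < 1:
        raise ValueError("theta must be in [0, 1)")
    return theta / (1 - theta)

# Exact arithmetic: a .250 hitter has odds of exactly one to three.
print(odds(Fraction(1, 4)))   # prints 1/3
# Floating-point version of the same calculation.
print(round(odds(0.250), 3))  # prints 0.333
```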
Delta Method
Suppose we have some estimator $\hat{\theta}$ whose distribution, for large samples, is approximately normal with mean $\theta$ and variance $\sigma^2$; that is,
$\hat{\theta} \rightarrow N(\theta, \sigma^2)$
For example, assuming the at-bats are independent and each has the same probability $\theta$ of a hit, the sample batting average over $n$ at-bats converges to a normal distribution
$\hat{\theta} \rightarrow N\left(\theta, \dfrac{\theta(1-\theta)}{n}\right)$
Then the delta method says that for any function $g$ whose first derivative $g'(\theta)$ exists and is nonzero, $g(\hat{\theta})$ also converges to a normal distribution:
$g(\hat{\theta}) \rightarrow N(g(\theta), [g'(\theta)]^2 \sigma^2)$
This gives us a handy way to calculate confidence intervals for functions of parameters, as long as we know (or can estimate) the variance of the estimator of the parameter itself.
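The recipe translates directly into a small helper. This is a minimal sketch, assuming we already have an estimate of $\text{Var}(\hat{\theta})$ and plug $\hat{\theta}$ in for the unknown $\theta$; the name `delta_method_ci` and its parameters are hypothetical, not taken from the original code. Passing the identity function recovers the ordinary normal-approximation interval for the batting average itself.

```python
import math
from typing import Callable

def delta_method_ci(
    g: Callable[[float], float],
    g_prime: Callable[[float], float],
    theta_hat: float,
    var_theta_hat: float,
    z: float = 1.96,
) -> tuple[float, float]:
    """Approximate CI for g(theta) via the first-order delta method.

    Var[g(theta_hat)] is approximated by g'(theta)^2 * Var[theta_hat],
    with theta_hat plugged in for the unknown theta.
    """
    slope = g_prime(theta_hat)
    if slope == 0:
        raise ValueError("first-order delta method requires g'(theta) != 0")
    se = abs(slope) * math.sqrt(var_theta_hat)  # delta-method standard error
    center = g(theta_hat)
    return center - z * se, center + z * se

# Identity g: the usual interval for a .250 batting average in 40 at-bats.
lo, hi = delta_method_ci(lambda t: t, lambda t: 1.0, 0.250, 0.250 * 0.750 / 40)
print(round(lo, 3), round(hi, 3))  # prints 0.116 0.384
```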
Back to Odds of Getting a Hit
If we define the odds function as above, then by the quotient rule the first derivative is
$g'(\theta) = \dfrac{(1-\theta) - \theta \cdot (-1)}{(1-\theta)^2} = \dfrac{1}{(1-\theta)^2}$
and so the distribution of the sample batting odds $g(\hat{\theta})$ converges to a normal distribution with mean $g(\theta)$ and variance
$[g'(\theta)]^2 \sigma^2 = \left[\dfrac{1}{(1-\theta)^2}\right]^2\left[\dfrac{\theta(1-\theta)}{n}\right] = \dfrac{\theta}{n(1-\theta)^3}$
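One way to sanity-check this variance formula is a quick simulation: generate many batters with a known $\theta$, compute each one's sample odds, and compare the empirical variance with $\theta / (n(1-\theta)^3)$. This is a minimal sketch, assuming independent at-bats with a fixed hit probability; the function name, the seed, and the choice of 5,000 replications are my own.

```python
import random
import statistics

def simulate_odds_variance(theta, n, reps=5000, seed=1):
    """Simulate `reps` batters, each with true hit probability `theta`, over
    `n` independent at-bats; return the sample variance of their observed odds."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    odds_samples = []
    for _ in range(reps):
        hits = sum(rng.random() < theta for _ in range(n))  # one binomial draw
        theta_hat = hits / n
        # theta_hat == 1 would divide by zero; with theta = 0.25 and n = 400
        # that outcome has probability 0.25**400, so it is not handled here.
        odds_samples.append(theta_hat / (1 - theta_hat))
    return statistics.variance(odds_samples)

theta, n = 0.250, 400
delta_var = theta / (n * (1 - theta) ** 3)  # delta-method variance of the odds
sim_var = simulate_odds_variance(theta, n)
print(f"delta method: {delta_var:.6f}  simulation: {sim_var:.6f}")
```

The two numbers should be close but not identical, both because of simulation noise and because the delta method is a large-sample approximation.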
And so, substituting the sample batting average $\hat{\theta}$ for the unknown $\theta$ in both the mean and the variance, a confidence interval for the odds of a hit is
$\left(\dfrac{\hat{\theta}}{1-\hat{\theta}}\right) \pm z^* \sqrt{\dfrac{\hat{\theta}}{n(1-\hat{\theta})^3}}$
where $z^*$ is the appropriate quantile of the standard normal distribution (for example, $z^* = 1.96$ for a 95% interval).
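The interval is straightforward to compute. Here is a minimal Python sketch, assuming $z^* = 1.96$ for 95% confidence; the name `odds_ci` is hypothetical, and this is not necessarily how the original code on GitHub does it. The loop reproduces the two worked examples in this post.

```python
import math

def odds_ci(theta_hat, n, z=1.96):
    """Delta-method confidence interval for the odds of a hit, theta / (1 - theta),
    given a sample batting average theta_hat over n at-bats."""
    if not 0 < theta_hat < 1:
        raise ValueError("theta_hat must be strictly between 0 and 1")
    point = theta_hat / (1 - theta_hat)                   # estimated odds
    se = math.sqrt(theta_hat / (n * (1 - theta_hat) ** 3))  # delta-method SE
    return point - z * se, point + z * se

for n in (40, 400):
    lo, hi = odds_ci(0.250, n)
    print(n, round(lo, 3), round(hi, 3))
# prints:
# 40 0.095 0.572
# 400 0.258 0.409
```

At $\hat{\theta} = 0$ the estimated standard error is zero, so the sketch rejects that case rather than returning a zero-width interval.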
Let's take our batter above, and suppose a sample batting average of $\hat{\theta} = 0.250$ in $n = 40$ at-bats. Then a 95% confidence interval for the odds of getting a hit is
$\left(\dfrac{0.250}{1-0.250}\right) \pm1.96 \sqrt{\dfrac{0.250}{40(1 - 0.250)^3}} = (0.095, 0.572)$
That is a fairly wide interval, but then again, $n = 40$ at-bats isn't much information to work with. With $n = 400$ at-bats instead, the interval would be
$\left(\dfrac{0.250}{1-0.250}\right) \pm1.96 \sqrt{\dfrac{0.250}{400(1 - 0.250)^3}} = (0.258, 0.409)$
This interval is much narrower, and much more usable.
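The size of the improvement matches what the formula predicts: $n$ appears under the square root in the denominator of the standard error, so ten times as many at-bats shrinks the half-width of the interval by a factor of $\sqrt{10}$:
$\dfrac{1.96\sqrt{0.250/(40(1-0.250)^3)}}{1.96\sqrt{0.250/(400(1-0.250)^3)}} = \sqrt{\dfrac{400}{40}} = \sqrt{10} \approx 3.16$
which is consistent with the half-widths of roughly $0.239$ and $0.075$ in the two intervals above.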
The code I used to generate these results can be found on my GitHub.