All the code used to generate the results in this post may be found on my github.
The Multinomial Model
If you observe $n$ independent, identical trials of an event (say, a plate appearance) with $k$ possible outcomes (single, double, triple, home run, walk, sacrifice, etc. - all the way up to an out), then the distribution of counts of each of the possible outcomes follows what's known as a multinomial model. The probability mass function is defined as
$p(x_1, x_2,...,x_k) = \dfrac{n!}{x_1!x_2!...x_k!}\theta_1^{x_1}\theta_2^{x_2}...\theta_{k-1}^{x_{k-1}}(1-\theta_1-\theta_2-...-\theta_{k-1})^{x_k}$
If this looks to you like a binomial, you have a good eye - a binomial is just a multinomial with $k = 2$. As a note, you may, if you want, define $\theta_k = 1-\theta_1-\theta_2-...-\theta_{k-1}$ to simplify the notation, but for estimation it must be written out as shown above, since only $k - 1$ of the parameters are free. The log-likelihood function is
$\ell(\theta_1, \theta_2,...,\theta_{k-1}) =\log(n!/(x_1!x_2!...x_k!)) + x_1\log(\theta_1) + ... + x_{k-1}\log(\theta_{k-1}) + x_k \log(1 - \theta_1 - ... - \theta_{k-1})$
Maximizing this function gives maximum likelihood estimates $\hat{\theta_1} = x_1/n$, $\hat{\theta_2} = x_2/n$, and in general $\hat{\theta_i} = x_i/n$ - the obvious estimates.
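If you'd rather see that numerically than take the calculus on faith, here's a minimal Python sketch. The counts are made up, and the use of numpy/scipy is just my choice for illustration here, not necessarily what's in the github code.

```python
import numpy as np
from scipy.stats import multinomial

# Made-up counts for k = 4 outcomes over n trials
x = np.array([30, 15, 5, 50])
n = x.sum()

# Maximum likelihood estimates: theta_hat_i = x_i / n
theta_hat = x / n
print(theta_hat)  # [0.3  0.15 0.05 0.5 ]

# Log-likelihood at the MLE (includes the multinomial coefficient)
print(multinomial.logpmf(x, n=n, p=theta_hat))
```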
Furthermore, according to statistical theory, the information matrix is the expectation of the negative matrix of second derivatives of the log-likelihood function.
$I(\hat{\theta_1}, \hat{\theta_2},...,\hat{\theta_{k-1}}) = -E\left[ \dfrac{\partial^2 \ell}{\partial \theta_i \partial \theta_j} \right]$
And by Wald theory, the inverse of the information matrix gives the variance of the maximum likelihood estimates, which are also approximately normally distributed (see my previous post here for a one-dimensional introduction to maximum likelihood and Wald theory). The math gets pretty tedious pretty quickly, so I'll just give you the solution - if, for example, $k = 4$ and there are three parameters, then
$\left[ \begin{array}{c} \hat{\theta_1} \\ \hat{\theta_2} \\ \hat{\theta_3} \end{array}\right] \sim N\left( \left[ \begin{array}{c} \theta_1 \\ \theta_2 \\ \theta_3 \end{array}\right] , V(\theta_1, \theta_2, \theta_3)\right)$
$V(\theta_1, \theta_2, \theta_3) = I(\theta_1, \theta_2, \theta_3)^{-1} = \left[ \begin{array}{ccc}
\dfrac{\theta_1(1-\theta_1)}{n} & -\dfrac{\theta_1 \theta_2}{n} & -\dfrac{\theta_1 \theta_3}{n} \\
-\dfrac{\theta_2 \theta_1}{n} & \dfrac{\theta_2(1-\theta_2)}{n} & -\dfrac{\theta_2 \theta_3}{n} \\
-\dfrac{\theta_3 \theta_1}{n} & -\dfrac{\theta_3 \theta_2}{n} & \dfrac{\theta_3(1-\theta_3)}{n}\\
\end{array} \right]$
Notice that for any individual $\theta_i$, the variance is $\dfrac{\theta_i (1-\theta_i)}{n}$ - the same variance as the binomial distribution. This pattern holds for any $k$: binomial variances on the diagonal and $-\dfrac{\theta_i \theta_j}{n}$ on the off-diagonals. When doing inference, we will, as usual, replace each $\theta_i$ with $\hat{\theta_i}$, the estimated probabilities.
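In code, the whole matrix is just $(\text{diag}(\hat{\theta}) - \hat{\theta}\hat{\theta}^T)/n$ restricted to the first $k-1$ components. A quick sketch, again with made-up counts:

```python
import numpy as np

x = np.array([30, 15, 5, 50])   # made-up counts for k = 4 outcomes
n = x.sum()
theta_hat = (x / n)[:-1]        # the k - 1 free parameters

# V = (diag(theta_hat) - theta_hat theta_hat^T) / n, the inverse of the information matrix
V = (np.diag(theta_hat) - np.outer(theta_hat, theta_hat)) / n

# Diagonal entries are the binomial variances theta_i (1 - theta_i) / n
print(np.allclose(np.diag(V), theta_hat * (1 - theta_hat) / n))  # True
```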
Weighted Linear Functions of Probabilities
For simplicity, I'm going to assume that $k = 3$ here, so there are only two parameters $\theta_1$ and $\theta_2$ - but the math will work for any $k \ge 2$. Let's say you want to take a weighted linear function of the parameters.
$g(\theta_1, \theta_2) = w_1 \theta_1 + w_2 \theta_2$
Then the estimated value of $g$ is
$g(\hat{\theta_1}, \hat{\theta_2}) = w_1 \hat{\theta_1} + w_2 {\hat{\theta_2}}$
where $\hat{\theta_i}$ is given as above. We know from the binomial distribution that $Var(\hat{\theta_i}) = \dfrac{\theta_i (1-\theta_i)}{n}$ - but to get a confidence interval, we need to find
$Var(g(\hat{\theta_1},\hat{\theta_2})) = Var(w_1 \hat{\theta_1} + w_2 \hat{\theta_2})$
You may be tempted to take $Var(w_1 \hat{\theta_1} + w_2 \hat{\theta_2}) = w_1^2Var(\hat{\theta_1}) + w_2^2Var(\hat{\theta_2})$, but be careful! The preceding formula only applies when $\hat{\theta}_1$ and $\hat{\theta}_2$ are independent. In situations where they are not, the correct formula is
$Var(w_1 \hat{\theta_1} + w_2 \hat{\theta_2}) = w_1^2 Var(\hat{\theta_1}) + w_2^2 Var(\hat{\theta_2}) + 2 w_1 w_2 Cov(\hat{\theta_1}, \hat{\theta_2})$
you have to take the covariance into account. The multinomial model assumes that $n$ is fixed, and so the $\hat{\theta_i}$ are not independent - if you know, for example, that $n = 100$ is fixed and there are $25$ hits, that gives you information about what values the remaining counts - and hence the remaining $\hat{\theta_i}$ - can take.
This general formulation - taking variance and covariance into account - is much easier to write out in matrix form. If I let $w = [w_1 \hspace{0.1in} w_2]$ be the row vector of weights and $V$ be the estimated variance matrix from the previous section, then $Var(g(\hat{\theta_1}, \hat{\theta_2})) = wVw^T$:
$[w_1 \hspace{0.1in} w_2] \left[ \begin{array}{cc}
\dfrac{\hat{\theta_1}(1-\hat{\theta_1})}{n} & -\dfrac{\hat{\theta_1}\hat{\theta_2}}{n} \\
-\dfrac{\hat{\theta_2} \hat{\theta_1}}{n} & \dfrac{\hat{\theta_2}(1-\hat{\theta_2})}{n} \\
\end{array} \right] \left[ \begin{array}{c} w_1 \\ w_2\end{array} \right]$
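Here's a quick numerical check that the quadratic form $wVw^T$ agrees with the expanded variance-plus-covariance formula above; the counts and weights are made up for illustration.

```python
import numpy as np

x = np.array([30, 15, 55])      # made-up counts for k = 3 outcomes
n = x.sum()
theta_hat = (x / n)[:-1]        # theta_1_hat, theta_2_hat
w = np.array([0.9, 1.3])        # arbitrary weights

V = (np.diag(theta_hat) - np.outer(theta_hat, theta_hat)) / n

# Matrix form: w V w^T
var_matrix = w @ V @ w

# Expanded form: w1^2 Var(theta_1) + w2^2 Var(theta_2) + 2 w1 w2 Cov(theta_1, theta_2)
var_expanded = (w[0]**2 * V[0, 0] + w[1]**2 * V[1, 1]
                + 2 * w[0] * w[1] * V[0, 1])

print(np.isclose(var_matrix, var_expanded))  # True
```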
Weighted On-Base Average
(For this example, I am stealing data and context from the FanGraphs definition of wOBA. Go check it out if you want to learn more about where the weights come from. As defined in The Book, wOBA also includes reaching base on an error - I chose to go with the FanGraphs example because my notation was already getting out of control, but the math here easily expands to include that factor as well.)
If we define $n$ as the number of plate appearances minus the number of intentional walks ($n = PA - IBB$), then we can model the counts of the outcomes of interest - singles, doubles, triples, home runs, unintentional walks, hit by pitches, and "other" - as a multinomial distribution. This means we have six free parameters - $\theta_{1B}, \theta_{2B}, \theta_{3B}, \theta_{HR}, \theta_{uBB},$ and $\theta_{HBP}$ - representing the probability of each event occurring (the probability of "other" is one minus their sum).
In 2013, Mike Trout had, in $n = 706$ plate appearances (minus IBB), 100 unintentional walks, 9 hit by pitches, 115 singles, 39 doubles, 9 triples, and 27 home runs.
The estimates for each of those components of the model are then $\hat{\theta_{uBB}} = 100/706 = 0.142$, $\hat{\theta_{HBP}} = 9/706 = 0.013$, $\hat{\theta_{1B}} = 0.163$, $\hat{\theta_{2B}} = 0.055$, $\hat{\theta_{3B}} = 0.013$, and $\hat{\theta_{HR}} = 0.038$.
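Those estimates are just the counts divided by $n$; a few lines of Python if you want to follow along (again, an illustrative sketch rather than the github code):

```python
import numpy as np

n = 706                                       # PA minus IBB, Mike Trout 2013
counts = np.array([100, 9, 115, 39, 9, 27])   # uBB, HBP, 1B, 2B, 3B, HR
theta_hat = counts / n
print(theta_hat.round(3))  # [0.142 0.013 0.163 0.055 0.013 0.038]
```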
The vector of linear weights (in the order uBB, HBP, 1B, 2B, 3B, HR, matching the variance matrix below) is given by
$w = \left[\begin{array}{cccccc}0.69 & 0.722 & 0.888 & 1.271 & 1.616 & 2.101 \end{array}\right]$
The full variance matrix $V$ is given by (and I apologize if the formatting comes out weird - there was no way to show this without massive notation, either with greek letters or scientific notation, since the individual elements are so small)
$\left[ \begin{array}{cccccc}
\dfrac{\hat{\theta_{uBB}}(1-\hat{\theta_{uBB}})}{n} & -\dfrac{\hat{\theta_{uBB}}\hat{\theta_{HBP}}}{n} & -\dfrac{\hat{\theta_{uBB}}\hat{\theta_{1B}}}{n} & -\dfrac{\hat{\theta_{uBB}}\hat{\theta_{2B}}}{n} & -\dfrac{\hat{\theta_{uBB}}\hat{\theta_{3B}}}{n} & -\dfrac{\hat{\theta_{uBB}}\hat{\theta_{HR}}}{n}\\
-\dfrac{\hat{\theta_{HBP}} \hat{\theta_{uBB}}}{n} & \dfrac{\hat{\theta_{HBP}}(1-\hat{\theta_{HBP}})}{n} & -\dfrac{\hat{\theta_{HBP}}\hat{\theta_{1B}}}{n} & -\dfrac{\hat{\theta_{HBP}}\hat{\theta_{2B}}}{n} & -\dfrac{\hat{\theta_{HBP}}\hat{\theta_{3B}}}{n} & -\dfrac{\hat{\theta_{HBP}}\hat{\theta_{HR}}}{n}\\
-\dfrac{\hat{\theta_{1B}} \hat{\theta_{uBB}}}{n} & -\dfrac{\hat{\theta_{1B}} \hat{\theta_{HBP}}}{n} & \dfrac{\hat{\theta_{1B}}(1-\hat{\theta_{1B}})}{n} & -\dfrac{\hat{\theta_{1B}}\hat{\theta_{2B}}}{n} & -\dfrac{\hat{\theta_{1B}}\hat{\theta_{3B}}}{n} & -\dfrac{\hat{\theta_{1B}}\hat{\theta_{HR}}}{n}\\
-\dfrac{\hat{\theta_{2B}}\hat{\theta_{uBB}}}{n} & -\dfrac{\hat{\theta_{2B}}\hat{\theta_{HBP}}}{n} & -\dfrac{\hat{\theta_{2B}}\hat{\theta_{1B}}}{n} &\dfrac{\hat{\theta_{2B}}(1-\hat{\theta_{2B}})}{n} & -\dfrac{\hat{\theta_{2B}}\hat{\theta_{3B}}}{n} & -\dfrac{\hat{\theta_{2B}}\hat{\theta_{HR}}}{n}\\
-\dfrac{\hat{\theta_{3B}} \hat{\theta_{uBB}}}{n} & -\dfrac{\hat{\theta_{3B}}\hat{\theta_{HBP}}}{n} & -\dfrac{\hat{\theta_{3B}}\hat{\theta_{1B}}}{n} & -\dfrac{\hat{\theta_{3B}}\hat{\theta_{2B}}}{n} & \dfrac{\hat{\theta_{3B}}(1-\hat{\theta_{3B}})}{n} & -\dfrac{\hat{\theta_{3B}}\hat{\theta_{HR}}}{n}\\
-\dfrac{\hat{\theta_{HR}} \hat{\theta_{uBB}}}{n} & -\dfrac{\hat{\theta_{HR}} \hat{\theta_{HBP}}}{n} & -\dfrac{\hat{\theta_{HR}}\hat{\theta_{1B}}}{n} & -\dfrac{\hat{\theta_{HR}}\hat{\theta_{2B}}}{n} & -\dfrac{\hat{\theta_{HR}}\hat{\theta_{3B}}}{n} &\dfrac{\hat{\theta_{HR}}(1-\hat{\theta_{HR}})}{n} \end{array}\right]$
And the standard deviation of wOBA for Mike Trout that year was given by
$\sqrt{wVw^T} = 0.021$
A 95% confidence interval for the wOBA of Mike Trout in 2013 is given by
$0.423 \pm 1.96(0.021) = (0.381, 0.464)$
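Putting all the pieces together, here's a short Python sketch of the whole calculation - a direct translation of the formulas above, not the original github code:

```python
import numpy as np
from scipy.stats import norm

n = 706                                       # PA minus IBB, Mike Trout 2013
counts = np.array([100, 9, 115, 39, 9, 27])   # uBB, HBP, 1B, 2B, 3B, HR
w = np.array([0.690, 0.722, 0.888, 1.271, 1.616, 2.101])

theta_hat = counts / n
V = (np.diag(theta_hat) - np.outer(theta_hat, theta_hat)) / n

woba = w @ theta_hat        # point estimate of wOBA
se = np.sqrt(w @ V @ w)     # standard deviation of the estimate
z = norm.ppf(0.975)         # approximately 1.96

print(round(woba, 3), round(se, 3))                       # 0.423 0.021
print(round(woba - z * se, 3), round(woba + z * se, 3))   # 0.381 0.464
```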
Last Word
These intervals, complicated as they are, are actually too narrow. The formula treats the weights $w_i$ as constants, but really, they are estimates too - $w_i = \hat{w_i}$ - and the formulation does not account for $Var(\hat{w_i})$, the uncertainty in the estimation of the $w_i$. With as much data as has gone into the calculation of the weights, though, $Var(\hat{w_i})$ is negligible and can be safely ignored.