## 02 April, 2021

### 2021 Stabilization Points

These are my estimated stabilization points for the 2021 MLB season, once again using the maximum likelihood method on the totals that I used for previous years. This method is explained in my articles  Estimating Theoretical Stabilization Points and WHIP Stabilization by the Gamma-Poisson Model.

However, good news! In the past two years, I've had some research on reliability for non-normal data corrected, expanded upon, and published in academic journals. I can definitively say that my maximum likelihood estimator is accurately estimating the reliability of these statistics exactly the same as Cronbach's alpha or KR-20 and performs as well or better than Cronbach's alpha, assuming the model is correct, which - while no model is correct - I believe is very accurate. The article can be found here (for the preprint, click here). I also published a paper with some KR-20 and KR-21 reliability estimators specifically for exponential family distributions such as binomial, Poisson, etc. The article can be found here (for the preprint, click here). These estimators are a little more efficient for small sample sizes but for large sample sizes such as in this case, however, the estimators should be nearly identical.

As usual, all data and code I used for this post can be found on my github. I make no claims about the stability, efficiency, or optimality of my code.

I've included standard error estimates for 2021, but these should not be used to perform any kinds of tests or intervals to compare to the values from previous years, as those values are estimates themselves with their own standard errors, and approximately 5/6 of the data is common between the two estimates. The calculations I performed for 2015 can be found here for batting statistics and here for pitching statistics. The calculations for 2016 can be found here. The 2017 calculations can be found here. The 2018 calculations can be found here. The 2019 calculations can be found here. I didn't do calculations in 2020 because of the pandemic in general.

The cutoff values I picked were the minimum number of events (PA, AB, TBF, BIP, etc. - the denominators in the formulas) in order to be considered for a year. These cutoff values, and the choice of 6 years worth of data (2015-20120), were picked fairly arbitrarily - I tried to go with what was reasonable (based on seeing what others were doing and my own knowledge of baseball) and what seemed to work well in practice.

## Offensive Statistics

\begin{array}{| l | l | c | c | c | c  | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}&2019\textrm{ }\hat{M} \\ \hline
\textrm{OBP}&\textrm{(H + BB + HBP)/PA} & 302.57 & 18.39 & 0.331 & 300 & 295.20 \\
\textrm{BABIP}&\textrm{(H - HR)/(AB-SO-HR+SF)} & 451.24 &  47.22 & 0.306 & 300 & 431.49 \\
\textrm{BA}&\textrm{H/AB} & 511.71 & 42.78 & 0.265 & 300 & 488.49 \\
\textrm{SO Rate}&\textrm{SO/PA} & 50.37 & 2.12  & 0.205 & 300 & 49.05 \\
\textrm{BB Rate}&\textrm{(BB-IBB)/(PA-IBB)} & 100.47 & 4.67 & 0.080 & 300 & 104.08 \\
\textrm{1B Rate}&\textrm{1B/PA} & 191.17 & 10.20 & 0.150 & 300 & 197.43 \\
\textrm{2B Rate}&\textrm{2B/PA} & 1242.67 & 162.27 & 0.047 & 300 & 1200.46 \\
\textrm{3B Rate}&\textrm{3B/PA} & 481.11 & 28.74 & 0.005 & 300 & 421.91 \\
\textrm{XBH Rate} & \textrm{(2B + 3B)/PA} & 1059.31 & 124.09 & 0.052 & 300 & 1070.09 \\
\textrm{HR Rate} & \textrm{HR/PA} & 146.00 & 7.68 & 0.034 & 300 & 141.80\\
\textrm{HBP Rate} & \textrm{HBP/PA} & 261.13 & 16.56 & 0.010 & 300 & 266.92 \\ \hline
\end{array}

In general, a larger stabilization point will be due to a decreased spread of talent levels - as talent levels get closer together, more extreme stats become less and less likely, and will be shrunk harder towards the mean. Consequently, it takes more observations to know that a player's high or low stats (relative to the rest of the league) are real and not just a fluke of randomness. Similarly, smaller stabilization points will point towards an increase in the spread of talent levels.

The stabilization point of the 3B rate increased dramatically by approximately two standard deviations, indicating that the talent level of hitting triples has clustered more closely around its mean. In general, however, most stabilization points are roughly the same as the previous year, taking into account that year-to-year and sample-to-sample variation in estimates is expected even if the true stabilization points are not changing.

## Pitching Statistics

\begin{array}{| l | l | c | c | c | c  | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}& 2019 \textrm{ }\hat{M} \\ \hline
\textrm{BABIP}&\textrm{(H-HR)/(GB + FB + LD)}& 1061.43 & 197.34 & 0.286 &300& 1184.38 \\
\textrm{GB Rate}&\textrm{GB/(GB + FB + LD)}& 66.20 & 4.25 & 0.443 &300& 64.51\\
\textrm{FB Rate}&\textrm{FB/(GB + FB + LD)}& 62.33 & 3.97 & 0.346 &300& 60.68 \\
\textrm{LD Rate}&\textrm{LD/(GB + FB + LD)}& 1773.66 & 486.12 & 0.211 &300& 2197.02  \\
\textrm{HR/FB Rate}&\textrm{HR/FB}& 529.40 & 129.10 & 0.130 & 100 & 351.53 \\
\textrm{SO Rate}&\textrm{SO/TBF}& 80.78 & 4.97 & 0.214 &400& 90.86 \\
\textrm{HR Rate}&\textrm{HR/TBF}& 959.57 & 133.073 & 0.031 &400& 764.48\\
\textrm{BB Rate}&\textrm{(BB-IBB)/(TBF-IBB)}& 251.22 & 19.47 & 0.072 & 400 & 230.09 \\
\textrm{HBP Rate}&\textrm{HBP/TBF}& 1035.90 & 153.68 & 0.009 &400& 906.25 \\
\textrm{Hit rate}&\textrm{H/TBF}& 453.30 & 37.52 & 0.232 &400& 496.56   \\
\textrm{OBP}&\textrm{(H + BB + HBP)/TBF}& 407.36 & 36.33 & 0.313 &400& 443.60 \\
\textrm{WHIP}&\textrm{(H + BB)/IP*}& 63.38 & 4.79 & 1.29 &80& 67.84 \\
\textrm{ER Rate}&\textrm{ER/IP*}& 57.73 & 4.30 & 0.460 &80& 57.97 \\
\textrm{Extra BF}&\textrm{(TBF - 3IP*)/IP*}& 64.70 & 4.92 & 1.23 &80& 67.23 \\ \hline
\end{array}

* When dividing by IP, I corrected the 0.1 and 0.2 representations to 0.33 and 0.67, respectively.

Most are the same, but the HR/FB stabilization point has shifted up dramatically given its standard error, indicating a likely change in true talent level and not just sample-to-sample and year-to-year variation. This indicates that the distribution of HR/FB talent levels is clustering around its mean, possibly indicating a change in approach by pitchers or batters over the past two years. The mean has also shifted up over the previous calculation. Similarly, the HR rate stabilization point and mean have increased. Conversely, the strikeout rate stabilization rate has decreased, indicating less clustering of talent levels around the mean, and the mean has also increased.

## Usage

Aside from the obvious use of knowing approximately when results are half due to luck and half  skill, these stabilization points (along with league means) can be used to provide very basic confidence intervals and prediction intervals for estimates that have been shrunk towards the population mean, as demonstrated in my article From Stabilization to Interval Estimation.

For example, suppose that in the first half, a player has an on-base percentage of 0.380 in 300 plate appearances, corresponding to 114 on-base events. A 95% confidence interval using my empirical Bayesian techniques (based on a normal-normal model) is

$\dfrac{114 + 0.331*302.57}{300 + 302.57} \pm 1.96 \sqrt{\dfrac{0.331(1-0.331)}{302.57 + 300}} = (0.318,0.392)$

That is, we believe the player's true on-base percentage to be between 0.317 and 0.392 with 95% confidence. I used a normal distribution for talent levels with a normal approximation to the binomial for the distribution of observed OBP, but that is not the only possible choice - it just resulted in the simplest formulas for the intervals.

Suppose that the player will get an additional $\tilde{n} = 250$ PA in the second half of the season. A 95% prediction interval for his OBP over those PA is given by

$\dfrac{114 + 0.331*302.57}{300 + 302.57} \pm 1.96 \sqrt{\dfrac{0.331(1-0.331)}{302.57+ 300} + \dfrac{0.331(1-0.331)}{250}} = (0.286,0.425)$

That is, 95% of the time the player's OBP over the 250 PA in the second half of the season should be between 0.285 and 0.424. These intervals are overly optimistic and "dumb" in that they take only the league mean and variance and the player's own statistics into account, representing an advantage only over 95% "unshrunk" intervals, but when I tested them in my article "From Stabilization to Interval Estimation," they worked well for prediction.

As usual, all my data and code can be found on my github. I wrote a general function in $R$ to calculate the stabilization point for any basic counting stat, or unweighted sums of counting stats like OBP (I am still working on weighted sums so I can apply this to things like wOBA). The function returns the estimated league mean of the statistic and estimated stabilization point, a standard error for the stabilization point, and what model was used (I only have two programmed in - 1 for the beta-binomial and 2 for the gamma-Poisson). It also gives a plot of the estimated stabilization at different numbers of events, with 95% confidence bounds.

> stabilize(h$\$$H + h\$$BB + h$\$$HBP, h\$$PA, cutoff = 300, 1)
$\$$Parameters [1] 0.3306363 302.5670532 \$$Standard.Error [1] 18.38593$\$$Model [1] "Beta-Binomial" The confidence bounds are created from the estimates \hat{M} and SE(\hat{M}) above and the formula \left(\dfrac{n}{n+\hat{M}}\right) \pm 1.96 \left[\dfrac{n}{(n+\hat{M})^2}\right] SE(\hat{M}) which is obtained from the applying the delta method to the function p(\hat{M}) = n/(n + \hat{M}). Note that the mean and prediction intervals I gave do not take SE(\hat{M}) into account (ignoring the uncertainty surrounding the correct shrinkage amount, which is indicated by the confidence bounds above), but this is not a huge problem - if you don't believe me, plug slightly different values of M into the formulas yourself and see that the resulting intervals do not change much. As always, feel free to post any comments or suggestions. ## 21 April, 2019 ### 2019 Stabilization Points These are my estimated stabilization points for the 2019 MLB season, once again using the maximum likelihood method on the totals that I used for previous years. This method is explained in my articles Estimating Theoretical Stabilization Points and WHIP Stabilization by the Gamma-Poisson Model. (As usual, all data and code I used can be found on my github. I make no claims about the stability, efficiency, or optimality of my code.) I've included standard error estimates for 2019, but these should not be used to perform any kinds of tests or intervals to compare to the values from previous years, as those values are estimates themselves with their own standard errors, and approximately 5/6 of the data is common between the two estimates. The calculations I performed for 2015 can be found here for batting statistics and here for pitching statistics. The calculations for 2016 can be found here. The 2017 calculations can be found here. The 2018 calculations can be found here. The cutoff values I picked were the minimum number of events (PA, AB, TBF, BIP, etc. - the denominators in the formulas) in order to be considered for a year. These cutoff values, and the choice of 6 years worth of data (2013-2018), were picked fairly arbitrarily - I tried to go with what was reasonable (based on seeing what others were doing and my own knowledge of baseball) and what seemed to work well in practice. ## Offensive Statistics \begin{array}{| l | l | c | c | c | c | c | c |} \hline \textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}&2018\textrm{ }\hat{M} \\ \hline \textrm{OBP}&\textrm{(H + BB + HBP)/PA} & 295.20 & 16.26 & 0.329 & 300 & 302.27 \\ \textrm{BABIP}&\textrm{(H - HR)/(AB-SO-HR+SF)} & 431.49 & 39.76 & 0.306 & 300 & 429.47 \\ \textrm{BA}&\textrm{H/AB} & 488.49 & 36.52 & 0.264 & 300 & 463.19 \\ \textrm{SO Rate}&\textrm{SO/PA} & 49.05 & 1.88 & 0.198 & 300 & 48.74 \\ \textrm{BB Rate}&\textrm{(BB-IBB)/(PA-IBB)} & 104.08 & 4.45 & 0.078 & 300 & 108.84 \\ \textrm{1B Rate}&\textrm{1B/PA} & 197.43 & 9.72 & 0.154 & 300 & 200.94 \\ \textrm{2B Rate}&\textrm{2B/PA} & 1200.46 & 140.37 & 0.047 & 300 & 1164.82 \\ \textrm{3B Rate}&\textrm{3B/PA} & 421.91 & 31.67 & 0.005 & 300 & 390.75 \\ \textrm{XBH Rate} & \textrm{(2B + 3B)/PA} & 1070.09 & 115.96 & 0.052 & 300 & 1064.01 \\ \textrm{HR Rate} & \textrm{HR/PA} & 141.80 & 6.78 & 0.030 & 300 & 132.52 \\ \textrm{HBP Rate} & \textrm{HBP/PA} & 266.92 & 15.74 & 0.009 & 300 & 280.00 \\ \hline \end{array} In general, a larger stabilization point will be due to a decreased spread of talent levels - as talent levels get closer together, more extreme stats become less and less likely, and will be shrunk harder towards the mean. Consequently, it takes more observations to know that a player's high or low stats (relative to the rest of the league) are real and not just a fluke of randomness. Similarly, smaller stabilization points will point towards an increase in the spread of talent levels. Noticeably, the stabilization point for the HR rate has increased over the past four years, indicating less variance in talent level of hitting home runs. Meanwhile, the stabilization point for HBP rate has decreased over the past four years, suggesting increased variance in """talent""" level of getting hit by pitches. ## Pitching Statistics \begin{array}{| l | l | c | c | c | c | c | c |} \hline \textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}&2018 \textrm{ }\hat{M} \\ \hline \textrm{BABIP}&\textrm{(H-HR)/(GB + FB + LD)}& 1184.38 & 206.63& 0.288 &300&1322.70 \\ \textrm{GB Rate}&\textrm{GB/(GB + FB + LD)}& 64.51 & 3.66 & 0.446 &300&63.12 \\ \textrm{FB Rate}&\textrm{FB/(GB + FB + LD)}&60.68 &3.41 & 0.344 &300&59.80 \\ \textrm{LD Rate}&\textrm{LD/(GB + FB + LD)}& 2197.02 & 622.02 & 0.210 &300&2157.15 \\ \textrm{HR/FB Rate}&\textrm{HR/FB}& 351.53 & 56.05 & 0.117 & 100 & 388.61 \\ \textrm{SO Rate}&\textrm{SO/TBF}& 90.86 &5.07& 0.204&400&93.52 \\ \textrm{HR Rate}&\textrm{HR/TBF}&764.48& 82.78 & 0.028 &400&790.97 \\ \textrm{BB Rate}&\textrm{(BB-IBB)/(TBF-IBB)}& 230.09 & 15.46 & 0.071 &400&238.70 \\ \textrm{HBP Rate}&\textrm{HBP/TBF}& 906.25 & 109.63 & 0.009 &400&935.61 \\ \textrm{Hit rate}&\textrm{H/TBF}&496.56 & 39.48 & 0.233 &400&536.32 \\ \textrm{OBP}&\textrm{(H + BB + HBP)/TBF}& 443.60 & 36.42 & 0.312 &400& 472.09 \\ \textrm{WHIP}&\textrm{(H + BB)/IP*}&67.84 & 4.69 & 1.28 &80& 71.10 \\ \textrm{ER Rate}&\textrm{ER/IP*}& 57.97 & 3.87 & 0.444 &80& 58.59 \\ \textrm{Extra BF}&\textrm{(TBF - 3IP*)/IP*}& 67.23 & 4.64 & 1.22 &80& 69.11 \\ \hline \end{array} * When dividing by IP, I corrected the 0.1 and 0.2 representations to 0.33 and 0.67, respectively. Most statistics this year shifted not just in stabilization point, but also in mean, possibly indicating a shift in the pitching environment. The stabilization points which did shift tended to shift down, indicating an increased spread of variation around the mean talent levels. ## Usage Aside from the obvious use of knowing approximately when results are half due to luck and half skill, these stabilization points (along with league means) can be used to provide very basic confidence intervals and prediction intervals for estimates that have been shrunk towards the population mean, as demonstrated in my article From Stabilization to Interval Estimation. For example, suppose that in the first half, a player has an on-base percentage of 0.380 in 300 plate appearances, corresponding to 114 on-base events. A 95% confidence interval using my empirical Bayesian techniques (based on a normal-normal model) is \dfrac{114 + 0.329*295.20}{300 + 295.20} \pm 1.96 \sqrt{\dfrac{0.329(1-0.329)}{295.20 + 300}} = (0.317,0.392) That is, we believe the player's true on-base percentage to be between 0.317 and 0.392 with 95% confidence. I used a normal distribution for talent levels with a normal approximation to the binomial for the distribution of observed OBP, but that is not the only possible choice - it just resulted in the simplest formulas for the intervals. Suppose that the player will get an additional \tilde{n} = 250 PA in the second half of the season. A 95% prediction interval for his OBP over those PA is given by \dfrac{114 + 0.329*295.20}{300 + 295.20} \pm 1.96 \sqrt{\dfrac{0.329(1-0.329)}{295.20 + 300} + \dfrac{0.329(1-0.329)}{250}} = (0.285,0.424) That is, 95% of the time the player's OBP over the 250 PA in the second half of the season should be between 0.285 and 0.424. These intervals are overly optimistic and "dumb" in that they take only the league mean and variance and the player's own statistics into account, representing an advantage only over 95% "unshrunk" intervals, but when I tested them in my article "From Stabilization to Interval Estimation," they worked well for prediction. As usual, all my data and code can be found on my github. I wrote a general function in R to calculate the stabilization point for any basic counting stat, or unweighted sums of counting stats like OBP (I am still working on weighted sums so I can apply this to things like wOBA). The function returns the estimated league mean of the statistic and estimated stabilization point, a standard error for the stabilization point, and what model was used (I only have two programmed in - 1 for the beta-binomial and 2 for the gamma-Poisson). It also gives a plot of the estimated stabilization at different numbers of events, with 95% confidence bounds. > stabilize(h\$$H + h$\$$BB + h\$$HBP, h$\$$PA, cutoff = 300, 1) \$$Parameters
[1]   0.3285272 295.1970047

[1] 16.92138

$\$$Model [1] "Beta-Binomial" The confidence bounds are created from the estimates \hat{M} and SE(\hat{M}) above and the formula \left(\dfrac{n}{n+\hat{M}}\right) \pm 1.96 \left[\dfrac{n}{(n+\hat{M})^2}\right] SE(\hat{M}) which is obtained from the applying the delta method to the function p(\hat{M}) = n/(n + \hat{M}). Note that the mean and prediction intervals I gave do not take SE(\hat{M}) into account (ignoring the uncertainty surrounding the correct shrinkage amount, which is indicated by the confidence bounds above), but this is not a huge problem - if you don't believe me, plug slightly different values of M into the formulas yourself and see that the resulting intervals do not change much. Maybe somebody else out there might find this useful. As always, feel free to post any comments or suggestions! ## 24 April, 2017 ### 2017 Stabilization Points Once again, I recalculated stabilization points for 2017 MLB data, once again using the maximum likelihood method on the totals that I used for 2015 and 2016. This method is explained in my articles Estimating Theoretical Stabilization Points and WHIP Stabilization by the Gamma-Poisson Model. (As usual, all data and code I used can be found on my github. I make no claims about the stability, efficiency, or optimality of my code.) I've included standard error estimates for 2017, but these should not be used to perform any kinds of tests or intervals to compare to the values from previous years, are those values are estimates themselves with their own standard errors and approximately 5/6 of the data is common between the two estimates. The calculations I performed for 2015 can be found here for batting statistics and here for pitching statistics. The calculations for 2016 can be found here. The cutoff values I picked were the minimum number of events (PA, AB, TBF, BIP, etc. - the denominators in the formulas) in order to be considered for a year. These cutoff values, and the choice of 6 years worth of data, were picked fairly arbitrarily - I tried to go with what was reasonable (based on seeing what others were doing and my own knowledge of baseball) and what seemed to work well in practice. ## Offensive Statistics \begin{array}{| l | l | c | c | c | c | c | c |} \hline \textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}&2016\textrm{ }\hat{M} \\ \hline \textrm{OBP}&\textrm{(H + BB + HBP)/PA} & 303.77 & 17.08 & 0.328 & 300 & 301.32 \\ \textrm{BABIP}&\textrm{(H - HR)/(AB-SO-HR+SF)} & 442.62 & 40.55 & 0.306 & 300 & 433.04\\ \textrm{BA}&\textrm{H/AB} & 466.09 & 34.30 & 0.266 & 300 & 491.20\\ \textrm{SO Rate}&\textrm{SO/PA} & 49.02 & 1.90 & 0.188 & 300 & 49.23\\ \textrm{BB Rate}&\textrm{(BB-IBB)/(PA-IBB)} & 113.64 & 5.00 & 0.077 & 300 & 112.44 \\ \textrm{1B Rate}&\textrm{1B/PA} & 215.29 & 10.95 & 0.157 & 300 & 223.86 \\ \textrm{2B Rate}&\textrm{2B/PA} & 1230.96 & 148.48 & 0.047 & 300 & 1169.75 \\ \textrm{3B Rate}&\textrm{3B/PA} & 358.92 & 25.71 & 0.005 & 300 & 365.06 \\ \textrm{XBH Rate} & \textrm{(2B + 3B)/PA} & 1063.76 & 116.54 & 0.052 & 300 & 1075.41 \\ \textrm{HR Rate} & \textrm{HR/PA} & 129.02 & 6.18 & 0.028 & 300 & 126.35\\ \textrm{HBP Rate} & \textrm{HBP/PA} & 299.39 & 18.60 & 0.009 & 300 & 300.97 \\ \hline \end{array} In general, a larger stabilization point will be due to a decreased spread of talent levels - as talent levels get closer together, more extreme stats become less and less likely, and will be shrunk harder towards the mean. Consequently, it takes more observations to know that a player's high or low stats (relative to the rest of the league) are real and not just a fluke of randomness. Similarly, smaller stabilization points will point towards an increase in the spread of talent levels. ## Pitching Statistics \begin{array}{| l | l | c | c | c | c | c | c |} \hline \textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}&2016 \textrm{ }\hat{M} \\ \hline \textrm{BABIP}&\textrm{(H-HR)/(GB + FB + LD)}& 1356.06 & 247.48 & 0.289 &300&1408.72\\ \textrm{GB Rate}&\textrm{GB/(GB + FB + LD)}& 64.00 & 3.56 & 0.450 &300& 63.53 \\ \textrm{FB Rate}&\textrm{FB/(GB + FB + LD)}& 61.73 &3.42& 0.342 &300&59.80 \\ \textrm{LD Rate}&\textrm{LD/(GB + FB + LD)}& 1497.65 & 296.21 & 0.208 &300&731.02 \\ \textrm{HR/FB Rate}&\textrm{HR/FB}& 464.60 & 85.51 & 0.108 &100&488.53 \\ \textrm{SO Rate}&\textrm{SO/TBF}& 94.62&5.29& 0.194&400&93.15 \\ \textrm{HR Rate}&\textrm{HR/TBF}& 942.62 & 110.66 & 0.026 &400&949.02 \\ \textrm{BB Rate}&\textrm{(BB-IBB)/(TBF-IBB)}& 237.53 & 15.84 & 0.069 &400&236.87 \\ \textrm{HBP Rate}&\textrm{HBP/TBF}& 954.09 & 115.60 & 0.008 &400&939.00 \\ \textrm{Hit rate}&\textrm{H/TBF}& 550.69 & 45.63 & 0.235 &400&559.18 \\ \textrm{OBP}&\textrm{(H + BB + HBP)/TBF}& 496.39 & 41.81 & 0.312 &400&526.77 \\ \textrm{WHIP}&\textrm{(H + BB)/IP*}& 74.68 & 5.25 & 1.29 &80&78.97 \\ \textrm{ER Rate}&\textrm{ER/IP*}& 62.82 & 4.24 & 0.440 &80&63.08 \\ \textrm{Extra BF}&\textrm{(TBF - 3IP*)/IP*}& 73.11& 5.11 & 1.22 &80&75.79 \\ \hline \end{array} * When dividing by IP, I corrected the 0.1 and 0.2 representations to 0.33 and 0.67, respectively. Most statistics are roughly the same; however, the line drive stabilization point has roughly doubled. I checked my calculations for both years and this is not a mistake. It corresponds to a decrease in the variance of line drive rates. It should also be noted that the average line drive rate increased from 0.203 to 0.208 - there are perhaps remnants of an odd 2010 that is no longer included in the data set. ## Usage Note: This section is largely unchanged from the previous year's version. The formulas given here work for "counting" offensive stats (OBP, BA, etc.). Aside from the obvious use of knowing approximately when results are half due to luck and half from skill, these stabilization points (along with league means) can be used to provide very basic confidence intervals and prediction intervals for estimates that have been shrunk towards the population mean, as demonstrated in my article From Stabilization to Interval Estimation. I believe the confidence intervals from my method should be similar to the intervals from Sean Dolinar's great fangraphs article A New Way to Look at Sample Size, though I have not personally tested this, and am not familiar with the Cronbach's alpha methodology he uses (or with reliability analysis in general). For example, suppose that in the first half, a player has an on-base percentage of 0.380 in 300 plate appearances, corresponding to 114 on-base events. A 95% confidence interval using my empirical Bayesian techniques (based on a normal-normal model) is \dfrac{114 + 0.329*301.32}{300 + 301.32} \pm 1.96 \sqrt{\dfrac{0.329(1-0.329)}{301.32 + 300}} = (0.317,0.392) That is, we believe the player's true on-base percentage to be between 0.317 and 0.392 with 95% confidence. I used a normal distribution for talent levels with a normal approximation to the binomial for the distribution of observed OBP, but that is not the only possible choice - it just resulted in the simplest formulas for the intervals. Suppose that the player will get an additional \tilde{n} = 250 PA in the second half of the season. A 95% prediction interval for his OBP over those PA is given by \dfrac{114 + 0.329*301.32}{300 + 301.32} \pm 1.96 \sqrt{\dfrac{0.329(1-0.329)}{301.32 + 300} + \dfrac{0.329(1-0.329)}{250}} = (0.285,0.424) That is, 95% of the time the player's OBP over the 250 PA in the second half of the season should be between 0.285 and 0.424. These intervals are overly optimistic and "dumb" in that they take only the league mean and variance and the player's own statistics into account, representing an advantage only over 95% unshrunk intervals, but when I tested them in my article "From Stabilization to Interval Estimation", they worked well for prediction. As usual, all my data and code can be found on my github. I wrote a general function in R to calculate the stabilization point for any basic counting stat, or unweighted sums of counting stats like OBP (I am still working on weighted sums so I can apply this to things like wOBA ). The function returns the estimated league mean of the statistic and estimated stabilization point, a standard error for the stabilization point, and what model was used (I only have two programmed in - 1 for the beta-binomial and 2 for the gamma-Poisson). It also gives a plot of the estimated stabilization at different numbers of events, with 95% confidence bounds. > stabilize(h\$$H + h$\$$BB + h\$$HBP, h$\$$PA, cutoff = 300, 1) \$$Parameters [1] 0.329098 301.317682$\$$Standard.Error [1] 16.92138 \$$Model
[1] "Beta-Binomial"

The confidence bounds are created from the estimates $\hat{M}$ and $SE(\hat{M})$ above and the formula

$\left(\dfrac{n}{n+\hat{M}}\right) \pm 1.96 \left[\dfrac{n}{(n+\hat{M})^2}\right] SE(\hat{M})$

which is obtained from the applying the delta method to the function $p(\hat{M}) = n/(n + \hat{M})$. Note that the mean and prediction intervals I gave do not take $SE(\hat{M})$ into account (ignoring the uncertainty surrounding the correct shrinkage amount, which is indicated by the confidence bounds above), but this is not a huge problem - if you don't believe me, plug slightly different values of $M$ into the formulas yourself and see that the resulting intervals do not change much.

Maybe somebody else out there might find this useful. As always, feel free to post any comments or suggestions!

## 03 September, 2016

### 2016 Win Total Predictions (Through August 31)

These predictions are based on my own silly estimator, which I know can be improved with some effort on my part.  There's some work related to this estimator that I'm trying to get published academically, so I won't talk about the technical details yet (not that they're particularly mind-blowing anyway). These predictions include all games played before through August 31 break.

As a side note, I noticed that my projections are very similar to the Fangraphs projections on the same day. I'm sure we're both calculating the projections from completely different methods, but it's reassuring that others have arrived at basically the same conclusions. Theirs have also have playoff projections, though mine have intervals attached to them.

I set the nominal coverage at 95% (meaning the way I calculated it the intervals should get it right 95% of the time), but based on tests of earlier seasons at this point in the season the actual coverage is around 94%, with intervals usually being one game off if and when they are off.

Intervals are inclusive. All win totals assume a 162 game schedule.

\begin{array} {c c c c}
\textrm{Team}  & \textrm{Lower}  & \textrm{Mean} & \textrm{Upper} & \textrm{True Win Total}  & \textrm{Current Wins/Games}\\ \hline
ARI & 63 & 68.82 & 74 & 71.61 & 56 / 133 \\
ATL & 57 & 62.25 & 68 & 68.41 & 50 / 133 \\
BAL & 81 & 86.57 & 92 & 81.42 & 72 / 133 \\
BOS & 85 & 90.41 & 96 & 91.7 & 74 / 133 \\
CHC & 98 & 103.63 & 109 & 100.59 & 85 / 132 \\
CHW & 71 & 76.85 & 83 & 77.61 & 62 / 131 \\
CIN & 62 & 67.9 & 74 & 69.67 & 55 / 132 \\
CLE & 87 & 92.68 & 98 & 90.03 & 76 / 132 \\
COL & 73 & 78.63 & 84 & 81.72 & 64 / 133 \\
DET & 81 & 86.95 & 92 & 83.51 & 72 / 133 \\
HOU & 81 & 86.24 & 92 & 85.14 & 71 / 133 \\
KCR & 78 & 83.09 & 89 & 78.69 & 69 / 133 \\
LAA & 67 & 72.93 & 78 & 77.8 & 59 / 133 \\
LAD & 84 & 89.44 & 95 & 86.26 & 74 / 133 \\
MIA & 76 & 81.66 & 87 & 81.91 & 67 / 133 \\
MIL & 65 & 70.13 & 76 & 73.34 & 57 / 133 \\
MIN & 56 & 61.98 & 68 & 70.1 & 49 / 132 \\
NYM & 78 & 83.61 & 89 & 81.58 & 69 / 133 \\
NYY & 78 & 83.86 & 90 & 80.28 & 69 / 132 \\
OAK & 64 & 69.88 & 75 & 71.92 & 57 / 133 \\
PHI & 67 & 72.42 & 78 & 69.38 & 60 / 133 \\
PIT & 77 & 82.37 & 88 & 80.33 & 67 / 131 \\
SDP & 63 & 68.73 & 74 & 74.14 & 55 / 132 \\
SEA & 77 & 82.55 & 88 & 81.3 & 68 / 133 \\
SFG & 82 & 88 & 94 & 86.39 & 72 / 132 \\
STL & 81 & 86.25 & 92 & 87.69 & 70 / 132 \\
TBR & 65 & 70.72 & 76 & 79.47 & 56 / 132 \\
TEX & 89 & 94.45 & 100 & 83.61 & 80 / 134 \\
TOR & 86 & 91.93 & 97 & 89 & 76 / 133 \\
WSN & 89 & 94.82 & 100 & 94.01 & 78 / 133 \\
\hline\end{array}

These quantiles are based off of a distribution - I've uploaded a picture of each team's distribution to imgur. The bars in red are the win total values covered by the 95% interval. The blue line represents my estimate of the team's "True Win Total" based on its performance - so if the blue line is to the left of the peak, the team is predicted to finish "lucky" - more wins than would be expected based on their talent level - and if the blue line is to the right of the peak, the team is predicted to finish "unlucky" - fewer wins that would be expected based on their talent level.

It's still difficult to predict final win totals even at the beginning of September - intervals have a width of approximately 11-12 games. The Texas Ranges have been lucky this season, with a projected win total over 10 games larger than their estimated true talent level!  Conversely, the Tampa Bay Rays have been unlucky, with a projected win total 10 games lower than their true talent level.

The Chicago Cubs have a good chance at winning 105+ games. My system believes they are a "true" 101 win team. Conversely, the system believes that the worst team is the Atlanta Braves, which are a "true" 68 win team (though the Minnesota Twins are projected to have the worst record at 62 wins).

## Terminology

To explain the difference between "Mean" and "True Win Total"  - imagine flipping a fair coin 10 times. The number of heads you expect is 5 - this is what I have called "True Win Total," representing my best guess at the true ability of the team over 162 games. However, if you pause halfway through and note that in the first 5 flips there were 4 heads, the predicted total number of heads becomes $4 + 0.5(5) = 6.5$ - this is what I have called "Mean", representing the expected number of wins based on true ability over the remaining schedule added to the current number of wins (from the beginning of the season until the all-star break).

## 15 August, 2016

### wOBA Shrinkage Estimation by the Multinomial-Dirichilet Model

In a previous article, I showed how to calculate a very basic confidence interval for wOBA using the multinomial model. Since then, I've shown how to perform shrinkage estimation (regression towards the mean) for basic counting stats such as BA and OBP and rate stats such as WHIP. In this article, I'm going to show how to use the multinomial model with a Dirichlet prior to find a regressed estimate of wOBA (and other functions that are linear transformations of counting stats).

As in the previous post, I will use the weights of wOBA from the definition on fangraphs.com. I am aware that in The Book wOBA also includes the outcome of reaching a base on error, but the method here will easily expand to include that factor and the results should not change drastically for its exclusion.

As usual, all the code and data I use can be found on my github.

## The Multinomial and Dirichlet Models

Suppose you observe $n$ independent, identical trials of an event (say, a plate appearance or an at-bat) with $k$ possible outcomes (single, double, triple, home run, walk, sacrifice fly, etc. - all the way up to an out). The distribution of counts of each of the possible outcomes is multinomial with probability mass function:

$p(x_1, x_2,...,x_k | \theta_1, \theta_2, ..., \theta_{k}, n) = \dfrac{n!}{x_1!x_2!...x_k!}\theta_1^{x_1}\theta_2^{x_2}...\theta_k^{x_k}$

Where $x_1$, $x_2$,..., $x_k$ represent counts of each outcome (with $n = x_1 + x_2 + ... + x_k$ fixed) and $\theta_1, \theta_2, ..., \theta_k$ represent the probability of each outcome in a single event - note that all the probabilities $\theta_j$ must sum to 1, so you will sometimes see the last term written as $\theta_k = (1-\theta_1-\theta_2-...-\theta_{k-1})$.

To give an example, suppose that in each plate appearance (the event) a certain player has a 0.275 chance of getting a hit, a 0.050 chance of getting a walk, and a 0.675 chance of an "other" outcome happening (meaning anything other than a hit or walk - an out, hit by pitch, reach base on error, etc.). Then in $n = 200$ plate appearances, the probability of exactly $x_H = 55$ hits, $x_{BB} = 10$ walks, and $x_{OTH} = 135$ other outcomes is given by

$\dfrac{200!}{55!10!135!} 0.275^{55} 0.05^{10} 0.675^{135} = 0.008177562$

(This probability is necessarily small because there are a little over 20,000 ways to have the three outcomes sum up to 200 plate appearances. In fact, 55 hits, 10 walks, and 135 other is the most probable set of outcomes.)

The multinomial is, as its name implies, a multivariate extension of the classic binomial distribution (a binomial is just a multinomial with $k = 2$). Similarly, there is a multivariate extension of the beta distribution called the Dirichlet distribution. The Dirichlet is used to represent the joint distribution of the $\theta_j$ themselves - that is, the joint distribution (over the entire league) of sets of talent levels for each of the $k$ possible outcomes. The probability density function of the Dirichlet is

$p(\theta_1, \theta_2, ..., \theta_{k} | \alpha_1, \alpha_2, ..., \alpha_k) = \displaystyle \dfrac{\prod_{j = 1}^k \Gamma(\alpha_j)}{\Gamma(\sum_{j = 1}^k \alpha_j)} \theta_1^{\alpha_1 - 1} \theta_2^{\alpha_2 - 1}...\theta_k^{\alpha_k - 1}$

The Dirichlet distribution would be used to answer the question "What is the probability that a player has a hit probability of between 0.250 and 0.300 and simultaneously a walk probability of between 0.50 and 0.100?"- the advantage of doing this is being able to model the covariance between the talent levels.

The expected values of each of the $\theta_j$ are given by

$E[\theta_j] = \dfrac{\alpha_j}{\alpha_0}$

where

$\alpha_0 = \displaystyle \sum_{j = 1}^k \alpha_j$

These represent the league average talent levels in each of the outcomes. So for example, using hits (H), walks (BB), and other (OTH), the quantity given by

$E[\theta_{BB}] = \dfrac{\alpha_{BB}}{\alpha_H + \alpha_{BB} + \alpha_{OTH}}$

would be the average walk proportion (per PA) over all MLB players.

The reason the Dirichlet distribution is useful is that it is conjugate to the multinomial density above. Given raw counts $x_1, x_2, ..., x_k$ for each outcome in the multinomial model and parameters $\alpha_1$, $\alpha_2$, ..., $\alpha_k$ for the Dirichlet model, the posterior distribution for the $\theta_i$ is also Dirichlet with parameters $\alpha_1 + x_1, \alpha_2 + x_2, ..., \alpha_k + x_k$:

$p(\theta_1, \theta_2, ..., \theta_{k} | x_1, x_2, ..., x_k) =$

$\displaystyle \dfrac{\prod_{j = 1}^k \Gamma(\alpha_j + x_i)}{\Gamma(\sum_{j = 1}^k \alpha_j + x_j)} \theta_1^{\alpha_1 + x_1- 1} \theta_2^{\alpha_2 + x_2- 1}...\theta_{k-1}^{\alpha_{k-1} + x_{k-1} - 1}\theta_k^{\alpha_k + x_k - 1}$

For the posterior, the expected value for each outcome is given by

$E[\theta_j] = \dfrac{\alpha'_j}{\alpha'_0}$

where

$\alpha'_j = x_j + \alpha_j$

$\alpha'_0 = \displaystyle \sum_{j = 1}^k \alpha'_j = \sum_{j = 1}^k (x_j + \alpha_j)$

These posterior means $E[\theta_j]$ represent regressed estimates for each of the outcome talent levels towards the league means. These shrunk estimates can then be plugged in to the formula for any statistic to get a regressed version of that statistic.

## Linearly Weighted Statistics

Most basic counting statistics (such as batting average, on-base percentage, etc.) simply try to estimating one particular outcome using the raw proportion of events ending in that outcome:

$\hat{\theta_j} \approx \dfrac{x_j}{n}$

More advanced statistics instead attempt to estimate linear functions of true talent levels $\theta_j$ with weights $w_j$ for each outcome:

$w_1 \theta_1 + w_2\theta_2 + ... + w_k \theta_k$

The standard version of these statistics that you can find on any number of baseball sites uses the raw proportion $x_j/n$ as an estimate $\hat{\theta}_j$ as above. To get the regressed version of the statistic, use $\hat{\theta_j} = E[\theta_j]$ from the posterior distribution for the Dirichlet-multinomial model - the formula for the regressed statistic is then

$w_1 \hat{\theta}_1 + w_2 \hat{\theta_2} + ... + w_k \hat{\theta_k} = \displaystyle \sum_{j = 1}^k w_j \left(\dfrac{x_j + \alpha_j}{\sum_{j = 1}^k x_j + \alpha_j}\right) =\sum_{j = 1}^k w_j \left(\dfrac{\alpha_j'}{\alpha_0'}\right)$

(The full posterior distribution can also be used to get interval estimates for the statistic, which will be the focus of the next article)

## Estimation

This raises the obvious question of what values to use for the $\alpha_j$ in the Dirichlet distribution - ideally, $\alpha_j$ should be picked so that the Dirichlet distribution accurately modes the joint distribution of talent levels for MLB hitters. There are different ways to do this, but I'm going to use use MLB data itself and a marginal maximum likelihood technique to find estimates $\hat{\alpha_j}$ and plug those into the equations above - this method was chosen because there are existing R packages to do the estimation and it is relatively numerically stable and should be very close to the results of other methods for large sample sizes. Using the data to find estimates of the $\alpha_j$ and then plugging those back in makes this an empirical Bayesian technique.

First, it's necessary to get rid of the talent levels $\theta_j$ and find the probability of the observed data based not on a particular player's talent level(s), but on the Dirichlet distribution of population talent levels itself. This is done by integrating out each talent level:

$p(x_1, x_2, ..., x_k | \alpha_1, \alpha_2, ..., \alpha_k, n) =$
$\displaystyle \int_{\tilde{\theta}} p(x_1, x_2, ..., x_k | \theta_1, \theta_2, ..., \theta_k, n) p(\theta_1, \theta_2, ..., \theta_k | \alpha_1, \alpha_2, ..., \alpha_k, n) d\tilde{\theta}$

where $\tilde{\theta}$ indicates the set of all $\theta_j$ values - so integrating out the success probability for each outcome, from 0 to 1.

The calculus here is a bit tedious, so skipping straight to the solution, this gives probability mass function:

$= \displaystyle \dfrac{n! \Gamma(\sum_{j = 1}^k \alpha_j)}{\Gamma(n + \sum_{j = 1}^k \alpha_j)} \prod_{i = j}^k \dfrac{\Gamma(x_j + \alpha_j)}{x_j! \Gamma(\alpha_j)}$

This distribution, known as the Dirichlet-Multinomial distribution, represents the probability of getting $x_1, x_2, ..., x_k$ outcomes in fixed $n = x_1 + x_2 + ... + x_k$ events, given only information about the population. Essentially, this distribution would be used to answer the question "What's the probability that, if I select a player from the population of all MLB players completely at random - so not knowing the player's talent levels at all - the player gets $x_1$ singles, $x_2$ doubles, etc., in $n$ plate appearances?"

Using $x_{i,j}$ to represent the raw count for outcome $j$ for player $i = 1, 2, ..., N$ and $\tilde{x}_i$ as shorthand to represent the complete set of counting statistics for player $i$, with the counting stats of multiple players $\tilde{x}_1, \tilde{x}_2, ..., \tilde{x}_N$  from some population (such as all MLB players that meet some event threshold), statistical estimation procedures can be used to acquire estimates $\hat{\alpha}_j$ of the true population parameters $\alpha_j$.

For the maximum likelihood approach, the log-likelihood of a set of estimates $\tilde{\alpha}$ is given by

$\ell(\tilde{\alpha} | \tilde{x}_1, ..., \tilde{x}_N) = \displaystyle \sum_{i = 1}^N [\log(n_i!) + \log(\Gamma(\sum_{j = 1}^k \alpha_j)) - \log(\Gamma(n + \sum_{j = 1}^k \alpha_j))$
$+ \displaystyle \sum_{j = 1}^k \log(\Gamma(x_{i,j} + \alpha_j)) - \sum_{j = 1}^k \log(x_{i,j}!) - \sum_{j = 1}^k \log(\Gamma(\alpha_j))]$

The maximum likelihood method works by finding the values of the $\alpha_j$ that maximize  $\ell(\tilde{\alpha})$  above - these are the maximum likelihood estimates $\hat{\alpha}_j$. From a numerical perspective, doing this is not simple, and papers have been written on fast, easy ways to perform the computations. For simplicity, I'm going to use the dirmult package in R, which only requires the set of counts for each outcome as a matrix where each row corresponds to exactly one player. The dirmult package can be installed with the command

> install.packages('dirmult')

Once the data is entered and estimation is performed, you will have estimates $\hat{\alpha}_j$. These can then be plugged into the posterior equations above to get regressed statistic estimate

$w_1 \hat{\theta}_1 + w_2 \hat{\theta_2} + ... + w_k \hat{\theta_k} = \displaystyle \sum_{j = 1}^k w_j \left(\dfrac{x_j + \hat{\alpha}_j}{\sum_{j = 1}^k x_j + \hat{\alpha}_j}\right)$

I'll give two examples of offensive stats that can be regressed in this way.

## wOBA Shrinkage

The first is weighted on-base average (which I'll call wOBA for short), introduced in Tom Tango's The Book, though as previously mentioned, I am using the fangraphs.com definition.  For events, at-bats plus unintentional walks, sacrifice flies, and times hit by a pitch will be used (n = AB + BB - IBB + SF + HBP) , and seven outcomes are defined - singles (1B), doubles (2B), triples (3B), home runs (HR), unintentional walks (BB), and hit by pitch (HBP), with everything else being lumped into an "other" (OTH) outcome.

For notation, again let $x_{i, 1B}$, $x_{i, 2B}$, ..., $x_{i, OTH}$ represent the number of singles, doubles, etc. for player $i$  (abbreviating the entire set as $\tilde{x}_i$) and let  $\theta_{i, 1B}$, $\theta_{i, 2B}$, ..., $\theta_{i, OTH}$ represent the true probability of getting a single, double, etc. for player $i$ (abbreviating the entire set as $\tilde{\theta}_i$). The total number of events for player $i$ is given by $n_i$.

Data was collected from fangraphs.com on all MLB non-pitchers from 2010 - 2015. A cutoff of 300 events was used - so only players with at least 300 total AB + BB - IBB + SF + HBP in a given season were used. The code and data I used may be found in my github.

A player's wOBA can be written as a linear transformation of the $\theta_j$ for each of these outcomes with weights $w_{1B} = 0.89$, $w_{2B} = 1.27$, $w_{3B} = 1.62$, $W_{HR} = 2.10$, $w_{BB} = 0.69$, $w_{HBP} = 0.72$, and $w_{OTH} = 0$ as

$wOBA_i = 0.89*\theta_{i,1B} + 1.27*\theta_{i,2B} + 1.62*\theta_{i,3B} + 2.10*\theta_{i,HR} + 0.69*\theta_{i,BB}+0.72*\theta_{i,HBP}$

For player $i$, the distribution of the counts $\tilde{x}_i$ in $n_i$ events is multinomial with mass function

$p(\tilde{x}_i | \tilde{\theta}_i, n_i) =\dfrac{n_i!}{x_{i,1B} x_{i,2B}! x_{i,}! x_{i,HR}! x_{i,OTH}! }\theta_{1B,i}^{x_{i,1B}} \theta_{i,2B}^{x_{i,2B}} \theta_{i,3B}^{x_{i,3B}} \theta_{i,HR}^{x_{i,HR}} \theta_{i,OTH}^{x_{i,OTH}}$

The  joint distribution of possible talent levels $\tilde{\theta}$ is assumed to be Dirichlet.

$p(\tilde{\theta}_i | \tilde{\alpha}) = \displaystyle \dfrac{\prod_{j = 1}^k \Gamma(\alpha_j)}{\Gamma(\sum_{j = 1}^k \alpha_j)} \theta_{i,1B}^{\alpha_{1B} - 1} \theta_{i,2B}^{\alpha_{2B} - 1}\theta_{i,3B}^{\alpha_{3B} - 1} \theta_{i,HR}^{\alpha_{HR} - 1}\theta_{i,BB}^{\alpha_{BB} - 1}\theta_{i,HBP}^{\alpha_{HBP} - 1}\theta_{i,OTH}^{\alpha_{OTH} - 1}$

To find the maximum likelihood estimates $\hat{\alpha}_j$ for this model using the dirmult package in R, the data needs to be loaded into a matrix, where row $i$ represents the raw counts for each outcome for player $i$.  There are any number of way to do this, but the first 10 rows of the matrix (out of 1598 in the sample total in the data set used for this example) should look something like:

> x[1:10,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]   91   38    1   42  109    5  353
[2,]  122   26    1   44   71    5  364
[3,]   58   25    1   18   43    0  196
[4,]  111   40    3   32   38    5  336
[5,]   63   25    0   30   56    3  253
[6,]   67   18    1   21   46    5  213
[7,]   86   24    2   43  108    6  362
[8,]   58   25    2   20   24    3  201
[9,]   35   16    2   25   56    2  200
[10,]   68   44    0   14   76    5  250

Once the data is in this form,  finding the maximum likelihood estimates can be done with the commands

> dirmult.fit  <- dirmult(x)
Iteration 1: Log-likelihood value: -863245.946200229
Iteration 2: Log-likelihood value: -863214.860463357
Iteration 3: Log-likelihood value: -863210.928976511
Iteration 4: Log-likelihood value: -863210.901250554
Iteration 5: Log-likelihood value: -863210.901248778

> dirmult.fit

$loglik [1] -863210.9$ite
[1] 5

$gamma [1] 34.30376 10.44264 1.15606 5.73569 16.28635 1.96183 144.51164$pi
[1] 0.160000389 0.048706790 0.005392124 0.026752540 0.075963185 0.009150412 0.674034560

$theta [1] 0.004642569 What I called the$\hat{\alpha}_j$are given as the$gamma in the output. The quantity$\alpha_0$ can be calculated by summing these up

> alpha <- dirmult.fit$gamma > sum(alpha) [1] 214.398 So the joint distribution of talent levels over the population of MLB players with at least 300 events is approximated by a Dirichlet distribution with parameters:$(\theta_{1B}, \theta_{2B}, \theta_{3B}, \theta_{HR},  \theta_{BB}, \theta_{HBP}, \theta_{OTH}) \sim Dirichlet(34.30, 10.44,  1.16,  5.74, 16.29, 1.96, 144.51)$In 2013, Mike Trout had$x_{1B} = 115$singles,$x_{2B} = 39$doubles,$x_{3B} = 9$triples,$x_{HR} = 27$home runs,$x_{BB} = 100$unintentional walks,$x_{HBP} = 9$times hit by a pitch, and$x_{OTH} = 397$other outcomes in$n = 706$total events for a raw (non-regressed) wOBA of$0.89 \left(\dfrac{115}{706}\right) + 1.27 \left(\dfrac{39}{706}\right) + 1.62 \left(\dfrac{9}{706}\right) + 2.10 \left(\dfrac{27}{706}\right) + 0.69 \left(\dfrac{100}{706}\right) +  0.72 \left(\dfrac{9}{706}\right) \approx 0.423$In order to calculate the regressed weighted on-base average, first calculate the$\alpha_j'$for Mike Trout's posterior distribution of batting ability by$\alpha_{1B}' = 115 + 34.30 = 149.30\alpha_{2B}' = 39 + 10.44 = 49.44\alpha_{3B}' = 9+ 1.16 = 10.16\alpha_{HR}' = 27 + 5.74 = 32.74\alpha_{BB}' = 100 + 16.29 = 116.29\alpha_{HBP}' = 9 + 1.96 = 10.96\alpha_{OTH}' = 407 + 144.51 = 551.51$With$\alpha_0' = 920.40$. The regressed version of Mike Trout's 2013 slugging percentage is then given by a linear transformation of the expected proportion in each outcome:$0.89 \left(\dfrac{149.30}{920.40}\right) + 1.27 \left(\dfrac{49.44}{920.40}\right) + 1.62 \left(\dfrac{10.16}{920.40}\right) + 2.10 \left(\dfrac{32.74}{920.40}\right) + 0.69 \left(\dfrac{116.29}{920.40}\right) + 0.72 \left(\dfrac{10.96}{920.40}\right)\approx 0.401$So based solely on his 2013 stats and the population information, it's estimated that he is a "true" 0.401 wOBA hitter. Of course, we know from many more years of watching him that this is a bit unfair, and his "true" wOBA is closer to 0.423. ## Stabilization As a side note - and I have verified this by simulation, though I have not worked out the details yet mathetmatically - the so-called "stabilization point" (defined as a split-half correlation of$r = 0.5$) for wOBA is given by$\alpha_0$- so if split-half correlation was conducted among the players from 2010-2015 with at least 300 AB + BB - IBB + SF + HBP, there should be a correlation of 0.5 after approximately 214 PA. I'm not sure if this works for just the wOBA weights or any arbitrary set of weights, though I suspect the fact that the weight for the "other" outcome is 0 and all the rest are nonzero has a big role to play in this. ## SLG Shrinkage Another statistic that can be regressed in this same way is slugging percentage (which I'll call SLG for short). Using at-bats (AB) as events, and defining five outcomes - singles (1B), doubles (2B), triples (3B), and home runs (HR), with everything else being lumped into the "other" (OTH) outcome, a player's slugging percentage can be written as a linear transformation of the$\theta_i$for each of these outcomes with weights$w_{1B} = 1$,$w_{2B} = 2$,$w_{3B} = 3$,$W_{HR} = 4$, and$w_{OTH} = 0SLG_i = 1*\theta_{i,1B} + 2*\theta_{i,2B} + 3*\theta_{i,3B} + 4*\theta_{i,HR}$For player$i$, the multinomial distribution of the counts of$x_{i,1B}$singles,$x_{i,2B}$doubles,$x_{i,3B}$triples,$x_{i,HR}$home runs, and$x_{i,OTH}$other outcomes in$n_i$at-bats is$p(\tilde{x}_i | \tilde{\theta}_i, n_i) =\dfrac{n_i!}{x_{i,1B} x_{i,2B}! x_{i,}! x_{i,HR}! x_{i,OTH}! }\theta_{1B,i}^{x_{i,1B}} \theta_{i,2B}^{x_{i,2B}} \theta_{i,3B}^{x_{i,3B}} \theta_{i,HR}^{x_{i,HR}} \theta_{i,OTH}^{x_{i,OTH}}$The Dirichlet distribution of all possible$\tilde{\theta}$values is$p(\tilde{\theta}_i | \tilde{\alpha}) = \displaystyle \dfrac{\prod_{j = 1}^k \Gamma(\alpha_j)}{\Gamma(\sum_{j = 1}^k \alpha_j)} \theta_{i,1B}^{\alpha_{1B} - 1} \theta_{i,2B}^{\alpha_{2B} - 1}\theta_{i,3B}^{\alpha_{3B} - 1} \theta_{i,HR}^{\alpha_{HR} - 1}\theta_{i,OTH}^{\alpha_{OTH} - 1}$Once again, data from fangraphs.com was used, and all MLB non-pitchers from 2010-2015 who had at least 300 AB in a given season were included in the sample. To find maximum likelihood estimates in R the data needs to be loaded into a matrix where row$i$represents the raw counts for each outcome$\tilde{x}_i$for player$i$. The first 10 rows of the matrix (out of 1477) should look something like: > x[1:10,] [,1] [,2] [,3] [,4] [,5] [1,] 91 38 1 42 349 [2,] 122 26 1 44 362 [3,] 111 40 3 32 332 [4,] 63 25 0 30 251 [5,] 67 18 1 21 208 [6,] 86 24 2 43 358 [7,] 58 25 2 20 199 [8,] 68 44 0 14 248 [9,] 102 36 2 37 370 [10,] 119 48 0 30 375 The dirmult package can then be used to find maximum likelihood estimates for the$\alpha_j$of the underlying joint Dirichlet distribution of talent levels > dirmult.fit <- dirmult(x) Iteration 1: Log-likelihood value: -575845.635311559 Iteration 2: Log-likelihood value: -575999.779702559 Iteration 3: Log-likelihood value: -575829.132259007 Iteration 4: Log-likelihood value: -575784.936726078 Iteration 5: Log-likelihood value: -575780.270877135 Iteration 6: Log-likelihood value: -575780.190649985 Iteration 7: Log-likelihood value: -575780.1906191 > dirmult.fit$loglik
[1] -575780.2

$ite [1] 7$gamma
[1]  42.443604  12.855782   1.381905   7.073672 176.120837

$pi [1] 0.176939918 0.053593494 0.005760921 0.029488894 0.734216774$theta
[1] 0.004151517

And $\alpha_0$ is

> alpha <- dirmult.fit$gamma > sum(alpha) [1] 239.8758 So the joint distribution of talent levels over the population of MLB players with at least 300 AB is given by a Dirichlet distribution.$(\theta_{1B}, \theta_{2B}, \theta_{3B}, \theta_{HR}, \theta_{OTH}) \sim Dirichlet(42.44, 12.86, 1.38, 7.07, 176.12)$(This also implies the "stabilization point" for slugging percentage should be at around$\alpha_0 \approx 240$AB - this is different than for wOBA because the definition of "events" are different between the two statistics) In 2013, Mike Trout had$x_{1B} = 115$singles,$X_{2B} = 39$doubles,$X_{3B} = 9$triples,$X_{HR} = 27$home runs, and$X_{OTH} = 399$other outcomes in$n = 589$at-bats.$1 \left(\dfrac{115}{589}\right) + 2 \left(\dfrac{39}{589}\right) + 3 \left(\dfrac{9}{589}\right) + 4 \left(\dfrac{27}{589}\right) + 0 \left(\dfrac{399}{589}\right) \approx 0.557$In order to calculate the regressed slugging percentage, calculate Mike Trout's posterior distribution for batting ability by$\alpha_{1B}' = 115 + 42.44 = 157.44\alpha_{2B}' = 39 + 12.86 = 51.86\alpha_{3B}' = 9+ 1.38 = 10.38\alpha_{HR}' = 27 + 7.07 = 34.07\alpha_{OTH}' = 399 + 176.12 = 575.12$With$\alpha_0' = 828.79$. The regressed version of Mike Trout's 2013 slugging percentage is then given by$1 \left(\dfrac{157.44}{828.79}\right) + 2 \left(\dfrac{51.86}{828.79}\right) + 3 \left(\dfrac{10.38}{828.79}\right) + 4 \left(\dfrac{34.07}{828.79}\right) \approx 0.517$## Model Criticisms From a statistical perspective, this is the most convenient way to perform shrinkage of wOBA, but it doesn't necessarily mean that this is correct - all of this research is dependent on how well the Dirichlet models the joint distribution of talent levels in the league. The fact that the beta works well for the population distributions of each of the talent levels when looked at individually is no guarantee that the multivariate extension should work well for the joint. In order to do a simple test of the fit of the model, data was simulated from the fit model used to perform the wOBA shrinkage (the posterior predictive means and variances are actually available exactly for this model, but it's good practice to simulate). A set of$\tilde{\theta}_i$simulated from the Dirichlet distribution was used to simulate a corresponding set of$\tilde{x}_i$, with the same$n_i$as in the original data set. Comparing the means and standard deviations of the real and simulated data set, the means are \begin{array}{c c c} \textrm{Outcome} & \textrm{Observed Mean} & \textrm{Simulated Mean} \\ \hline \textrm{1B} & 0.1598 & 0.1598 \\ \textrm{2B} & 0.0473 & 0.0480 \\ \textrm{3B} & 0.0049 & 0.0053 \\ \textrm{HR} & 0.0275 & 0.0262 \\ \textrm{BB} & 0.0772 & 0.0756 \\ \textrm{HBP} & 0.0088 & 0.0095 \\ \textrm{OTH} & 0.6745 & 0.6756 \\ \end{array} which look relatively good - the simulated and real means are fairly close. For the standard deviations, real and simulated values are \begin{array}{c c c} \textrm{Outcome} & \textrm{Observed SD} & \textrm{Simulated SD} \\ \hline \textrm{1B} & 0.0296 & 0.0303 \\ \textrm{2B} & 0.0114 & 0.0176 \\ \textrm{3B} & 0.0049 & 0.0061 \\ \textrm{HR} & 0.0153 & 0.0131 \\ \textrm{BB} & 0.0278 & 0.0217 \\ \textrm{HBP} & 0.0072 & 0.0080 \\ \textrm{OTH} & 0.0324 & 0.0381\\ \end{array} which isn't nearly as good. The Multinomial-Dirichlet model is clearly underestimating the amount of variance in double rates and "other" outcome rates while overestimating the variance in home run and walk rates. It's not to an extreme extent - and comparing histograms, the shapes of the real and simulated data sets match - but it does present a source of problems. More ad-hoc methods may give superior results. ## 16 July, 2016 ### 2016 Win Total Predictions (Through All-Star Break) These predictions are based on my own silly estimator, which I know can be improved with some effort on my part. There's some work related to this estimator that I'm trying to get published academically, so I won't talk about the technical details yet (not that they're particularly mind-blowing anyway). These predictions include all games played before the all-star break. I set the nominal coverage at 95% (meaning the way I calculated it the intervals should get it right 95% of the time), but based on tests of earlier seasons at this point in the season the actual coverage is just under 93%, with intervals usually being one game off if and when they are off. Intervals are inclusive. All win totals assume a 162 game schedule. \begin{array} {c c c c} \textrm{Team} & \textrm{Lower} & \textrm{Mean} & \textrm{Upper} & \textrm{True Win Total} & \textrm{Current Wins/Games}\\ \hline ARI & 62 & 72.11 & 82 & 76.75 & 38 / 90 \\ ATL & 52 & 61.82 & 72 & 68.4 & 31 / 89 \\ BAL & 81 & 90.93 & 101 & 86.25 & 51 / 87 \\ BOS & 80 & 90.3 & 100 & 89.19 & 49 / 87 \\ CHC & 87 & 96.9 & 106 & 96.11 & 53 / 88 \\ CHW & 71 & 81.04 & 91 & 80 & 44 / 87 \\ CIN & 51 & 60.62 & 70 & 63.51 & 32 / 89 \\ CLE & 84 & 93.41 & 103 & 90.67 & 52 / 88 \\ COL & 66 & 76.18 & 86 & 79.22 & 40 / 88 \\ DET & 73 & 82.55 & 92 & 81.11 & 46 / 89 \\ HOU & 76 & 85.81 & 96 & 83.9 & 48 / 89 \\ KCR & 70 & 80.3 & 90 & 77.29 & 45 / 88 \\ LAA & 62 & 71.88 & 82 & 77.4 & 37 / 89 \\ LAD & 80 & 89.43 & 99 & 87.7 & 51 / 91 \\ MIA & 75 & 84.5 & 94 & 82.1 & 47 / 88 \\ MIL & 61 & 71.33 & 81 & 71.98 & 38 / 87 \\ MIN & 56 & 65.83 & 76 & 73.06 & 32 / 87 \\ NYM & 75 & 84.9 & 95 & 82.97 & 47 / 88 \\ NYY & 69 & 78.88 & 89 & 76.38 & 44 / 88 \\ OAK & 61 & 70.93 & 81 & 73.08 & 38 / 89 \\ PHI & 64 & 74 & 84 & 72 & 42 / 90 \\ PIT & 73 & 82.71 & 93 & 81.45 & 46 / 89 \\ SDP & 62 & 72.04 & 82 & 75.53 & 38 / 89 \\ SEA & 74 & 83.44 & 93 & 85.31 & 45 / 89 \\ SFG & 87 & 96.8 & 106 & 89.55 & 57 / 90 \\ STL & 77 & 87.13 & 97 & 90.03 & 46 / 88 \\ TBR & 58 & 67.3 & 77 & 72.91 & 34 / 88 \\ TEX & 81 & 91.22 & 101 & 83.75 & 54 / 90 \\ TOR & 80 & 89.42 & 99 & 87.66 & 51 / 91 \\ WSN & 86 & 95.42 & 105 & 93.21 & 54 / 90 \\ \hline\end{array} It's still fairly difficult to predict final win totals even a little over halfway through the season - intervals have a width of approximately 20 games. A few stand-out points - the teams that are predicted to definitely finish below 0.500 are the Atlanta Braves, the Cincinnati Reds, the Minnesota Twins, and the Tampa Bay Rays, with the Reds being the worst of those teams (they are an estimated as a "true" 63.51 win team). On the other side, the teams predicted to definitely finish above 0.500 are the Chicago Cubs, the Cleveland Indians, the San Francisco Giants, and the Washington Nationals, with the Cubs being the best of these teams (they are estimated as a "true" 96.11 win team). The Texas Rangers and San Francisco Giants in particular have been an exceptionally lucky team - they are predicted to win approximately 7 more games than their "true" win total. Likewise, the Atlanta Braves and Minnesota Twins have been unlucky, both predicted to win approximately 7 fewer games than their "true" win total. To explain the difference between "Mean" and "True Win Total" - imagine flipping a fair coin 10 times. The number of heads you expect is 5 - this is what I have called "True Win Total," representing my best guess at the true ability of the team over 162 games. However, if you pause halfway through and note that in the first 5 flips there were 4 heads, the predicted total number of heads becomes$4 + 0.5(5) = 6.5\$ - this is what I have called "Mean", representing the expected number of wins based on true ability over the remaining schedule added to the current number of wins (from the beginning of the season until the all-star break).

These quantiles are based off of a distribution - I've uploaded a picture of each team's distribution to imgur. The bars in red are the win total values covered by the 95% interval. The blue line represents my estimate of the team's "True Win Total" based on its performance - so if the blue line is to the left of the peak, the team is predicted to finish "lucky" - more wins than would be expected based on their talent level - and if the blue line is to the right of the peak, the team is predicted to finish "unlucky" - fewer wins that would be expected based on their talent level.