25 October, 2023

For hitters, cold streaks run colder than hot streaks run hot

This blog post is just the product of a thought exercise: how much information do you get from a certain number of plate appearances? Suppose we observe $n = 25$ plate appearances for a batter. If the batter gets on-base $x = 5$ times, is that the same amount of information as if the batter gets on-base $x = 10$ times? 

The answer is no. As it turns out, the batter obtaining fewer hits is more informative for the batter being "bad" than the batter obtaining more hits is for the batter being "good." How is this possible? Consider the very simple case of forming a standard 95% confidence interval for a binomial proportion. From any statistics textbook, this is just

 

$\hat{p} \pm 1.96 \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

 

where $\hat{p}$ is the proportion of on-base events for $n$ plate appearances.  Consider the second part, which I will refer to as the "margin of error" and which controls the width of the confidence interval. For $n = 25$ plate appearances, $x = 5$ gives $\hat{p} = 5/25 = 0.2$ and gives


$1.96 \sqrt{\dfrac{0.2(1-0.2)}{25}} = 0.1568$

 

For $x = 10$ on-base events, this gives $\hat{p} = 0.4$ and 


$1.96 \sqrt{\dfrac{0.4(1-0.4)}{25}} = 0.19204$

 

The margin of error is about 22% wider for $\hat{p} = 0.4$ than for $\hat{p} = 0.2$! There is more uncertainty with a better result.
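These two margins of error are easy to verify numerically. Here is a short Python sketch (mine, not from the post) of the standard Wald interval half-width:

```python
import math

def binom_moe(x, n, z=1.96):
    """Half-width of the standard (Wald) 95% confidence interval
    for a binomial proportion p-hat = x / n."""
    p_hat = x / n
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

moe_bad = binom_moe(5, 25)    # p-hat = 0.2
moe_good = binom_moe(10, 25)  # p-hat = 0.4

print(round(moe_bad, 5))                 # 0.1568
print(round(moe_good, 5))                # 0.19204
print(round(moe_good / moe_bad - 1, 3))  # 0.225: the interval is ~22% wider
```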

Moving to a Bayesian framework does not fix this issue, except possibly when heavily weighted priors are used, which would not be justifiable in practice. Suppose that the number of on-base events $x$ in $n$ plate appearances once again follows the (overly simplistic) binomial distribution with parameter $p$, and $p$ is assumed to have a $Beta(1,1)$ distribution - the simple uniform case.


$x \sim Bin(n, p)$

$p \sim Beta(1,1)$


For the case of $x = 5$ on-base events in $n = 25$ plate appearances, the posterior distribution, its standard deviation, and the 95% central credible interval are


$p | x = 5, n = 25 \sim Beta(6, 21)$

$SD(p | x = 5, n = 25) = \sqrt{\dfrac{(6)(21)}{(6 + 21)^2 (6 + 21 + 1)}} = 0.0786$

95% Central CI: $(0.0897,0.3935)$


For the case of $x = 10$ on-base events in $n = 25$ plate appearances, the posterior distribution, its standard deviation, and the 95% central credible interval are


$p | x = 10, n = 25 \sim Beta(11, 16)$

$SD(p | x = 10, n = 25) = \sqrt{\dfrac{(11)(16)}{(11 + 16)^2 (11 + 16 + 1)}} = 0.0929$

95% Central CI: $(0.2335,0.5942)$


Once again, the scenario with worse performance ($x = 5$ on-base events) has a smaller standard deviation, implying there is less posterior uncertainty about the outcome. In addition, the width of the 95% central credible interval is smaller for $x = 5$ ($0.3038$) than for $x = 10$ ($0.3608$).
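The posterior summaries above can be reproduced in a few lines. This is a Python sketch (mine, not from the post, which uses R), assuming scipy is available:

```python
from scipy.stats import beta

def posterior_summary(x, n, a=1, b=1):
    """Posterior Beta(a + x, b + n - x) for p under a Beta(a, b) prior,
    given x successes in n binomial trials. Returns the posterior
    standard deviation and the 95% central credible interval."""
    post = beta(a + x, b + n - x)
    return post.std(), post.ppf(0.025), post.ppf(0.975)

sd_bad, lo_bad, hi_bad = posterior_summary(5, 25)     # Beta(6, 21)
sd_good, lo_good, hi_good = posterior_summary(10, 25)  # Beta(11, 16)

print(round(sd_bad, 4), round(sd_good, 4))  # 0.0786 0.0929
print(hi_bad - lo_bad, hi_good - lo_good)   # widths: ~0.304 vs ~0.361
```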

So how much information is in the $n$ trials? One way to define information, in a probabilistic sense, is with the observed information, a statistic that estimates the Fisher information. Fisher information measures the amount of information that a sample carries about an unknown parameter $\theta$. Unfortunately, calculating it requires knowing the parameter in question, so it is usually estimated. The log-likelihood of a set of observations $\tilde{x} = \{x_1, x_2, \dots, x_n\}$ is defined as


$\displaystyle \ell (\theta | \tilde{x}) = \sum_{i = 1}^n \log[ f(x _i | \theta)]$

 

And the observed information is defined as the negative second derivative of the log-likelihood, taken with respect to $\theta$.

 

$I(\tilde{x}) = -\dfrac{d^2}{d \theta^2}  \ell (\theta | \tilde{x})$

 

Note that Bayesians may replace the log-likelihood with the log-posterior distribution of $\theta$.

In general the observed information must be derived separately for each model, but it is known for common models. For the binomial distribution, the observed information is

 

$I(\hat{p}) = \dfrac{n}{\hat{p}(1-\hat{p})}$


where $\hat{p} = x/n$. Hence, for the case of $x = 5$ on-base events in $n = 25$ plate appearances, $I(0.2) = 156.25$. For the case of $x = 10$ on-base events in $n = 25$ plate appearances, $I(0.4) = 104.1667$. There is quite literally more information in the case where the batter performed worse.
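This formula is simple enough to check numerically. A small Python sketch (mine, not from the post) of the binomial observed information:

```python
def binom_observed_info(x, n):
    """Observed information I(p-hat) = n / (p-hat * (1 - p-hat))
    for the binomial likelihood, evaluated at the MLE p-hat = x / n."""
    p_hat = x / n
    return n / (p_hat * (1 - p_hat))

print(round(binom_observed_info(5, 25), 2))   # 156.25
print(round(binom_observed_info(10, 25), 4))  # 104.1667
```

The information is maximized as $\hat{p}$ moves away from 0.5, so the 0.2 performance is literally more informative than the 0.4 performance.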

For pitchers, the opposite is the case. If we model the number of events in a fixed number of trials (such as runs allowed per 9 innings or walks plus hits per inning pitched), the most appropriate simple distribution is the Poisson distribution with parameter $\lambda$. For $n$ trials, the observed information is


$I(\hat{\lambda}) = \dfrac{n}{\hat{\lambda}}$


where $\hat{\lambda} = \bar{x}$, the sample mean of observations. 

Imagine two pitchers: one has allowed $x = 30$ walks plus hits in $n = 25$ innings pitched, while the other has allowed $x = 20$ walks plus hits in $n = 25$ innings pitched. For which pitcher do we have more information about their abilities?

For the first pitcher, their sample WHIP is $\hat{\lambda} = 30/25 = 1.2$ and their observed information is $I(1.2) = 25/1.2 = 20.8333$. For the second pitcher, their sample WHIP is $\hat{\lambda} = 20/25 = 0.8$ and their observed information is $I(0.8) = 25/0.8 = 31.25$. Hence, we have more information about the pitcher who has performed better.
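The same numerical check works for the Poisson case (again a Python sketch of mine, not from the post):

```python
def poisson_observed_info(total_events, n):
    """Observed information I(lambda-hat) = n / lambda-hat for the
    Poisson likelihood, with lambda-hat = total_events / n the MLE."""
    lam_hat = total_events / n
    return n / lam_hat

print(round(poisson_observed_info(30, 25), 4))  # 20.8333 (WHIP 1.2)
print(round(poisson_observed_info(20, 25), 2))  # 31.25   (WHIP 0.8)
```

Because the information is $n/\hat{\lambda}$, a lower event rate (better pitching) always yields more information per inning.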

So the situation is reversed for batters and pitchers. For batters, we tend to have more information when they perform poorly. For pitchers, we tend to have more information when they perform well. This suggests certain managerial strategies in small samples: it is justifiable to pull a poorly performing batter, but also perhaps justifiable to allow a poorly performing pitcher more innings. We simply have more information about the bad hitter than we do about the bad pitcher, thanks to information theory.

06 April, 2023

2022 and 2023 Stabilization Points

Hello everyone! It's been another couple of years, and I'm ready to update stabilization points again. These are my estimated stabilization points for the 2022 and 2023 MLB seasons, once again using the maximum likelihood method on the totals that I used for previous years. This method is explained in my articles  Estimating Theoretical Stabilization Points and WHIP Stabilization by the Gamma-Poisson Model. As usual, all data and code I used for this post can be found on my github. I make no claims about the stability, efficiency, or optimality of my code.

I've included standard error estimates for 2022 and 2023, but these should not be used to perform any kinds of tests or intervals to compare to the values from previous years, as those values are estimates themselves with their own standard errors, and approximately 5/6 of the data is common between the two estimates. The calculations I performed for 2015 can be found here for batting statistics and here for pitching statistics. The calculations for 2016 can be found here. The 2017 calculations can be found here. The 2018 calculations can be found here. The 2019 calculations can be found here. I didn't do calculations in 2020 because of the pandemic in general. The 2021 calculations can be found here.


The cutoff values I picked were the minimum number of events (PA, AB, TBF, BIP, etc. - the denominators in the formulas) in order to be considered for a year. These cutoff values, and the choice of 6 years worth of data (2016-2021 for the 2022 stabilization points and 2017 - 2022 for the 2023 stabilization points) were picked fairly arbitrarily. This is consistent with my previous work, though I do have concerns about including rates from, for example, the covid year and juiced ball era. However, including fewer years means less accurate estimates. A tradeoff must be made, and I tried to go with what was reasonable (based on seeing what others were doing and my own knowledge of baseball) and what seemed to work well in practice.

Offensive Statistics

2022 Statistics


\begin{array}{| l | l | c | c | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&2022\textrm{ }\hat{M}&2022\textrm{ }SE(\hat{M})&2022\textrm{ }\hat{\mu} & \textrm{Cutoff} \\ \hline
\textrm{OBP}&\textrm{(H + BB + HBP)/PA} & 316.48 & 19.72 & 0.331 & 300  \\
\textrm{BABIP}&\textrm{(H - HR)/(AB-SO-HR+SF)} & 459.44 &  50.10 & 0.304 & 300 \\
\textrm{BA}&\textrm{H/AB} & 473.86 & 38.69 & 0.263 & 300\\
\textrm{SO Rate}&\textrm{SO/PA} & 51.89 & 2.20  & 0.210 & 300  \\
\textrm{BB Rate}&\textrm{(BB-IBB)/(PA-IBB)} & 105.36 & 4.96 & 0.083 & 300 \\
\textrm{1B Rate}&\textrm{1B/PA} & 195.22 & 10.55 & 0.147 & 300  \\
\textrm{2B Rate}&\textrm{2B/PA} & 1197.83& 153.68 & 0.047 & 300 \\
\textrm{3B Rate}&\textrm{3B/PA} & 561.01 & 51.43 & 0.004 & 300  \\
\textrm{XBH Rate} & \textrm{(2B + 3B)/PA} & 1002.35 & 114.03 & 0.051 & 300 \\
\textrm{HR Rate} & \textrm{HR/PA} & 155.39 & 8.29 & 0.035 & 300 \\
\textrm{HBP Rate} & \textrm{HBP/PA} & 248.10 & 15.53 & 0.011 & 300  \\ \hline
\end{array}

2023 Statistics


\begin{array}{| l | l | c | c | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&2023\textrm{ }\hat{M}&2023\textrm{ }SE(\hat{M})&2023\textrm{ }\hat{\mu} & \textrm{Cutoff} \\ \hline
\textrm{OBP}&\textrm{(H + BB + HBP)/PA} & 301.11 & 18.46 & 0.329 & 300  \\
\textrm{BABIP}&\textrm{(H - HR)/(AB-SO-HR+SF)} & 426.80 &  45.29 & 0.302 & 300 \\
\textrm{BA}&\textrm{H/AB} & 434.51 & 34.08 & 0.259 & 300\\
\textrm{SO Rate}&\textrm{SO/PA} & 53.16 & 2.25  & 0.213 & 300  \\
\textrm{BB Rate}&\textrm{(BB-IBB)/(PA-IBB)} & 107.36 & 5.06 & 0.083 & 300 \\
\textrm{1B Rate}&\textrm{1B/PA} & 196.97 & 10.65 & 0.145 & 300  \\
\textrm{2B Rate}&\textrm{2B/PA} & 1189.22 & 151.90 & 0.047 & 300 \\
\textrm{3B Rate}&\textrm{3B/PA} & 634.14 & 62.08 & 0.005 & 300  \\
\textrm{XBH Rate} & \textrm{(2B + 3B)/PA} & 1035.82 & 120.93 & 0.051 & 300 \\
\textrm{HR Rate} & \textrm{HR/PA} & 156.77 & 8.34 & 0.035 & 300 \\
\textrm{HBP Rate} & \textrm{HBP/PA} & 256.24 & 16.04 & 0.011 & 300  \\ \hline
\end{array}


In general, a larger stabilization point will be due to a decreased spread of talent levels - as talent levels get closer together, more extreme stats become less and less likely, and will be shrunk harder towards the mean. Consequently, it takes more observations to know that a player's high or low stats (relative to the rest of the league) are real and not just a fluke of randomness. Similarly, smaller stabilization points will point towards an increase in the spread of talent levels.
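The role of $\hat{M}$ can be made concrete: it acts like a number of league-average pseudo-events added to a player's line, so a larger stabilization point shrinks an observed rate harder toward the mean. A quick Python sketch (the post's own code is in R; the player line here is hypothetical):

```python
def shrink(x, n, league_mean, M):
    """Shrink an observed rate x/n toward the league mean: the
    stabilization point M acts as the number of 'pseudo-events'
    added at the league-average rate."""
    return (x + league_mean * M) / (n + M)

# Hypothetical hitter: 114 on-base events in 300 PA (raw OBP 0.380).
# A larger M (tighter talent spread) pulls the estimate from 0.380
# toward the league mean of 0.331.
for M in (50, 300, 1000):
    print(M, round(shrink(114, 300, 0.331, M), 3))
```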

This is a good opportunity to compare the stabilization points I calculated for the 2016 season to those for the 2023 season: since the 2023 estimates use data from 2017-2022, there is no crossover of information between them.


\begin{array}{| l | l | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&2023\textrm{ }\hat{M}&2016\textrm{ }\hat{M} \\ \hline
\textrm{OBP}&\textrm{(H + BB + HBP)/PA} & 301.11 & 301.32 \\
\textrm{BABIP}&\textrm{(H - HR)/(AB-SO-HR+SF)} & 426.80 &  433.04 \\
\textrm{BA}&\textrm{H/AB} & 434.51 & 491.20\\
\textrm{SO Rate}&\textrm{SO/PA} & 53.16 & 49.23  \\
\textrm{BB Rate}&\textrm{(BB-IBB)/(PA-IBB)} & 107.36 & 112.44 \\
\textrm{1B Rate}&\textrm{1B/PA} & 196.97 & 223.86 \\
\textrm{2B Rate}&\textrm{2B/PA} & 1189.22 & 1169.75 \\
\textrm{3B Rate}&\textrm{3B/PA} & 634.14 & 365.06  \\
\textrm{XBH Rate} & \textrm{(2B + 3B)/PA} & 1035.82 & 1075.41 \\
\textrm{HR Rate} & \textrm{HR/PA} & 156.77 & 126.35  \\
\textrm{HBP Rate} & \textrm{HBP/PA} & 256.24 & 300.97\\ \hline
\end{array}

What is most apparent is the stability of most statistics. The stabilization points for OBP, BABIP, SO rate, BB rate, 2B rate, and XBH rate are nearly identical, indicating that the spread of abilities within these distributions is roughly the same now as it was in 2016. Stabilization points for BA, 1B rate, HR rate, and HBP rate are fairly close, indicating not much change. The big outlier is 3B rate, the rate of triples. Though the estimated probability of a triple per PA is approximately 0.005 in both seasons, the stabilization point has nearly doubled from 2016 to 2023. This indicates that the spread in the ability to hit triples has decreased - though the league average rate of triples has remained the same, there are fewer batters whose "true" triples-hitting ability is much higher or lower than the league average.


Pitching Statistics 

2022 Statistics


\begin{array}{| l | l | c | c | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&2022\textrm{ }\hat{M}&2022\textrm{ }SE(\hat{M})&2022\textrm{ }\hat{\mu} & \textrm{Cutoff} \\ \hline
\textrm{BABIP}&\textrm{(H-HR)/(GB + FB + LD)}& 929.31 & 165.62 & 0.284 &300 \\
\textrm{GB Rate}&\textrm{GB/(GB + FB + LD)}& 66.56 & 4.38 & 0.439 &300\\
\textrm{FB Rate}&\textrm{FB/(GB + FB + LD)}& 61.79 & 4.03 & 0.351 &300 \\
\textrm{LD Rate}&\textrm{LD/(GB + FB + LD)}& 1692.45 & 467.98 & 0.210 &300  \\
\textrm{HR/FB Rate}&\textrm{HR/FB}& 715.15 & 226.83 & 0.135 & 100  \\
\textrm{SO Rate}&\textrm{SO/TBF}& 80.13 & 4.00 & 0.220 &400 \\
\textrm{HR Rate}&\textrm{HR/TBF}& 1102.12 & 175.15 & 0.032 &400 \\
\textrm{BB Rate}&\textrm{(BB-IBB)/(TBF-IBB)}& 256.28 & 20.37 & 0.074 & 400 \\
\textrm{HBP Rate}&\textrm{HBP/TBF}& 931.55 & 131.51 & 0.009 &400 \\
\textrm{Hit rate}&\textrm{H/TBF}& 414.03 & 33.46 & 0.230 &400  \\
\textrm{OBP}&\textrm{(H + BB + HBP)/TBF}& 395.83 & 35.75 & 0.312 &400\\
\textrm{WHIP}&\textrm{(H + BB)/IP*}& 58.49 & 4.40 & 1.28 &80 \\
\textrm{ER Rate}&\textrm{ER/IP*}& 54.50  & 4.07 & 0.465 &80 \\
\textrm{Extra BF}&\textrm{(TBF - 3IP*)/IP*}& 61.96 & 4.75 & 1.23 &80\\ \hline
\end{array}

* When dividing by IP, I corrected the 0.1 and 0.2 representations to 0.33 and 0.67, respectively. 

2023 Statistics


\begin{array}{| l | l | c | c | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&2023\textrm{ }\hat{M}&2023\textrm{ }SE(\hat{M})&2023\textrm{ }\hat{\mu} & \textrm{Cutoff} \\ \hline
\textrm{BABIP}&\textrm{(H-HR)/(GB + FB + LD)}& 809.54 & 134.29 & 0.282 &300 \\
\textrm{GB Rate}&\textrm{GB/(GB + FB + LD)}& 66.24 & 4.40 & 0.434 &300\\
\textrm{FB Rate}&\textrm{FB/(GB + FB + LD)}& 59.40 & 3.90 & 0.357 &300 \\
\textrm{LD Rate}&\textrm{LD/(GB + FB + LD)}& 1596.98 & 429.07 & 0.209 &300  \\
\textrm{HR/FB Rate}&\textrm{HR/FB}& 386.34 & 77.01 & 0.133 & 100  \\
\textrm{SO Rate}&\textrm{SO/TBF}& 77.46 & 4.85 & 0.223 &400 \\
\textrm{HR Rate}&\textrm{HR/TBF}& 942.58 & 134.48 & 0.032 &400 \\
\textrm{BB Rate}&\textrm{(BB-IBB)/(TBF-IBB)}& 258.78 & 20.84 & 0.073 & 400 \\
\textrm{HBP Rate}&\textrm{HBP/TBF}& 766.40 & 98.75 & 0.009 &400 \\
\textrm{Hit rate}&\textrm{H/TBF}& 391.55 & 30.96 & 0.227 &400  \\
\textrm{OBP}&\textrm{(H + BB + HBP)/TBF}& 358.50 & 31.64 & 0.309 &400\\
\textrm{WHIP}&\textrm{(H + BB)/IP*}& 54.96 & 4.07& 1.27 &80 \\
\textrm{ER Rate}&\textrm{ER/IP*}& 50.33 & 3.68 & 0.459 &80 \\
\textrm{Extra BF}&\textrm{(TBF - 3IP*)/IP*}& 58.00 & 4.37 & 1.22 &80\\ \hline
\end{array}

* When dividing by IP, I corrected the 0.1 and 0.2 representations to 0.33 and 0.67, respectively. 

Once again, this is a good opportunity to compare the stabilization points for 2016 to those for 2023.

\begin{array}{| l | l | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&2023\textrm{ }\hat{M}&2016\textrm{ }\hat{M} \\ \hline
\textrm{BABIP}&\textrm{(H-HR)/(GB + FB + LD)}& 809.54 & 1408.72  \\
\textrm{GB Rate}&\textrm{GB/(GB + FB + LD)}& 66.24 & 65.52 \\
\textrm{FB Rate}&\textrm{FB/(GB + FB + LD)}& 59.40 & 61.96\\
\textrm{LD Rate}&\textrm{LD/(GB + FB + LD)}& 1596.98 & 768.42  \\
\textrm{HR/FB Rate}&\textrm{HR/FB}& 386.34 & 505.11 \\
\textrm{SO Rate}&\textrm{SO/TBF}& 77.46 & 90.94  \\
\textrm{HR Rate}&\textrm{HR/TBF}& 942.58 & 931.59  \\
\textrm{BB Rate}&\textrm{(BB-IBB)/(TBF-IBB)}& 258.78 & 221.25 \\
\textrm{HBP Rate}&\textrm{HBP/TBF}& 766.40 & 989.30\\
\textrm{Hit rate}&\textrm{H/TBF}& 391.55 & 623.35\\
\textrm{OBP}&\textrm{(H + BB + HBP)/TBF}& 358.50 & 524.73\\
\textrm{WHIP}&\textrm{(H + BB)/IP*}& 54.96 & 77.2\\
\textrm{ER Rate}&\textrm{ER/IP*}& 50.33 & 59.55\\
\textrm{Extra BF}&\textrm{(TBF - 3IP*)/IP*}& 58.00 & 75.79\\ \hline
\end{array}

Comparing 2023 to 2016, the outliers are obvious: the stabilization point for pitcher BABIP has nearly halved since then, while the stabilization point for line drive rate has nearly doubled (hit rate shows a drop similar to BABIP's). Given that the estimated mean pitcher BABIP and line drive rate are similar for the two years (0.282/0.209 for 2023 and 0.289/0.203 for 2016), this indicates a change in the spread of abilities. Simply put, there is a much higher spread of pitcher BABIP "true" abilities, and with it, a much lower spread of line drive rates: pitchers now differ more in the batting average they allow on balls in play, even as their line drive rates cluster more tightly around the league average.

 

Usage

 

Aside from the obvious use of knowing approximately when results are half due to luck and half due to skill, these stabilization points (along with league means) can be used to provide very basic confidence intervals and prediction intervals for estimates that have been shrunk towards the population mean, as demonstrated in my article From Stabilization to Interval Estimation.

For example, suppose that in the first half, a player has an on-base percentage of 0.380 in 300 plate appearances, corresponding to 114 on-base events. A 95% confidence interval using my empirical Bayesian techniques (based on a normal-normal model) is

$\dfrac{114 + 0.329*301.11}{300 + 301.11} \pm 1.96 \sqrt{\dfrac{0.329(1-0.329)}{301.11 + 300}} = (0.317,0.392)$

That is, we believe the player's true on-base percentage to be between 0.317 and 0.392 with 95% confidence. I used a normal distribution for talent levels with a normal approximation to the binomial for the distribution of observed OBP, but that is not the only possible choice - it just resulted in the simplest formulas for the intervals.

Suppose that the player will get an additional $\tilde{n} = 250$ PA in the second half of the season. A 95% prediction interval for his OBP over those PA is given by

$\dfrac{114 + 0.329*301.11}{300 + 301.11} \pm 1.96 \sqrt{\dfrac{0.329(1-0.329)}{301.11+ 300} + \dfrac{0.329(1-0.329)}{250}} = (0.285,0.424)$ 

That is, 95% of the time the player's OBP over the 250 PA in the second half of the season should be between 0.285 and 0.424. These intervals are overly optimistic and "dumb" in that they take only the league mean and variance and the player's own statistics into account, representing an advantage only over 95% "unshrunk" intervals, but when I tested them in my article "From Stabilization to Interval Estimation," they worked well for prediction.
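Both intervals follow the same pattern - shrink toward the league mean, then add variance terms - so they can be sketched in a few lines of Python (the post's own code is in R; this only illustrates the formulas above):

```python
import math

def shrunk_obp_interval(x, n, mu, M, n_future=None, z=1.96):
    """Normal-normal empirical Bayes interval for a shrunk rate:
    center (x + mu*M)/(n + M), variance mu*(1-mu)/(n + M), plus an
    extra mu*(1-mu)/n_future term for a prediction interval."""
    center = (x + mu * M) / (n + M)
    var = mu * (1 - mu) / (n + M)
    if n_future is not None:
        var += mu * (1 - mu) / n_future
    half = z * math.sqrt(var)
    return center - half, center + half

ci = shrunk_obp_interval(114, 300, 0.329, 301.11)
pi = shrunk_obp_interval(114, 300, 0.329, 301.11, n_future=250)
print(tuple(round(v, 3) for v in ci))  # (0.317, 0.392)
print(tuple(round(v, 3) for v in pi))  # (0.285, 0.424)
```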

As usual, all my data and code can be found on my github. I wrote a general function in $R$ to calculate the stabilization point for any basic counting stat, or unweighted sums of counting stats like OBP (I am still working on weighted sums so I can apply this to things like wOBA). The function returns the estimated league mean of the statistic and estimated stabilization point, a standard error for the stabilization point, and what model was used (I only have two programmed in - 1 for the beta-binomial and 2 for the gamma-Poisson). It also gives a plot of the estimated stabilization at different numbers of events, with 95% confidence bounds.

> stabilize(h$\$$H + h$\$$BB + h$\$$HBP, h$\$$PA, cutoff = 300, 1)  
$\$$Parameters
[1]   0.3287902 301.1076958

$\$$Standard.Error
[1] 18.45775

$\$$Model
[1] "Beta-Binomial"






The confidence bounds are created from the estimates $\hat{M}$ and $SE(\hat{M})$ above and the formula

$\left(\dfrac{n}{n+\hat{M}}\right) \pm 1.96 \left[\dfrac{n}{(n+\hat{M})^2}\right] SE(\hat{M})$

which is obtained from applying the delta method to the function $p(\hat{M}) = n/(n + \hat{M})$. Note that the mean and prediction intervals I gave do not take $SE(\hat{M})$ into account (ignoring the uncertainty surrounding the correct shrinkage amount, which is indicated by the confidence bounds above), but this is not a huge problem - if you don't believe me, plug slightly different values of $M$ into the formulas yourself and see that the resulting intervals do not change much.
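The delta-method bounds are likewise easy to reproduce. A short Python sketch (mine, not from the post), using the 2023 OBP estimates $\hat{M} = 301.11$ and $SE(\hat{M}) = 18.46$ with $n = 300$:

```python
def shrinkage_bounds(n, M_hat, se_M, z=1.96):
    """Delta-method 95% bounds on the shrinkage coefficient
    p(M) = n / (n + M), using |dp/dM| = n / (n + M)^2."""
    p = n / (n + M_hat)
    half = z * (n / (n + M_hat) ** 2) * se_M
    return p - half, p + half

lo, hi = shrinkage_bounds(300, 301.11, 18.46)
print(round(lo, 3), round(hi, 3))  # 0.469 0.529
```

Even at the edges of these bounds, the shrinkage coefficient moves by only about 0.03, which is why ignoring $SE(\hat{M})$ changes the intervals very little.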

As always, feel free to post any comments or suggestions.



02 April, 2021

2021 Stabilization Points

These are my estimated stabilization points for the 2021 MLB season, once again using the maximum likelihood method on the totals that I used for previous years. This method is explained in my articles  Estimating Theoretical Stabilization Points and WHIP Stabilization by the Gamma-Poisson Model.

However, good news! In the past two years, I've had some research on reliability for non-normal data corrected, expanded upon, and published in academic journals. I can now say definitively that my maximum likelihood estimator targets the same reliability as Cronbach's alpha or KR-20, and performs as well as or better than Cronbach's alpha, assuming the model is correct - and while no model is truly correct, I believe this one is very accurate. The article can be found here (for the preprint, click here). I also published a paper with some KR-20 and KR-21 reliability estimators specifically for exponential family distributions such as the binomial, Poisson, etc. The article can be found here (for the preprint, click here). These estimators are a little more efficient for small sample sizes, but for large sample sizes such as in this case the estimators should be nearly identical.

As usual, all data and code I used for this post can be found on my github. I make no claims about the stability, efficiency, or optimality of my code.

I've included standard error estimates for 2021, but these should not be used to perform any kinds of tests or intervals to compare to the values from previous years, as those values are estimates themselves with their own standard errors, and approximately 5/6 of the data is common between the two estimates. The calculations I performed for 2015 can be found here for batting statistics and here for pitching statistics. The calculations for 2016 can be found here. The 2017 calculations can be found here. The 2018 calculations can be found here. The 2019 calculations can be found here. I didn't do calculations in 2020 because of the pandemic in general.


The cutoff values I picked were the minimum number of events (PA, AB, TBF, BIP, etc. - the denominators in the formulas) in order to be considered for a year. These cutoff values, and the choice of 6 years worth of data (2015-2020), were picked fairly arbitrarily - I tried to go with what was reasonable (based on seeing what others were doing and my own knowledge of baseball) and what seemed to work well in practice.

Offensive Statistics


\begin{array}{| l | l | c | c | c | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}&2019\textrm{ }\hat{M} \\ \hline
\textrm{OBP}&\textrm{(H + BB + HBP)/PA} & 302.57 & 18.39 & 0.331 & 300 & 295.20 \\
\textrm{BABIP}&\textrm{(H - HR)/(AB-SO-HR+SF)} & 451.24 &  47.22 & 0.306 & 300 & 431.49 \\
\textrm{BA}&\textrm{H/AB} & 511.71 & 42.78 & 0.265 & 300 & 488.49 \\
\textrm{SO Rate}&\textrm{SO/PA} & 50.37 & 2.12  & 0.205 & 300 & 49.05 \\
\textrm{BB Rate}&\textrm{(BB-IBB)/(PA-IBB)} & 100.47 & 4.67 & 0.080 & 300 & 104.08 \\
\textrm{1B Rate}&\textrm{1B/PA} & 191.17 & 10.20 & 0.150 & 300 & 197.43 \\
\textrm{2B Rate}&\textrm{2B/PA} & 1242.67 & 162.27 & 0.047 & 300 & 1200.46 \\
\textrm{3B Rate}&\textrm{3B/PA} & 481.11 & 28.74 & 0.005 & 300 & 421.91 \\
\textrm{XBH Rate} & \textrm{(2B + 3B)/PA} & 1059.31 & 124.09 & 0.052 & 300 & 1070.09 \\
\textrm{HR Rate} & \textrm{HR/PA} & 146.00 & 7.68 & 0.034 & 300 & 141.80\\
\textrm{HBP Rate} & \textrm{HBP/PA} & 261.13 & 16.56 & 0.010 & 300 & 266.92 \\ \hline
\end{array}

In general, a larger stabilization point will be due to a decreased spread of talent levels - as talent levels get closer together, more extreme stats become less and less likely, and will be shrunk harder towards the mean. Consequently, it takes more observations to know that a player's high or low stats (relative to the rest of the league) are real and not just a fluke of randomness. Similarly, smaller stabilization points will point towards an increase in the spread of talent levels.

The stabilization point of the 3B rate increased dramatically by approximately two standard deviations, indicating that the talent level of hitting triples has clustered more closely around its mean. In general, however, most stabilization points are roughly the same as the previous year, taking into account that year-to-year and sample-to-sample variation in estimates is expected even if the true stabilization points are not changing.

Pitching Statistics 


\begin{array}{| l | l | c | c | c | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}& 2019 \textrm{ }\hat{M} \\ \hline
\textrm{BABIP}&\textrm{(H-HR)/(GB + FB + LD)}& 1061.43 & 197.34 & 0.286 &300& 1184.38 \\
\textrm{GB Rate}&\textrm{GB/(GB + FB + LD)}& 66.20 & 4.25 & 0.443 &300& 64.51\\
\textrm{FB Rate}&\textrm{FB/(GB + FB + LD)}& 62.33 & 3.97 & 0.346 &300& 60.68 \\
\textrm{LD Rate}&\textrm{LD/(GB + FB + LD)}& 1773.66 & 486.12 & 0.211 &300& 2197.02  \\
\textrm{HR/FB Rate}&\textrm{HR/FB}& 529.40 & 129.10 & 0.130 & 100 & 351.53 \\
\textrm{SO Rate}&\textrm{SO/TBF}& 80.78 & 4.97 & 0.214 &400& 90.86 \\
\textrm{HR Rate}&\textrm{HR/TBF}& 959.57 & 133.07 & 0.031 &400& 764.48\\
\textrm{BB Rate}&\textrm{(BB-IBB)/(TBF-IBB)}& 251.22 & 19.47 & 0.072 & 400 & 230.09 \\
\textrm{HBP Rate}&\textrm{HBP/TBF}& 1035.90 & 153.68 & 0.009 &400& 906.25 \\
\textrm{Hit rate}&\textrm{H/TBF}& 453.30 & 37.52 & 0.232 &400& 496.56   \\
\textrm{OBP}&\textrm{(H + BB + HBP)/TBF}& 407.36 & 36.33 & 0.313 &400& 443.60 \\
\textrm{WHIP}&\textrm{(H + BB)/IP*}& 63.38 & 4.79 & 1.29 &80& 67.84 \\
\textrm{ER Rate}&\textrm{ER/IP*}& 57.73 & 4.30 & 0.460 &80& 57.97 \\
\textrm{Extra BF}&\textrm{(TBF - 3IP*)/IP*}& 64.70 & 4.92 & 1.23 &80& 67.23 \\ \hline
\end{array}

* When dividing by IP, I corrected the 0.1 and 0.2 representations to 0.33 and 0.67, respectively. 

Most are the same, but the HR/FB stabilization point has shifted up dramatically given its standard error, indicating a likely change in the talent distribution and not just sample-to-sample and year-to-year variation. This indicates that the distribution of HR/FB talent levels is clustering around its mean, possibly reflecting a change in approach by pitchers or batters over the past two years. The mean has also shifted up over the previous calculation. Similarly, the HR rate stabilization point and mean have increased. Conversely, the strikeout rate stabilization point has decreased, indicating less clustering of talent levels around the mean, and the mean has also increased.

Usage

 

Aside from the obvious use of knowing approximately when results are half due to luck and half due to skill, these stabilization points (along with league means) can be used to provide very basic confidence intervals and prediction intervals for estimates that have been shrunk towards the population mean, as demonstrated in my article From Stabilization to Interval Estimation.

For example, suppose that in the first half, a player has an on-base percentage of 0.380 in 300 plate appearances, corresponding to 114 on-base events. A 95% confidence interval using my empirical Bayesian techniques (based on a normal-normal model) is

$\dfrac{114 + 0.331*302.57}{300 + 302.57} \pm 1.96 \sqrt{\dfrac{0.331(1-0.331)}{302.57 + 300}} = (0.318,0.393)$

That is, we believe the player's true on-base percentage to be between 0.318 and 0.393 with 95% confidence. I used a normal distribution for talent levels with a normal approximation to the binomial for the distribution of observed OBP, but that is not the only possible choice - it just resulted in the simplest formulas for the intervals.

Suppose that the player will get an additional $\tilde{n} = 250$ PA in the second half of the season. A 95% prediction interval for his OBP over those PA is given by

$\dfrac{114 + 0.331*302.57}{300 + 302.57} \pm 1.96 \sqrt{\dfrac{0.331(1-0.331)}{302.57+ 300} + \dfrac{0.331(1-0.331)}{250}} = (0.286,0.425)$ 

That is, 95% of the time the player's OBP over the 250 PA in the second half of the season should be between 0.286 and 0.425. These intervals are overly optimistic and "dumb" in that they take only the league mean and variance and the player's own statistics into account, representing an advantage only over 95% "unshrunk" intervals, but when I tested them in my article "From Stabilization to Interval Estimation," they worked well for prediction.

As usual, all my data and code can be found on my github. I wrote a general function in $R$ to calculate the stabilization point for any basic counting stat, or unweighted sums of counting stats like OBP (I am still working on weighted sums so I can apply this to things like wOBA). The function returns the estimated league mean of the statistic and estimated stabilization point, a standard error for the stabilization point, and what model was used (I only have two programmed in - 1 for the beta-binomial and 2 for the gamma-Poisson). It also gives a plot of the estimated stabilization at different numbers of events, with 95% confidence bounds.

> stabilize(h$\$$H + h$\$$BB + h$\$$HBP, h$\$$PA, cutoff = 300, 1)  
$\$$Parameters
[1]   0.3306363 302.5670532

$\$$Standard.Error
[1] 18.38593

$\$$Model
[1] "Beta-Binomial"





The confidence bounds are created from the estimates $\hat{M}$ and $SE(\hat{M})$ above and the formula

$\left(\dfrac{n}{n+\hat{M}}\right) \pm 1.96 \left[\dfrac{n}{(n+\hat{M})^2}\right] SE(\hat{M})$

which is obtained from applying the delta method to the function $p(\hat{M}) = n/(n + \hat{M})$. Note that the mean and prediction intervals I gave do not take $SE(\hat{M})$ into account (ignoring the uncertainty surrounding the correct shrinkage amount, which is indicated by the confidence bounds above), but this is not a huge problem - if you don't believe me, plug slightly different values of $M$ into the formulas yourself and see that the resulting intervals do not change much.

As always, feel free to post any comments or suggestions.


21 April, 2019

2019 Stabilization Points

These are my estimated stabilization points for the 2019 MLB season, once again using the maximum likelihood method on the totals that I used for previous years. This method is explained in my articles Estimating Theoretical Stabilization Points and WHIP Stabilization by the Gamma-Poisson Model.

(As usual, all data and code I used can be found on my github. I make no claims about the stability, efficiency, or optimality of my code.) 

I've included standard error estimates for 2019, but these should not be used to perform any kinds of tests or intervals to compare to the values from previous years, as those values are estimates themselves with their own standard errors, and approximately 5/6 of the data is common between the two estimates. The calculations I performed for 2015 can be found here for batting statistics and here for pitching statistics. The calculations for 2016 can be found here. The 2017 calculations can be found here. The 2018 calculations can be found here.

The cutoff values I picked were the minimum number of events (PA, AB, TBF, BIP, etc. - the denominators in the formulas) in order to be considered for a year. These cutoff values, and the choice of 6 years worth of data (2013-2018), were picked fairly arbitrarily - I tried to go with what was reasonable (based on seeing what others were doing and my own knowledge of baseball) and what seemed to work well in practice.

Offensive Statistics


\begin{array}{| l | l | c | c | c | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}&2018\textrm{ }\hat{M} \\ \hline
\textrm{OBP}&\textrm{(H + BB + HBP)/PA} & 295.20 & 16.26 & 0.329 & 300 & 302.27 \\
\textrm{BABIP}&\textrm{(H - HR)/(AB-SO-HR+SF)} & 431.49 &  39.76 & 0.306 & 300 & 429.47 \\
\textrm{BA}&\textrm{H/AB} & 488.49 & 36.52 & 0.264 & 300 & 463.19 \\
\textrm{SO Rate}&\textrm{SO/PA} & 49.05 & 1.88  & 0.198 & 300 & 48.74 \\
\textrm{BB Rate}&\textrm{(BB-IBB)/(PA-IBB)} & 104.08 & 4.45 & 0.078 & 300 & 108.84 \\
\textrm{1B Rate}&\textrm{1B/PA} & 197.43 & 9.72 & 0.154 & 300 & 200.94 \\
\textrm{2B Rate}&\textrm{2B/PA} & 1200.46 & 140.37 & 0.047 & 300 & 1164.82 \\
\textrm{3B Rate}&\textrm{3B/PA} & 421.91 & 31.67 & 0.005 & 300 & 390.75 \\
\textrm{XBH Rate} & \textrm{(2B + 3B)/PA} & 1070.09 & 115.96 & 0.052 & 300 & 1064.01 \\
\textrm{HR Rate} & \textrm{HR/PA} & 141.80 & 6.78 & 0.030 & 300 & 132.52 \\
\textrm{HBP Rate} & \textrm{HBP/PA} & 266.92 & 15.74 & 0.009 & 300 & 280.00 \\ \hline
\end{array}

In general, a larger stabilization point will be due to a decreased spread of talent levels - as talent levels get closer together, more extreme stats become less and less likely, and will be shrunk harder towards the mean. Consequently, it takes more observations to know that a player's high or low stats (relative to the rest of the league) are real and not just a fluke of randomness. Similarly, smaller stabilization points will point towards an increase in the spread of talent levels.
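For intuition: under the beta-binomial model from the linked articles, writing the talent distribution as $Beta(\mu M, (1-\mu)M)$ (my assumption for this sketch - mean $\mu$, concentration $M$), the spread of talent levels is

$Var(p) = \dfrac{\mu M \cdot (1-\mu)M}{M^2(M+1)} = \dfrac{\mu(1-\mu)}{M+1}$

which shrinks as $M$ grows. The shrunk estimate used in the Usage section is a weighted average,

$\dfrac{x + \mu M}{n + M} = \left(\dfrac{n}{n+M}\right)\dfrac{x}{n} + \left(\dfrac{M}{n+M}\right)\mu$

so at $n = M$ events the observed rate and the league mean receive equal weight - the sense in which results are half skill and half luck.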

Noticeably, the stabilization point for the HR rate has increased over the past four years, indicating less variance in talent level of hitting home runs. Meanwhile, the stabilization point for HBP rate has decreased over the past four years, suggesting increased variance in """talent""" level of getting hit by pitches.

Pitching Statistics 


\begin{array}{| l | l | c | c | c | c  | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}&2018 \textrm{ }\hat{M} \\ \hline
\textrm{BABIP}&\textrm{(H-HR)/(GB + FB + LD)}& 1184.38 & 206.63& 0.288 &300&1322.70 \\
\textrm{GB Rate}&\textrm{GB/(GB + FB + LD)}& 64.51 & 3.66 & 0.446 &300&63.12 \\
\textrm{FB Rate}&\textrm{FB/(GB + FB + LD)}&60.68 &3.41 & 0.344 &300&59.80 \\
\textrm{LD Rate}&\textrm{LD/(GB + FB + LD)}& 2197.02  & 622.02 & 0.210 &300&2157.15 \\
\textrm{HR/FB Rate}&\textrm{HR/FB}& 351.53 & 56.05 & 0.117 & 100 & 388.61 \\
\textrm{SO Rate}&\textrm{SO/TBF}& 90.86 &5.07& 0.204&400&93.52 \\
\textrm{HR Rate}&\textrm{HR/TBF}&764.48& 82.78 & 0.028 &400&790.97 \\
\textrm{BB Rate}&\textrm{(BB-IBB)/(TBF-IBB)}&  230.09 & 15.46 & 0.071 &400&238.70 \\
\textrm{HBP Rate}&\textrm{HBP/TBF}& 906.25 & 109.63 & 0.009 &400&935.61 \\
\textrm{Hit rate}&\textrm{H/TBF}&496.56 & 39.48 & 0.233 &400&536.32  \\
\textrm{OBP}&\textrm{(H + BB + HBP)/TBF}& 443.60 & 36.42 & 0.312 &400& 472.09 \\
\textrm{WHIP}&\textrm{(H + BB)/IP*}&67.84 & 4.69 & 1.28 &80& 71.10 \\
\textrm{ER Rate}&\textrm{ER/IP*}& 57.97 & 3.87 & 0.444 &80& 58.59 \\
\textrm{Extra BF}&\textrm{(TBF - 3IP*)/IP*}& 67.23 & 4.64 & 1.22 &80& 69.11 \\ \hline
\end{array}

* When dividing by IP, I corrected the 0.1 and 0.2 representations to 0.33 and 0.67, respectively. 
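As a concrete illustration of that correction (a hypothetical Python helper, not the author's code), converting the x.1/x.2 innings-pitched notation into thirds of an inning might look like:

```python
def correct_ip(ip):
    """Convert baseball IP notation (x.1 = one out, x.2 = two outs past
    x whole innings) into fractional innings (x.33, x.67)."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # 0, 1, or 2 outs past the whole innings
    return whole + outs / 3

print(correct_ip(6.1))  # one out past six innings, roughly 6.33
print(correct_ip(6.2))  # two outs past six innings, roughly 6.67
```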

Most statistics this year shifted not just in stabilization point, but also in mean, possibly indicating a shift in the pitching environment. The stabilization points that did shift tended to shift down, indicating an increased spread of talent levels around the league mean.

Usage

 


Aside from the obvious use of knowing approximately when results are half due to luck and half due to skill, these stabilization points (along with league means) can be used to provide very basic confidence intervals and prediction intervals for estimates that have been shrunk towards the population mean, as demonstrated in my article From Stabilization to Interval Estimation.

For example, suppose that in the first half, a player has an on-base percentage of 0.380 in 300 plate appearances, corresponding to 114 on-base events. A 95% confidence interval using my empirical Bayesian techniques (based on a normal-normal model) is

$\dfrac{114 + 0.329*295.20}{300 + 295.20} \pm 1.96 \sqrt{\dfrac{0.329(1-0.329)}{295.20 + 300}} = (0.317,0.392)$

That is, we believe the player's true on-base percentage to be between 0.317 and 0.392 with 95% confidence. I used a normal distribution for talent levels with a normal approximation to the binomial for the distribution of observed OBP, but that is not the only possible choice - it just resulted in the simplest formulas for the intervals.
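In code, that interval takes only a few lines. This is my own Python translation of the formula above (not the author's R code), using the 2019 OBP estimates $\mu = 0.329$ and $\hat{M} = 295.20$:

```python
from math import sqrt

mu, M = 0.329, 295.20   # league OBP mean and stabilization point (2019 estimates)
x, n = 114, 300         # on-base events and plate appearances

shrunk = (x + mu * M) / (n + M)               # estimate shrunk toward the league mean
margin = 1.96 * sqrt(mu * (1 - mu) / (M + n)) # normal-normal posterior margin
print(round(shrunk - margin, 3), round(shrunk + margin, 3))  # -> 0.317 0.392
```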

Suppose that the player will get an additional $\tilde{n} = 250$ PA in the second half of the season. A 95% prediction interval for his OBP over those PA is given by

$\dfrac{114 + 0.329*295.20}{300 + 295.20} \pm 1.96 \sqrt{\dfrac{0.329(1-0.329)}{295.20 + 300} + \dfrac{0.329(1-0.329)}{250}} = (0.285,0.424)$ 

That is, 95% of the time the player's OBP over the 250 PA in the second half of the season should be between 0.285 and 0.424. These intervals are overly optimistic and "dumb" in that they take only the league mean and variance and the player's own statistics into account, representing an advantage only over 95% "unshrunk" intervals, but when I tested them in my article "From Stabilization to Interval Estimation," they worked well for prediction.
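The prediction interval adds one extra variance term for the binomial noise in the future plate appearances. A Python sketch of the calculation (my translation, not the author's code), with the same 2019 OBP estimates:

```python
from math import sqrt

mu, M = 0.329, 295.20   # league OBP mean and stabilization point (2019 estimates)
x, n = 114, 300         # first-half on-base events and plate appearances
n_new = 250             # second-half plate appearances to predict over

shrunk = (x + mu * M) / (n + M)
# estimation uncertainty plus the sampling noise of the future PA
margin = 1.96 * sqrt(mu * (1 - mu) / (M + n) + mu * (1 - mu) / n_new)
print(round(shrunk - margin, 3), round(shrunk + margin, 3))  # -> 0.285 0.424
```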

As usual, all my data and code can be found on my github. I wrote a general function in $R$ to calculate the stabilization point for any basic counting stat, or unweighted sums of counting stats like OBP (I am still working on weighted sums so I can apply this to things like wOBA). The function returns the estimated league mean of the statistic and estimated stabilization point, a standard error for the stabilization point, and what model was used (I only have two programmed in - 1 for the beta-binomial and 2 for the gamma-Poisson). It also gives a plot of the estimated stabilization at different numbers of events, with 95% confidence bounds.

> stabilize(h$H + h$BB + h$HBP, h$PA, cutoff = 300, 1)
$Parameters
[1]   0.3285272 295.1970047

$Standard.Error
[1] 16.25874

$Model
[1] "Beta-Binomial"



The confidence bounds are created from the estimates $\hat{M}$ and $SE(\hat{M})$ above and the formula

$\left(\dfrac{n}{n+\hat{M}}\right) \pm 1.96 \left[\dfrac{n}{(n+\hat{M})^2}\right] SE(\hat{M})$

which is obtained from applying the delta method to the function $p(\hat{M}) = n/(n + \hat{M})$. Note that the mean and prediction intervals I gave do not take $SE(\hat{M})$ into account (ignoring the uncertainty surrounding the correct shrinkage amount, which is indicated by the confidence bounds above), but this is not a huge problem - if you don't believe me, plug slightly different values of $M$ into the formulas yourself and see that the resulting intervals do not change much.
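Evaluating those bounds is simple arithmetic. A Python sketch (mine, not the author's code) at $n = 300$ events, using the 2019 OBP estimates $\hat{M} = 295.20$ and $SE(\hat{M}) = 16.26$:

```python
n, M, se = 300, 295.20, 16.26   # events, estimated stabilization point, its SE

center = n / (n + M)                     # estimated fraction of skill vs. luck
margin = 1.96 * (n / (n + M) ** 2) * se  # delta-method margin on that fraction
print(round(center, 3), round(center - margin, 3), round(center + margin, 3))
# -> 0.504 0.477 0.531
```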

As always, feel free to post any comments or suggestions.

05 September, 2018

2018 Stabilization Points

So this post is waaaaay late in the 2018 season. I've been busy! But I'm doing this again since it's pretty easy to do - and I'm copying and pasting the text from the last two years' posts, because I can.

These are my estimated stabilization points for the 2018 MLB season, once again using the maximum likelihood method on the totals that I used for previous years. This method is explained in my articles Estimating Theoretical Stabilization Points and WHIP Stabilization by the Gamma-Poisson Model.

(As usual, all data and code I used can be found on my github. I make no claims about the stability, efficiency, or optimality of my code.) 

I've included standard error estimates for 2018, but these should not be used to perform any kinds of tests or intervals to compare to the values from previous years, as those values are estimates themselves with their own standard errors, and approximately 5/6 of the data is common between the two estimates. The calculations I performed for 2015 can be found here for batting statistics and here for pitching statistics. The calculations for 2016 can be found here. The 2017 calculations can be found here.

The cutoff values I picked were the minimum number of events (PA, AB, TBF, BIP, etc. - the denominators in the formulas) in order to be considered for a year. These cutoff values, and the choice of 6 years worth of data (2012-2017), were picked fairly arbitrarily - I tried to go with what was reasonable (based on seeing what others were doing and my own knowledge of baseball) and what seemed to work well in practice.

Offensive Statistics


\begin{array}{| l | l | c | c | c | c  | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}&2017\textrm{ }\hat{M} \\ \hline
\textrm{OBP}&\textrm{(H + BB + HBP)/PA} &  302.27 & 16.88  & 0.329 & 300 & 303.77\\
\textrm{BABIP}&\textrm{(H - HR)/(AB-SO-HR+SF)} & 429.47 &  39.30 & 0.306 & 300 & 442.62  \\
\textrm{BA}&\textrm{H/AB} & 463.19 & 33.94 & 0.266 & 300 & 466.09 \\
\textrm{SO Rate}&\textrm{SO/PA} & 48.74 & 1.88  & 0.194 & 300 & 49.02\\
\textrm{BB Rate}&\textrm{(BB-IBB)/(PA-IBB)} & 108.84 & 4.72 & 0.077 & 300 & 113.64 \\
\textrm{1B Rate}&\textrm{1B/PA} & 200.94 & 9.99 & 0.156 & 300 & 215.29\\
\textrm{2B Rate}&\textrm{2B/PA} & 1164.82 & 134.26 & 0.047 & 300 & 1230.96 \\
\textrm{3B Rate}&\textrm{3B/PA} & 390.75 & 28.72 & 0.005 & 300 & 358.92\\
\textrm{XBH Rate} & \textrm{(2B + 3B)/PA} & 1064.01 & 115.55 & 0.052 & 300 & 1063.76 \\
\textrm{HR Rate} & \textrm{HR/PA} & 132.52 & 6.31 & 0.030 & 300 & 129.02 \\
\textrm{HBP Rate} & \textrm{HBP/PA} & 280.00 & 16.89 & 0.009 & 300 & 299.39 \\ \hline
\end{array}

In general, a larger stabilization point will be due to a decreased spread of talent levels - as talent levels get closer together, more extreme stats become less and less likely, and will be shrunk  harder towards the mean. Consequently, it takes more observations to know that a player's high or low stats (relative to the rest of the league) are real and not just a fluke of randomness. Similarly, smaller stabilization points will point towards an increase in the spread of talent levels.

Pitching Statistics 


\begin{array}{| l | l | c | c | c | c  | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}&2017 \textrm{ }\hat{M} \\ \hline
\textrm{BABIP}&\textrm{(H-HR)/(GB + FB + LD)}& 1322.70 & 244.54 & 0.289 &300&1356.06 \\
\textrm{GB Rate}&\textrm{GB/(GB + FB + LD)}& 63.12 & 3.55 & 0.450 &300& 63.12 \\
\textrm{FB Rate}&\textrm{FB/(GB + FB + LD)}& 59.86 &3.34 & 0.341 &300&59.80 \\
\textrm{LD Rate}&\textrm{LD/(GB + FB + LD)}&  2157.15 & 586.96 & 0.209 &300& 1497.65 \\
\textrm{HR/FB Rate}&\textrm{HR/FB}& 388.61 & 65.28 & 0.115  &100&464.60 \\
\textrm{SO Rate}&\textrm{SO/TBF}& 93.52 &5.25& 0.199&400&94.62\\
\textrm{HR Rate}&\textrm{HR/TBF}&790.97 & 86.34 & 0.029 &400&942.62 \\
\textrm{BB Rate}&\textrm{(BB-IBB)/(TBF-IBB)}&238.70 & 16.10 & 0.070 &400&237.53 \\
\textrm{HBP Rate}&\textrm{HBP/TBF}& 935.61 & 115.06 & 0.008 &400&954.09 \\
\textrm{Hit rate}&\textrm{H/TBF}& 536.32  & 43.99 & 0.235 &400&550.69 \\
\textrm{OBP}&\textrm{(H + BB + HBP)/TBF}&472.09 & 39.51 & 0.313 &400& 496.39 \\
\textrm{WHIP}&\textrm{(H + BB)/IP*}& 71.10 & 4.96 & 1.29 &80& 74.68 \\
\textrm{ER Rate}&\textrm{ER/IP*}& 58.59 & 3.91 & 0.447 &80& 62.82 \\
\textrm{Extra BF}&\textrm{(TBF - 3IP*)/IP*}& 69.11 & 4.79 & 1.22 &80& 73.11\\ \hline
\end{array}

* When dividing by IP, I corrected the 0.1 and 0.2 representations to 0.33 and 0.67, respectively. 

Most statistics are roughly the same; however, the line drive stabilization point has increased quite a bit, after already having roughly doubled from the 2016 estimate to the 2017 estimate. This is not a mistake - it corresponds to a decrease in the variance of line drive rates. Noticeably, the HR rate variance increased, and so the HR rate stabilization point decreased. This indicates a shift in the MLB pitching environment in these particular areas, and points to a weakness in the method - if the underlying league distribution of talent level of a statistic is changing rapidly, this method will fail to account for the change and may be inaccurate.

 

Usage

 


Aside from the obvious use of knowing approximately when results are half due to luck and half due to skill, these stabilization points (along with league means) can be used to provide very basic confidence intervals and prediction intervals for estimates that have been shrunk towards the population mean, as demonstrated in my article From Stabilization to Interval Estimation. I believe the confidence intervals from my method should be similar to the intervals from Sean Dolinar's great fangraphs article A New Way to Look at Sample Size, though I have not personally tested this, and am not familiar with the Cronbach's alpha methodology he uses (or with reliability analysis in general).

For example, suppose that in the first half, a player has an on-base percentage of 0.380 in 300 plate appearances, corresponding to 114 on-base events. A 95% confidence interval using my empirical Bayesian techniques (based on a normal-normal model) is

$\dfrac{114 + 0.329*301.32}{300 + 301.32} \pm 1.96 \sqrt{\dfrac{0.329(1-0.329)}{301.32 + 300}} = (0.317,0.392)$

That is, we believe the player's true on-base percentage to be between 0.317 and 0.392 with 95% confidence. I used a normal distribution for talent levels with a normal approximation to the binomial for the distribution of observed OBP, but that is not the only possible choice - it just resulted in the simplest formulas for the intervals.

Suppose that the player will get an additional $\tilde{n} = 250$ PA in the second half of the season. A 95% prediction interval for his OBP over those PA is given by

$\dfrac{114 + 0.329*301.32}{300 + 301.32} \pm 1.96 \sqrt{\dfrac{0.329(1-0.329)}{301.32 + 300} + \dfrac{0.329(1-0.329)}{250}} = (0.285,0.424)$ 

That is, 95% of the time the player's OBP over the 250 PA in the second half of the season should be between 0.285 and 0.424. These intervals are overly optimistic and "dumb" in that they take only the league mean and variance and the player's own statistics into account, representing an advantage only over 95% unshrunk intervals, but when I tested them in my article "From Stabilization to Interval Estimation", they worked well for prediction.

As usual, all my data and code can be found on my github. I wrote a general function in $R$ to calculate the stabilization point for any basic counting stat, or unweighted sums of counting stats like OBP (I am still working on weighted sums so I can apply this to things like wOBA). The function returns the estimated league mean of the statistic and estimated stabilization point, a standard error for the stabilization point, and what model was used (I only have two programmed in - 1 for the beta-binomial and 2 for the gamma-Poisson). It also gives a plot of the estimated stabilization at different numbers of events, with 95% confidence bounds.

> stabilize(h$H + h$BB + h$HBP, h$PA, cutoff = 300, 1)
$Parameters
[1]   0.329098 301.317682

$Standard.Error
[1] 16.92138

$Model
[1] "Beta-Binomial"



The confidence bounds are created from the estimates $\hat{M}$ and $SE(\hat{M})$ above and the formula

$\left(\dfrac{n}{n+\hat{M}}\right) \pm 1.96 \left[\dfrac{n}{(n+\hat{M})^2}\right] SE(\hat{M})$

which is obtained from applying the delta method to the function $p(\hat{M}) = n/(n + \hat{M})$. Note that the mean and prediction intervals I gave do not take $SE(\hat{M})$ into account (ignoring the uncertainty surrounding the correct shrinkage amount, which is indicated by the confidence bounds above), but this is not a huge problem - if you don't believe me, plug slightly different values of $M$ into the formulas yourself and see that the resulting intervals do not change much.

Maybe somebody else out there might find this useful. As always, feel free to post any comments or suggestions!

24 April, 2017

2017 Stabilization Points

Once again I recalculated stabilization points, this time for 2017 MLB data, using the maximum likelihood method on the totals that I used for 2015 and 2016. This method is explained in my articles Estimating Theoretical Stabilization Points and WHIP Stabilization by the Gamma-Poisson Model.

(As usual, all data and code I used can be found on my github. I make no claims about the stability, efficiency, or optimality of my code.) 

I've included standard error estimates for 2017, but these should not be used to perform any kinds of tests or intervals to compare to the values from previous years, as those values are estimates themselves with their own standard errors, and approximately 5/6 of the data is common between the two estimates. The calculations I performed for 2015 can be found here for batting statistics and here for pitching statistics. The calculations for 2016 can be found here.

The cutoff values I picked were the minimum number of events (PA, AB, TBF, BIP, etc. - the denominators in the formulas) in order to be considered for a year. These cutoff values, and the choice of 6 years worth of data, were picked fairly arbitrarily - I tried to go with what was reasonable (based on seeing what others were doing and my own knowledge of baseball) and what seemed to work well in practice.

Offensive Statistics


\begin{array}{| l | l | c | c | c | c  | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}&2016\textrm{ }\hat{M} \\ \hline
\textrm{OBP}&\textrm{(H + BB + HBP)/PA} & 303.77 & 17.08  & 0.328 & 300 & 301.32 \\
\textrm{BABIP}&\textrm{(H - HR)/(AB-SO-HR+SF)} &  442.62  &  40.55 & 0.306 & 300 & 433.04\\
\textrm{BA}&\textrm{H/AB} & 466.09 & 34.30  & 0.266 & 300 & 491.20\\
\textrm{SO Rate}&\textrm{SO/PA} & 49.02 & 1.90  & 0.188 & 300 & 49.23\\
\textrm{BB Rate}&\textrm{(BB-IBB)/(PA-IBB)} & 113.64 & 5.00 & 0.077 & 300 & 112.44 \\
\textrm{1B Rate}&\textrm{1B/PA} & 215.29 & 10.95 & 0.157 & 300 & 223.86 \\
\textrm{2B Rate}&\textrm{2B/PA} & 1230.96 & 148.48 & 0.047 & 300 & 1169.75 \\
\textrm{3B Rate}&\textrm{3B/PA} & 358.92 & 25.71 & 0.005 & 300 & 365.06 \\
\textrm{XBH Rate} & \textrm{(2B + 3B)/PA} & 1063.76 & 116.54 & 0.052 & 300 & 1075.41 \\
\textrm{HR Rate} & \textrm{HR/PA} & 129.02 & 6.18 & 0.028 & 300 & 126.35\\
\textrm{HBP Rate} & \textrm{HBP/PA} & 299.39 & 18.60 & 0.009 & 300 & 300.97 \\ \hline
\end{array}

In general, a larger stabilization point will be due to a decreased spread of talent levels - as talent levels get closer together, more extreme stats become less and less likely, and will be shrunk  harder towards the mean. Consequently, it takes more observations to know that a player's high or low stats (relative to the rest of the league) are real and not just a fluke of randomness. Similarly, smaller stabilization points will point towards an increase in the spread of talent levels.

Pitching Statistics 


\begin{array}{| l | l | c | c | c | c  | c | c |} \hline
\textrm{Stat}&\textrm{Formula}&\hat{M}&SE(\hat{M})&\hat{\mu} & \textrm{Cutoff}&2016 \textrm{ }\hat{M} \\ \hline
\textrm{BABIP}&\textrm{(H-HR)/(GB + FB + LD)}& 1356.06 & 247.48 & 0.289 &300&1408.72\\
\textrm{GB Rate}&\textrm{GB/(GB + FB + LD)}& 64.00 & 3.56 & 0.450 &300& 63.53 \\
\textrm{FB Rate}&\textrm{FB/(GB + FB + LD)}& 61.73 &3.42& 0.342 &300&59.80 \\
\textrm{LD Rate}&\textrm{LD/(GB + FB + LD)}& 1497.65 & 296.21 & 0.208 &300&731.02 \\
\textrm{HR/FB Rate}&\textrm{HR/FB}& 464.60 & 85.51 & 0.108 &100&488.53 \\
\textrm{SO Rate}&\textrm{SO/TBF}& 94.62&5.29& 0.194&400&93.15 \\
\textrm{HR Rate}&\textrm{HR/TBF}& 942.62 & 110.66 & 0.026 &400&949.02 \\
\textrm{BB Rate}&\textrm{(BB-IBB)/(TBF-IBB)}& 237.53 & 15.84 & 0.069 &400&236.87 \\
\textrm{HBP Rate}&\textrm{HBP/TBF}& 954.09 & 115.60 & 0.008 &400&939.00 \\
\textrm{Hit rate}&\textrm{H/TBF}& 550.69 & 45.63 & 0.235 &400&559.18 \\
\textrm{OBP}&\textrm{(H + BB + HBP)/TBF}& 496.39 & 41.81 & 0.312 &400&526.77 \\
\textrm{WHIP}&\textrm{(H + BB)/IP*}& 74.68 & 5.25 & 1.29 &80&78.97 \\
\textrm{ER Rate}&\textrm{ER/IP*}& 62.82 & 4.24 & 0.440 &80&63.08 \\
\textrm{Extra BF}&\textrm{(TBF - 3IP*)/IP*}& 73.11& 5.11 & 1.22 &80&75.79 \\ \hline
\end{array}

* When dividing by IP, I corrected the 0.1 and 0.2 representations to 0.33 and 0.67, respectively. 

Most statistics are roughly the same; however, the line drive stabilization point has roughly doubled. I checked my calculations for both years and this is not a mistake. It corresponds to a decrease in the variance of line drive rates. It should also be noted that the average line drive rate increased from 0.203 to 0.208 - there are perhaps remnants of an odd 2010 that is no longer included in the data set.

 

Usage

 

Note: This section is largely unchanged from the previous year's version. The formulas given here work for "counting" offensive stats (OBP, BA, etc.).

Aside from the obvious use of knowing approximately when results are half due to luck and half due to skill, these stabilization points (along with league means) can be used to provide very basic confidence intervals and prediction intervals for estimates that have been shrunk towards the population mean, as demonstrated in my article From Stabilization to Interval Estimation. I believe the confidence intervals from my method should be similar to the intervals from Sean Dolinar's great fangraphs article A New Way to Look at Sample Size, though I have not personally tested this, and am not familiar with the Cronbach's alpha methodology he uses (or with reliability analysis in general).

For example, suppose that in the first half, a player has an on-base percentage of 0.380 in 300 plate appearances, corresponding to 114 on-base events. A 95% confidence interval using my empirical Bayesian techniques (based on a normal-normal model) is

$\dfrac{114 + 0.329*301.32}{300 + 301.32} \pm 1.96 \sqrt{\dfrac{0.329(1-0.329)}{301.32 + 300}} = (0.317,0.392)$

That is, we believe the player's true on-base percentage to be between 0.317 and 0.392 with 95% confidence. I used a normal distribution for talent levels with a normal approximation to the binomial for the distribution of observed OBP, but that is not the only possible choice - it just resulted in the simplest formulas for the intervals.

Suppose that the player will get an additional $\tilde{n} = 250$ PA in the second half of the season. A 95% prediction interval for his OBP over those PA is given by

$\dfrac{114 + 0.329*301.32}{300 + 301.32} \pm 1.96 \sqrt{\dfrac{0.329(1-0.329)}{301.32 + 300} + \dfrac{0.329(1-0.329)}{250}} = (0.285,0.424)$ 

That is, 95% of the time the player's OBP over the 250 PA in the second half of the season should be between 0.285 and 0.424. These intervals are overly optimistic and "dumb" in that they take only the league mean and variance and the player's own statistics into account, representing an advantage only over 95% unshrunk intervals, but when I tested them in my article "From Stabilization to Interval Estimation", they worked well for prediction.

As usual, all my data and code can be found on my github. I wrote a general function in $R$ to calculate the stabilization point for any basic counting stat, or unweighted sums of counting stats like OBP (I am still working on weighted sums so I can apply this to things like wOBA). The function returns the estimated league mean of the statistic and estimated stabilization point, a standard error for the stabilization point, and what model was used (I only have two programmed in - 1 for the beta-binomial and 2 for the gamma-Poisson). It also gives a plot of the estimated stabilization at different numbers of events, with 95% confidence bounds.

> stabilize(h$H + h$BB + h$HBP, h$PA, cutoff = 300, 1)
$Parameters
[1]   0.329098 301.317682

$Standard.Error
[1] 16.92138

$Model
[1] "Beta-Binomial"



The confidence bounds are created from the estimates $\hat{M}$ and $SE(\hat{M})$ above and the formula

$\left(\dfrac{n}{n+\hat{M}}\right) \pm 1.96 \left[\dfrac{n}{(n+\hat{M})^2}\right] SE(\hat{M})$

which is obtained from applying the delta method to the function $p(\hat{M}) = n/(n + \hat{M})$. Note that the mean and prediction intervals I gave do not take $SE(\hat{M})$ into account (ignoring the uncertainty surrounding the correct shrinkage amount, which is indicated by the confidence bounds above), but this is not a huge problem - if you don't believe me, plug slightly different values of $M$ into the formulas yourself and see that the resulting intervals do not change much.

Maybe somebody else out there might find this useful. As always, feel free to post any comments or suggestions!

03 September, 2016

2016 Win Total Predictions (Through August 31)


These predictions are based on my own silly estimator, which I know can be improved with some effort on my part. There's some work related to this estimator that I'm trying to get published academically, so I won't talk about the technical details yet (not that they're particularly mind-blowing anyway). These predictions include all games played through August 31.

As a side note, I noticed that my projections are very similar to the Fangraphs projections on the same day. I'm sure we're calculating the projections from completely different methods, but it's reassuring that others have arrived at basically the same conclusions. Theirs also have playoff projections, though mine have intervals attached to them.

I set the nominal coverage at 95% (meaning that, the way I calculated them, the intervals should get it right 95% of the time), but based on tests of earlier seasons at this point in the year, the actual coverage is around 94%, with intervals usually being one game off if and when they miss.

Intervals are inclusive. All win totals assume a 162 game schedule.

\begin{array} {c c c c c c}
\textrm{Team}  & \textrm{Lower}  & \textrm{Mean} & \textrm{Upper} & \textrm{True Win Total}  & \textrm{Current Wins/Games}\\ \hline
ARI & 63 & 68.82 & 74 & 71.61 & 56 / 133 \\
ATL & 57 & 62.25 & 68 & 68.41 & 50 / 133 \\
BAL & 81 & 86.57 & 92 & 81.42 & 72 / 133 \\
BOS & 85 & 90.41 & 96 & 91.7 & 74 / 133 \\
CHC & 98 & 103.63 & 109 & 100.59 & 85 / 132 \\
CHW & 71 & 76.85 & 83 & 77.61 & 62 / 131 \\
CIN & 62 & 67.9 & 74 & 69.67 & 55 / 132 \\
CLE & 87 & 92.68 & 98 & 90.03 & 76 / 132 \\
COL & 73 & 78.63 & 84 & 81.72 & 64 / 133 \\
DET & 81 & 86.95 & 92 & 83.51 & 72 / 133 \\
HOU & 81 & 86.24 & 92 & 85.14 & 71 / 133 \\
KCR & 78 & 83.09 & 89 & 78.69 & 69 / 133 \\
LAA & 67 & 72.93 & 78 & 77.8 & 59 / 133 \\
LAD & 84 & 89.44 & 95 & 86.26 & 74 / 133 \\
MIA & 76 & 81.66 & 87 & 81.91 & 67 / 133 \\
MIL & 65 & 70.13 & 76 & 73.34 & 57 / 133 \\
MIN & 56 & 61.98 & 68 & 70.1 & 49 / 132 \\
NYM & 78 & 83.61 & 89 & 81.58 & 69 / 133 \\
NYY & 78 & 83.86 & 90 & 80.28 & 69 / 132 \\
OAK & 64 & 69.88 & 75 & 71.92 & 57 / 133 \\
PHI & 67 & 72.42 & 78 & 69.38 & 60 / 133 \\
PIT & 77 & 82.37 & 88 & 80.33 & 67 / 131 \\
SDP & 63 & 68.73 & 74 & 74.14 & 55 / 132 \\
SEA & 77 & 82.55 & 88 & 81.3 & 68 / 133 \\
SFG & 82 & 88 & 94 & 86.39 & 72 / 132 \\
STL & 81 & 86.25 & 92 & 87.69 & 70 / 132 \\
TBR & 65 & 70.72 & 76 & 79.47 & 56 / 132 \\
TEX & 89 & 94.45 & 100 & 83.61 & 80 / 134 \\
TOR & 86 & 91.93 & 97 & 89 & 76 / 133 \\
WSN & 89 & 94.82 & 100 & 94.01 & 78 / 133 \\
 \hline\end{array}

These quantiles are based off of a distribution - I've uploaded a picture of each team's distribution to imgur. The bars in red are the win total values covered by the 95% interval. The blue line represents my estimate of the team's "True Win Total" based on its performance - so if the blue line is to the left of the peak, the team is predicted to finish "lucky" - more wins than would be expected based on their talent level - and if the blue line is to the right of the peak, the team is predicted to finish "unlucky" - fewer wins than would be expected based on their talent level.

It's still difficult to predict final win totals even at the beginning of September - intervals have a width of approximately 11-12 games. The Texas Rangers have been lucky this season, with a projected win total over 10 games larger than their estimated true talent level! Conversely, the Tampa Bay Rays have been unlucky, with a projected win total almost 9 games lower than their true talent level.

The Chicago Cubs have a good chance at winning 105+ games. My system believes they are a "true" 101 win team. Conversely, the system believes that the worst team is the Atlanta Braves, which are a "true" 68 win team (though the Minnesota Twins are projected to have the worst record at 62 wins).



Terminology




To explain the difference between "Mean" and "True Win Total"  - imagine flipping a fair coin 10 times. The number of heads you expect is 5 - this is what I have called "True Win Total," representing my best guess at the true ability of the team over 162 games. However, if you pause halfway through and note that in the first 5 flips there were 4 heads, the predicted total number of heads becomes $4 + 0.5(5) = 6.5$ - this is what I have called "Mean", representing the expected number of wins based on true ability over the remaining schedule added to the current number of wins (from the beginning of the season through August 31).
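The same arithmetic reproduces the "Mean" column in the table above. A Python sketch (my own reconstruction, assuming, as described, that "Mean" is current wins plus the true-talent rate applied to the remaining schedule):

```python
def projected_mean(wins, games, true_win_total, season_length=162):
    """Current wins plus expected wins over the remaining schedule
    at the estimated true-talent winning rate."""
    rate = true_win_total / season_length
    return wins + rate * (season_length - games)

# Texas: 80 wins in 134 games, estimated "true" 83.61-win team
print(round(projected_mean(80, 134, 83.61), 2))   # -> 94.45, matching the table

# Chicago Cubs: 85 wins in 132 games, "true" 100.59-win team
print(round(projected_mean(85, 132, 100.59), 2))  # -> 103.63
```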