16 July, 2016

2016 Win Total Predictions (Through All-Star Break)


These predictions are based on my own silly estimator, which I know can be improved with some effort on my part.  There's some work related to this estimator that I'm trying to get published academically, so I won't talk about the technical details yet (not that they're particularly mind-blowing anyway). These predictions include all games played before the all-star break.

I set the nominal coverage at 95% (meaning that, the way I calculated them, the intervals should contain the final win total 95% of the time), but tests on earlier seasons at this point in the season put the actual coverage at just under 93%. When an interval does miss, it is usually off by one game.

Intervals are inclusive. All win totals assume a 162 game schedule.

\begin{array} {c c c c c c}
\textrm{Team}  & \textrm{Lower}  & \textrm{Mean} & \textrm{Upper} & \textrm{True Win Total}  & \textrm{Current Wins/Games}\\ \hline

ARI & 62 & 72.11 & 82 & 76.75 & 38 / 90 \\
ATL & 52 & 61.82 & 72 & 68.4 & 31 / 89 \\
BAL & 81 & 90.93 & 101 & 86.25 & 51 / 87 \\
BOS & 80 & 90.3 & 100 & 89.19 & 49 / 87 \\
CHC & 87 & 96.9 & 106 & 96.11 & 53 / 88 \\
CHW & 71 & 81.04 & 91 & 80 & 44 / 87 \\
CIN & 51 & 60.62 & 70 & 63.51 & 32 / 89 \\
CLE & 84 & 93.41 & 103 & 90.67 & 52 / 88 \\
COL & 66 & 76.18 & 86 & 79.22 & 40 / 88 \\
DET & 73 & 82.55 & 92 & 81.11 & 46 / 89 \\
HOU & 76 & 85.81 & 96 & 83.9 & 48 / 89 \\
KCR & 70 & 80.3 & 90 & 77.29 & 45 / 88 \\
LAA & 62 & 71.88 & 82 & 77.4 & 37 / 89 \\
LAD & 80 & 89.43 & 99 & 87.7 & 51 / 91 \\
MIA & 75 & 84.5 & 94 & 82.1 & 47 / 88 \\
MIL & 61 & 71.33 & 81 & 71.98 & 38 / 87 \\
MIN & 56 & 65.83 & 76 & 73.06 & 32 / 87 \\
NYM & 75 & 84.9 & 95 & 82.97 & 47 / 88 \\
NYY & 69 & 78.88 & 89 & 76.38 & 44 / 88 \\
OAK & 61 & 70.93 & 81 & 73.08 & 38 / 89 \\
PHI & 64 & 74 & 84 & 72 & 42 / 90 \\
PIT & 73 & 82.71 & 93 & 81.45 & 46 / 89 \\
SDP & 62 & 72.04 & 82 & 75.53 & 38 / 89 \\
SEA & 74 & 83.44 & 93 & 85.31 & 45 / 89 \\
SFG & 87 & 96.8 & 106 & 89.55 & 57 / 90 \\
STL & 77 & 87.13 & 97 & 90.03 & 46 / 88 \\
TBR & 58 & 67.3 & 77 & 72.91 & 34 / 88 \\
TEX & 81 & 91.22 & 101 & 83.75 & 54 / 90 \\
TOR & 80 & 89.42 & 99 & 87.66 & 51 / 91 \\
WSN & 86 & 95.42 & 105 & 93.21 & 54 / 90 \\ \hline
\end{array}

It's still fairly difficult to predict final win totals even a little over halfway through the season - intervals have a width of approximately 20 games. A few stand-out points - the teams that are predicted to definitely finish below 0.500 are the Atlanta Braves, the Cincinnati Reds, the Minnesota Twins, and the Tampa Bay Rays, with the Reds being the worst of those teams (they are estimated as a "true" 63.51 win team). On the other side, the teams predicted to definitely finish above 0.500 are the Chicago Cubs, the Cleveland Indians, the San Francisco Giants, and the Washington Nationals, with the Cubs being the best of these teams (they are estimated as a "true" 96.11 win team). The Texas Rangers and San Francisco Giants in particular have been exceptionally lucky teams - each is predicted to win approximately 7 more games than its "true" win total. Likewise, the Atlanta Braves and Minnesota Twins have been unlucky, with each predicted to win approximately 7 fewer games than its "true" win total.
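
As a rough sketch of how those readings come out of the table, the snippet below takes four rows and computes the gap between "Mean" and "True Win Total", then checks whether the whole interval sits on one side of 81 wins (0.500 over 162 games). The names `luck`, `verdict`, and `BREAK_EVEN` are my own labels for illustration, not part of the estimator.

```python
# (lower, mean, upper, true_win_total) copied from four rows of the table above.
rows = {
    "ATL": (52, 61.82, 72, 68.4),
    "MIN": (56, 65.83, 76, 73.06),
    "SFG": (87, 96.8, 106, 89.55),
    "TEX": (81, 91.22, 101, 83.75),
}

BREAK_EVEN = 81  # a 0.500 record over a 162 game schedule


def luck(mean, true_win_total):
    """Predicted wins minus "true" wins; positive means lucky so far."""
    return round(mean - true_win_total, 2)


def verdict(lower, upper):
    """Say which side of 0.500 the inclusive interval falls on, if any."""
    if lower > BREAK_EVEN:
        return "above 0.500"
    if upper < BREAK_EVEN:
        return "below 0.500"
    return "undetermined"


for team, (lower, mean, upper, true_total) in rows.items():
    print(team, luck(mean, true_total), verdict(lower, upper))
```

Texas comes out as undetermined here even with a large positive gap, because its lower bound of 81 is exactly break-even and the intervals are inclusive.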

To explain the difference between "Mean" and "True Win Total"  - imagine flipping a fair coin 10 times. The number of heads you expect is 5 - this is what I have called "True Win Total," representing my best guess at the true ability of the team over 162 games. However, if you pause halfway through and note that in the first 5 flips there were 4 heads, the predicted total number of heads becomes $4 + 0.5(5) = 6.5$ - this is what I have called "Mean", representing the expected number of wins based on true ability over the remaining schedule added to the current number of wins (from the beginning of the season until the all-star break).
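
The coin example can be written as a small function. This is only a sketch of the relationship between the columns as described above, not the estimator itself; the name `projected_mean` and its parameters are mine. It assumes the "true" win rate is the "True Win Total" divided by the season length, which reproduces the Arizona row of the table.

```python
def projected_mean(current_wins, games_played, true_win_total, season_games=162):
    """Current wins plus expected wins over the remaining schedule."""
    true_win_rate = true_win_total / season_games
    games_remaining = season_games - games_played
    return current_wins + true_win_rate * games_remaining


# Coin example from the text: 4 heads in the first 5 of 10 flips of a fair coin.
print(projected_mean(4, 5, 5, season_games=10))  # 6.5

# ARI row: 38 wins in 90 games with a "True Win Total" of 76.75.
print(round(projected_mean(38, 90, 76.75), 2))  # 72.11
```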

These quantiles are based on a distribution - I've uploaded a picture of each team's distribution to imgur. The bars in red are the win total values covered by the 95% interval. The blue line represents my estimate of the team's "True Win Total" based on its performance - so if the blue line is to the left of the peak, the team is predicted to finish "lucky" - more wins than would be expected based on their talent level - and if the blue line is to the right of the peak, the team is predicted to finish "unlucky" - fewer wins than would be expected based on their talent level.