Many fans make assumptions about certain aspects of baseball. Are younger players better than older players? Were the Astros really the best team? Does my favorite player really deserve to be paid so much? Does Yuli Gurriel have the best or worst hair in the MLB?
Sports writer Jim Murray once said, “Baseball’s appeal is decimal points. No other sport relies as totally on continuity, statistics, orderliness of these. Baseball fans pay more attention to numbers than CPAs.” What makes these decimal points so interesting is all the different ways for fans to analyze and understand them.
In the 2017 season, 1,627 players recorded at least one statistic. Jose Altuve led the MLB with a .346 batting average, even though he recorded 9 fewer hits than Rockies outfielder Charlie Blackmon. In fact, 29 players had a higher batting average than Altuve, including 8 players with a perfect 1.000 batting average, but they failed to qualify for recognition under the MLB’s award requirement of a minimum average of 3.1 plate appearances per team game (3.1 × 162 games = 502.2, so at least 502 PA throughout the season). Of the 8 perfect players, only one was a position player (Erik Kratz), and ironically he’s also the only one of the 8 who had more than one at bat (don’t jump to conclusions – he only had 2). Each and every stat can be understood in different ways to fit a different argument. With this in mind, I tried to take biased opinions and test them with unbiased data.
Did middle-aged players have higher offensive WARs than younger and older players? Courtesy of the Boston Globe, one can see that among all elite players from 1984 to 2014 (elite defined as players with a WAR greater than 2.0 for that season), middle-aged players are in their prime and the most valuable to their teams.
This raises a question: across all players in 2017, not simply those with a WAR greater than 2.0, are the middle-aged players more valuable to their teams than younger and older players? It makes sense – middle-aged players receive more valuable contracts because they’ve proved their abilities over a longer period. Rookies and younger players don’t have the same MLB experience, and older players are often considered to be “past their prime” with only a few years before they’re replaced by a younger starter (although David Ortiz, who led the league in doubles, slugging percentage, and OPS in his final season, would like to have a word with anyone who doubts the value of older players). To test this theory, I performed a two-sample hypothesis test. Group 1 consisted of players aged 21 to 25 and 33 to 43, while Group 2 consisted of the hypothesized “players in their prime,” between the ages of 26 and 32.
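For readers who want to try a comparison like this themselves, here is a minimal sketch of a two-sample test in Python. It is not the procedure used to produce the numbers in this post: it assumes Welch's version of the t statistic (unequal variances) and swaps in a normal approximation for the P-value, which is only reasonable for large samples. The `players` rows are made-up placeholders, not real 2017 data, and `welch_t_test` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(group_a, group_b):
    """Return (t, two_sided_p) for a two-sample comparison of means.

    Assumption: the p-value uses a normal approximation instead of the
    t distribution, so it is only trustworthy for large samples.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Standard error of the difference between the two sample means.
    std_err = sqrt(variance(group_a) / n_a + variance(group_b) / n_b)
    t = (mean(group_a) - mean(group_b)) / std_err
    # Two-sided p-value: chance of a statistic at least this extreme,
    # in either direction, if the true means were equal.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical (age, WAR) rows for illustration only.
players = [(23, 0.4), (24, 1.1), (27, 2.3), (29, 1.8), (31, 2.0),
           (34, 0.9), (36, 0.2), (28, 3.1), (22, 0.7), (38, 0.5)]

prime = [war for age, war in players if 26 <= age <= 32]       # Group 2
others = [war for age, war in players if not 26 <= age <= 32]  # Group 1

t_stat, p_value = welch_t_test(prime, others)
print(f"t = {t_stat:.3f}, two-sided p = {p_value:.4f}")
```

With a real data set, the two lists would simply be the WAR (or OPS+) values for each age group.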
With a P-value greater than .05, there is no statistically significant evidence that players between the ages of 26 and 32 outperform their younger and older counterparts. The results aren’t meaningless for sports, however. With a P-value of .1296, the WAR comparison comes closer to showing that Group 2 outperforms Group 1 than the same comparison using adjusted OPS, or OPS+: performing the exact same two-sided T-test with OPS+ substituted for WAR results in a P-value of .3782. In layman’s terms, the claim that middle-aged players are in their “primes” cannot be proved: if there were truly no difference between the groups, a gap at least as large as the one observed in WAR would still turn up about 12.96% of the time by chance alone. (This is not the same as an 87.04% chance that middle-aged players are better than younger and older players; a P-value only measures how surprising the data would be if there were no real difference.) The evidence is stronger with WAR than with OPS+, so while neither stat is perfect, WAR makes a better argument than OPS+.
Well, that was a lot of information just to say “No – middle-aged players aren’t necessarily better than younger and older players.” As infielder Toby Harrah puts it, “Statistics and bikinis both show a lot, but not everything.”
How about analyzing the use of designated hitters in the MLB? The American League uses DHs every single game, while the National League cannot use a DH. Assuming that pitchers are terrible at hitting (and most are), one would expect the mean batting average of the American League to be greater than the mean batting average of the National League, since the National League is hurt by the pitchers hitting in games.
The two-sample T-test shows that there is no statistically significant evidence that the American League has a better batting average than the National League, despite the use of DHs instead of pitchers in the batting order. However, the distributions of batting averages in each league are not normal, so I performed a nonparametric test as well to ensure the initial results were not affected by the shapes of the two distributions.
The Mann-Whitney test works on the ranks of the observations rather than their raw values, so it is usually read as a comparison of the medians of the two data sets rather than the means. It is generally regarded as a better choice than a T-test for checking whether one group tends to produce larger values than the other when the distributions are not normal. The results of the Mann-Whitney test confirm that there is no statistically significant evidence that American League teams hit better than National League teams.
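The rank-based idea behind the Mann-Whitney test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact test run for this post: tied values get the average of their ranks, the P-value uses a normal approximation with no tie correction, and `mann_whitney_u` is a hypothetical helper name. The batting averages below are placeholders, not real league data.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two_sided_p) via a normal approximation."""
    pooled = sorted([(v, "a") for v in sample_a] + [(v, "b") for v in sample_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        count_a = sum(1 for k in range(i, j) if pooled[k][1] == "a")
        rank_sum_a += avg_rank * count_a
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mu = n_a * n_b / 2
    sigma = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # assumption: no tie correction
    z = (u_a - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p


# Hypothetical batting averages for illustration only.
american = [0.250, 0.262, 0.271, 0.244, 0.258]
national = [0.255, 0.266, 0.249, 0.270, 0.260]

u_stat, p_value = mann_whitney_u(american, national)
print(f"U = {u_stat}, two-sided p = {p_value:.4f}")
```

Because only the ordering of the values matters, a handful of extreme batting averages (like those perfect 1.000 hitters) cannot dominate the result the way they can with a comparison of means.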
Perhaps the most intriguing result of these tests is that, although the differences in mean and median batting averages between the two leagues are not statistically significant, the National League actually has the better mean and median batting average. Could this mean that if we removed designated hitters (players whose only job in baseball is to hit) and pitchers from the data, a fresh analysis would tell us that position players in the National League are much better hitters than position players in the American League?
What about the value of someone like Giancarlo Stanton on your team rather than someone like Dee Gordon? (Ironically, the pair were on the same team last season.) Stanton led the league in home runs with 59, while Gordon led the majors in steals. In order to win games, teams obviously need to score more runs than their opponents. Comparing the relationship between runs and home runs with the relationship between runs and steals shows that home runs are much more closely tied to a team’s total runs than stolen bases are.
2017 Runs vs Home Runs
2017 Runs vs Stolen Bases
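One way to put a number on how tightly two team totals move together is the Pearson correlation coefficient. The sketch below is a minimal illustration, not the calculation behind the charts above; `pearson_r` is a hypothetical helper, and the team totals are made-up placeholders that say nothing about the real 2017 result.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical season totals for five teams (illustration only).
runs = [850, 790, 740, 700, 650]
home_runs = [240, 225, 200, 190, 170]
stolen_bases = [70, 110, 60, 95, 85]

r_hr = pearson_r(runs, home_runs)
r_sb = pearson_r(runs, stolen_bases)
print(f"runs vs home runs:    r = {r_hr:.3f}")
print(f"runs vs stolen bases: r = {r_sb:.3f}")
```

A value near 1 or -1 means the points fall close to a straight line, while a value near 0 means the scatter plot looks like a cloud.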
Recently, teams have been relying more and more on power to supply wins. 2017 broke the record for most total home runs in a season with 6,105. The previous record was set in the 2000 season, during the height of the steroid era, with 5,693 total home runs. The gap between the first and second highest home run seasons is the same as the gap between the second and seventh. 2017 was clearly a display of the direction the league is heading – power over speed. As a matter of fact, the trend in home runs per game since the 1980 season is nearly the mirror image of the trend in stolen bases per game: one has climbed about as steeply as the other has fallen.
Home Runs per Game per Season since 1980
Stolen Bases per Game per Season since 1980
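The slope of a trend line like the ones in these two charts can be estimated with ordinary least squares. The sketch below shows the arithmetic only; `ols_slope` is a hypothetical helper, and the per-game rates are invented placeholders indexed by season number rather than real MLB figures.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Placeholder data: season index and per-game rates (not real values).
season_index = [0, 1, 2, 3, 4]
hr_per_game = [0.70, 0.80, 0.95, 1.00, 1.15]
sb_per_game = [0.75, 0.70, 0.60, 0.55, 0.50]

hr_slope = ols_slope(season_index, hr_per_game)
sb_slope = ols_slope(season_index, sb_per_game)
print(f"home runs per game trend:    {hr_slope:+.3f} per season")
print(f"stolen bases per game trend: {sb_slope:+.3f} per season")
```

Two trends that mirror each other would show slopes of similar size with opposite signs.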
How about your 2017 World Series champions, the Houston Astros? Given the short length of a series and the ability of any small mistake to sway a baseball game, the MLB playoffs can really go any way. Since the wild card was first introduced in 1994, the World Series winner has been a wild card 6 times. The record for regular-season wins was tied by the 2001 Mariners at 116-46, but they lost in the American League Championship Series (and haven’t made the playoffs since). The best teams don’t always end up on top. 2017 was actually fairly predictable based on expected success, however. Using the SRS statistic (the number of runs per game by which a team is better than the average MLB team), it was clear that almost every American League team was stronger than almost every National League team.
Even with the league’s best record, the Dodgers were barely better than the Red Sox, the fourth-best American League team according to SRS. The Dodgers’ near-collapse towards the end of the regular season helps explain their weaker SRS, as does a very easy schedule. Both Cleveland and the Yankees finished the season with strong win streaks, inflating their value heading into the postseason. As for the Astros, they probably had the best trade deadline, so while they maintained control of a weaker division, their SRS did not show just how much stronger they were offensively and defensively towards the end of the season. That being said, being only 0.3 runs per game behind the offensive powerhouse that was Cleveland is still very impressive.
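For the curious, a rating in the spirit of SRS can be sketched as each team's average run margin plus the average rating of its opponents, solved by repeated substitution. This is an assumption about how such a rating is typically built, not a confirmed description of how the published SRS figures are calculated; `simple_rating_system` is a hypothetical helper, and the three-team schedule is invented.

```python
from collections import defaultdict


def simple_rating_system(games, iterations=100):
    """Rate teams as average run margin plus average opponent rating.

    games: list of (team, opponent, runs_for, runs_against), one row per game.
    """
    margins = defaultdict(list)
    opponents = defaultdict(list)
    for team, opp, runs_for, runs_against in games:
        margins[team].append(runs_for - runs_against)
        margins[opp].append(runs_against - runs_for)
        opponents[team].append(opp)
        opponents[opp].append(team)

    avg_margin = {t: sum(m) / len(m) for t, m in margins.items()}
    ratings = dict(avg_margin)  # start from raw run differential per game
    for _ in range(iterations):
        # Strength-of-schedule adjustment: add the mean rating of opponents.
        ratings = {
            t: avg_margin[t]
            + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in avg_margin
        }
    return ratings


# Hypothetical round-robin between three made-up teams.
games = [("A", "B", 5, 3), ("B", "C", 4, 2), ("A", "C", 6, 1)]
ratings = simple_rating_system(games)
for team, rating in sorted(ratings.items()):
    print(f"{team}: {rating:+.2f} runs per game vs. average")
```

The schedule adjustment is why a team with a great record against weak opponents, as described above for the Dodgers, can end up rated lower than its win total suggests.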
At the end of the day, trying to predict the outcome of anything in baseball rarely works in your favor. It’s a lot more fun to look back and analyze even the most obscure stats, such as the number of Yuli Gurriel hair jokes his teammates make per game. On the bright side, pitchers and catchers report to Spring Training in under a month.