by Matthew de Marte – April 29th, 2018
For two years in college, I was fortunate enough to play under an assistant coach who had spent time in the Philadelphia Phillies’ organization. While I was back visiting Babson a few weeks ago to throw with a few of the team’s pitchers, he and I started talking about baseball analytics. He was a great pitching coach at Babson, and during his time with the team, he played a huge role in the development of all our pitchers. However, he was reluctant to embrace baseball analytics and the new player development methods that have emerged. During our discussion, he stressed the things he believed analytics cannot tell you about baseball, and why he believed some Statcast metrics, such as exit velocity, are not applicable to in-game situations.
This made me realize something: with the way baseball analytics is talked about, a lot of the principles that shape it are not fully explained. Every point my coach brought up was a valid reason for someone to be skeptical of analytics. What is often misunderstood is that sabermetrics are not meant to reinvent the wheel; their purpose is to improve how we look at and use the wheel. To me, baseball analytics means using data and statistics to gain the best advantage possible, which ultimately will improve your team’s chances of winning. The applications of analytics are endless. Player acquisition, the draft, player tendencies, scouting, player evaluation, player development, in-game tactics, and in-game strategy are several of the realms where baseball analytics can be used. Everything evolves, and sabermetrics have improved, and will continue to improve, how we view and analyze baseball.
One concept that my former coach did not agree with was how hitting coaches use exit velocity to evaluate the development and improvement of a hitter. Traditionally, a stat like hits would be used to determine what makes a good or bad hitter. However, it has been shown that higher exit velocity is a pathway to better production at the plate. Exit velocity is not about stepping in the box and trying to hit every ball as hard as you possibly can; it is about understanding that the harder you hit the ball, the more likely you are to get the best results. Instead of trying to find a hole, or hoping to hit a line drive, a gap, or a home run by accident, train to be able to hit the ball hard enough to achieve the best outcomes. If a player becomes stronger in the weight room and develops a swing that produces high exit velocities, that gives the player a better chance to succeed. A player’s approach does not really change if he buys into exit velocity and launch angle, but his desired outcome does. If a swing is tailored to hit the ball hard and in the air, then a hitter can approach his at-bat like he always has. Having a good approach, paired with the work put in to be able to hit balls at optimal exit velocities and launch angles, should lead to more success. This thought process can be applied to many other analytical concepts as well.
Using analytical stats such as WAR, wOBA, and FIP over more traditional stats like batting average, RBIs, and pitching wins follows a similar train of thought. RBIs, batting average, and wins tell a story, but not the full story, not even close. When using stats to evaluate a player and his performance, the goal is to figure out how good that player is. In doing so, you want to focus on the metrics that a player has full control over and that correlate well with a team scoring more runs. Batting average and RBIs do not correlate nearly as well with a player’s contribution to his team as OBP, OPS, wOBA, and wRC+ do. Don’t believe me? Let us run a test to find out. I will use RStudio to build a linear regression model to identify which metrics best explain the number of runs a team has scored, starting from four candidate variables: batting average, on-base %, wOBA, and wRC+. wOBA is a metric that gives a specific weight to each method of reaching base and is on the same scale as OBP. wRC+ is a similar concept to wOBA, but it is indexed so that league average is 100: 1% better than league average is 101, and 1% below is 99. wRC+ is also adjusted for park factors. When using variable selection to build this model, batting average and wRC+ were deemed insignificant in determining a team’s runs scored, leaving OBP and wOBA. When running the actual model, these are the results:
If you don’t know how linear regression works from the Statcast article, you can read about it here. Variables must have a p-value below .05 to be deemed significant. Both OBP and wOBA are significant. The R-squared of this model was 89.94%! That is incredibly high, meaning 89.94% of the variance in a team’s runs scored can be explained by its OBP and wOBA. For comparison’s sake, I ran one linear regression model with batting average as the only variable predicting team runs, and another with only OBP predicting team runs. The R-squared value for batting average was 53.45%. For OBP it was 82.31%! See the difference? Batting average tells a story, but more advanced metrics tell a better story.
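If you want to try this kind of comparison yourself, here is a minimal sketch in Python. It is not the RStudio code behind the numbers above; it only shows how R-squared is computed for a model with one predictor. The function name `r_squared` and every number in the lists are made up for illustration and do not describe real teams, so the printed values mean nothing on their own. Swap in real team-season data to get a meaningful answer.

```python
from statistics import mean


def r_squared(x, y):
    """R-squared of a one-variable least-squares fit: y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Least-squares slope and intercept.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Share of the variance in y that the fitted line explains.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical team-season numbers, invented for this example only.
team_avg = [0.245, 0.262, 0.251, 0.270, 0.240, 0.258]
team_obp = [0.310, 0.335, 0.318, 0.340, 0.322, 0.305]
team_runs = [680, 790, 715, 810, 735, 660]

r2_avg = r_squared(team_avg, team_runs)
r2_obp = r_squared(team_obp, team_runs)
print(f"R-squared, batting average vs. runs: {r2_avg:.2%}")
print(f"R-squared, OBP vs. runs: {r2_obp:.2%}")
```

The two-variable model with OBP and wOBA works the same way, except the fitted line becomes a fitted plane, which is easier to leave to a statistics package than to write by hand.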
The same thing can be said about player RBIs, though the same math cannot be used. RBIs are more of a team metric. Players surrounded by other good hitters who are on base often tend to have more RBIs, which makes sense. If a player gets more opportunities to hit with runners on base, that player should drive in more runs. Joey Votto, who is one of the best hitters in baseball, has been scrutinized at points in his career for not driving in enough runs. Votto has a career OPS of .969, which is not too shabby considering only 14 qualified players in baseball history have done better. Yet some fans, analysts, and writers say he needs to drive in more runs. During his career, his OPS rises to 1.026 with runners on base and to 1.080 with runners in scoring position. Now, not every player performs better in these situations, but in Votto’s case, it is not his fault if he does not drive in 120 runs a season. Instead, his teammates are at fault for not being on base enough for him. The evaluation of a player’s production should not focus on the runs he drives in, but rather on how often he can get on base and give the rest of his team a chance to produce runs. An RBI, unless it comes via a home run, needs another player to reach base safely before it can be accumulated. So we should glorify those with the ability to get on base, not those who drive in runners, as the sabermetric community has for a long time now.
Analytics are not meant to reinvent the wheel. Instead, they’re meant to better utilize it. I understand analytics are not always going to work. Baseball is unpredictable, things happen, and human error is hard to account for in statistics. The goal with analytics is to be as prepared as possible and give a team or player the best chance of succeeding. Analytics can help any player, coach, or fan better understand baseball, improve a player’s development, and inform a coach’s decision making. Denying this is denying the truth mathematics provides.
The conversation I had with my old coach ended with him telling me I should write something about the application of analytics and the confusion around their use. The goal of analytics is not to disregard baseball’s past, but to continue to push the culture forward. The Moneyball Era has accelerated our understanding and knowledge of baseball as a whole. I hope this piece has helped those reading understand why you should adopt an analytical viewpoint and embrace sabermetrics. If anyone would like to further this discussion, feel free to reach out to me at firstname.lastname@example.org!