Next-Gen Baseball Analytics: From Sabermetrics to Artificial Intelligence
"I think AI is going to, at some point, take all of this data and make it even more usable. And simplify that process... We'll be able to plug in ‘Here's what this guy does, here's the model of what we have, you tell me what pitch to throw in this situation.' Alright, boom, that pitch happened. 'What's the next pitch that should happen based on all of that data?' I think it's kind of like the self-driving cars. Are we ever going to get to the point where you're able to sleep in your car? I don't know. I think we'll still have to have some hands on the wheel with AI doing that. And there still has to be some instinctual stuff. But I think having AI be able to streamline and simplify is probably where we're headed."
​
— Andrew Checketts
Head Coach, UCSB Baseball


Sam Arenson: Senior Intern with UCSB Baseball Analytics
​
Max Farrell: Associate Professor of Economics and Machine Learning at UCSB
(Tony Mastres)
(Jeff Liang)
Is AI Analytics a Grand Slam?
How artificial intelligence fits into the future of sabermetrics in Major League Baseball and at UC Santa Barbara
Over its 43-year history, sabermetrics has fundamentally transformed player scouting, team evaluation, and strategic decision-making in baseball. Now two fast-growing technologies, artificial intelligence (AI) and machine learning, are beginning to reshape sports analytics.
​
Max Farrell, associate professor of economics and machine learning at UC Santa Barbara, described how these technologies make data analysis faster and more precise. “It gives us a more nuanced way to match people up based on their past data,” Farrell said. “Through machine learning and artificial intelligence, an analyst could easily input large amounts of data to match people up more closely or find groups of athletes that behave the same way to come to conclusions and predict how an individual player might perform.”
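The kind of matching Farrell describes can be sketched in a few lines of Python. This is only an illustration, not a method used by Farrell or the Gauchos: the player names and stat lines are invented, and the similarity measure (straight-line distance between standardized stats) is one simple assumption among many possible choices.

```python
import math
from statistics import mean, pstdev

# Hypothetical season lines: (batting average, on-base pct, slugging pct).
# These numbers are invented for illustration, not real player data.
PLAYERS = {
    "Player A": (0.310, 0.390, 0.520),
    "Player B": (0.305, 0.385, 0.510),
    "Player C": (0.240, 0.300, 0.350),
    "Player D": (0.250, 0.340, 0.480),
}

def standardize(players):
    """Rescale each stat to z-scores so no single stat dominates the distance."""
    columns = list(zip(*players.values()))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread
    return {
        name: tuple((v - m) / s for v, m, s in zip(stats, means, stdevs))
        for name, stats in players.items()
    }

def most_similar(target, players):
    """Return the other player whose standardized stat line is closest to target's."""
    scaled = standardize(players)
    others = (name for name in scaled if name != target)
    return min(others, key=lambda name: math.dist(scaled[target], scaled[name]))

print(most_similar("Player A", PLAYERS))
```

With more players and more statistics per player, the same distance idea is what lets an analyst find groups of athletes who "behave the same way."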
​
Machine learning and artificial intelligence, while similar, are not synonymous. Machine learning, a subset of artificial intelligence, equips machines with the ability to learn autonomously from data and past experiences, identifying patterns and making predictions with minimal human intervention. In contrast, artificial intelligence encompasses the broader concept of a computer system's capability to perform complex tasks historically reserved for humans, such as reasoning, decision-making, and problem-solving.

Aaron Parker (left) and Jonathan Mendez (right)
(Jeff Liang)
Farrell illustrated how this distinction might play out in a real-world baseball scenario, such as a personnel decision. “Artificial intelligence could be leveraged to revolutionize machine learning in sports, especially when you think about trying to answer ultra-specific questions like, ‘is this trade good’ or ‘what relief pitcher should we bring in’,” he explained. “In this hypothetical scenario, a manager could utilize an AI service like ‘chat baseball’ to pose such questions. While a machine learning program might offer a straightforward player recommendation, AI would transcend mere data-driven suggestions and provide a comprehensive explanation.”
​
These systems are already in use at the professional level. In 2015, all 30 Major League Baseball teams adopted a tracking system called Statcast. This automated, high-accuracy tool employs machine learning algorithms and artificial intelligence to analyze player performance data in real time, including metrics like pitch velocity, defensive positioning, and pitch selection. Since its introduction, Statcast has not only provided fans with valuable insights but has also influenced player training strategies and game plans.
​

Matt Ager (Jeff Liang)
Still, at UCSB, the Gauchos' analytics team continues to rely on a blend of traditional methods and newer technology to gain insights into the team's performance. Using tools such as Python, R, and Excel, the team applies descriptive statistics to unravel the intricacies of the game.
Sam Arenson, an intern with the analytics team, says these tools help the team meet its objectives. "Python and R are both coding languages that we use for analytics," he explained. "While they serve similar purposes, they excel in different areas such as web scraping and data visualization. For instance, with R, you can load a dataset, analyze it, extract averages, manipulate the data, and create visualizations and graphs to present findings in reports for the coaching staff."
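Arenson's example refers to R, but the same load, group, and average workflow looks much the same in Python, the team's other language. The sketch below is a hypothetical illustration rather than the team's code: the pitch log, its column names, and the `average_velocity` helper are all invented, and the data is embedded as text so the example runs on its own.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical pitch log; a real workflow would read a file exported by the team.
PITCH_LOG = """pitcher,pitch_type,velocity_mph
Pitcher A,fastball,93.0
Pitcher A,fastball,94.0
Pitcher A,slider,84.0
Pitcher B,fastball,90.0
Pitcher B,changeup,81.0
Pitcher B,changeup,83.0
"""

def average_velocity(csv_text):
    """Group pitches by (pitcher, pitch type) and return each group's mean velocity."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[(row["pitcher"], row["pitch_type"])].append(float(row["velocity_mph"]))
    return {key: round(mean(values), 1) for key, values in groups.items()}

summary = average_velocity(PITCH_LOG)
for (pitcher, pitch_type), mph in sorted(summary.items()):
    print(f"{pitcher:10} {pitch_type:9} {mph:5.1f} mph")
```

A table like this is descriptive: it summarizes what has already happened and makes no claim about the next outing.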
​
With the team's current technological setup, using artificial intelligence or machine learning on the UCSB campus would be difficult, according to Arenson. “We tried to make a machine learning model earlier in the year. But with our technology, doing that is extremely difficult,” he said. “Machine learning is trying to predict the future based on past data. So, we aren't really able to do that because we don’t always have all of the data that we need, and if we can’t guarantee that a machine learning model is going to predict the future, it’s not really useful. So what I would say most of our work is, is descriptive statistics.”
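One common way to test whether a model "predicts the future" is to fit it on earlier data and score it on later data it has never seen. The sketch below assumes invented pitch-count and velocity readings, with `None` standing in for readings that were never recorded, and uses a hand-written least-squares line. It is not the model the Gauchos attempted, and a good result on made-up numbers says nothing about real ones.

```python
from statistics import mean

# Hypothetical (pitch count, fastball velocity in mph) pairs from past outings.
# None marks a reading that was never recorded.
PAST = [(10, 93.1), (20, 92.8), (30, None), (40, 92.1), (50, 91.9), (60, 91.4)]
LATER = [(70, 91.0), (80, None), (90, 90.5)]

def drop_missing(rows):
    """Keep only the rows that have a recorded velocity."""
    return [(x, y) for x, y in rows if y is not None]

def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    xs, ys = zip(*rows)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar

def mean_abs_error(predict, rows):
    """Average size of the miss, in mph, over the given rows."""
    return mean(abs(predict(x) - y) for x, y in rows)

train, holdout = drop_missing(PAST), drop_missing(LATER)
slope, intercept = fit_line(train)
baseline = mean(y for _, y in train)  # naive guess: the past average

model_error = mean_abs_error(lambda x: slope * x + intercept, holdout)
baseline_error = mean_abs_error(lambda x: baseline, holdout)
print(f"model misses by {model_error:.2f} mph, baseline by {baseline_error:.2f} mph")
```

If the model cannot beat the naive baseline on the held-out outings, or if too many readings are missing to fit it at all, the analyst is back to descriptive statistics, which is the situation Arenson describes.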
​
The limitations of AI and machine learning at UCSB echo those in Major League Baseball, where even Statcast has faced criticism for tracking gaps and for failing to account for ballpark variations. In 2017, concerns arose when the system struggled to track balls with "atypical trajectories," which accounted for over 10 percent of all balls in play. The finding raised doubts about the system's effectiveness.

Spencer Erdman (left) and Aaron Parker (right)
(Jeff Liang)
While technology has progressed immensely since the early 1980s, when sabermetrics first provided exciting new tools in the Major Leagues, humans have always been, and always will be, part of the team that analyzes baseball to create the best outcomes.
​
Arenson stresses the value of artificial intelligence and machine learning in baseball stats, but he is also keenly aware of the technology's limits. “Baseball is a very human game, and you can’t predict everything with mathematics. There is a limit to predicting everything a player will do in a game,” he said. “While sabermetrics is significant in baseball, it's not going to be the end all be all. There will still be a lot of human input that goes hand in hand with these data models and future machine learning methods.”