Using machine learning to answer baseball-specific questions


Given the nature of play in baseball, a large portion of injuries occur in non-contact incidents, so measuring and controlling exercise load can help reduce these types of injuries.

With this in mind, Catapult’s analytics team has implemented machine learning that takes advantage of the thousands of data points obtained with its OptimEye S5 device. Using that raw data and video from professional games and training sessions, Catapult has been able to develop sport-specific algorithms that quantify volume and intensity for activities such as throwing and bat swinging.

Early testing has shown an accuracy rate of over 90%, and because the algorithm is based on machine learning, it can improve as more data is fed into it.

Reducing the Disabled List in MLB

Studies in The American Journal of Orthopedics have shown that the number of disabled list (DL) assignments and the total number of DL days have increased year on year. Among the injured players on the DL, pitchers are injured more often and spend longer on the DL than players at any other position.

Catapult’s analytics team set out to quantify the movements that most commonly lead to these overuse injuries, on the theory that objective data on the volume and intensity of these movements will give practitioners the power to control training loads.

The Making of a Baseball Algorithm

To quantify events such as pitches and bat swings, a supervised machine learning algorithm was trained to match pitches and bat swings observed during training sessions to the corresponding readings from the OptimEye S5 device.

Specifically, Catapult built a Random Forest algorithm based on data collected from training sessions from various baseball teams, both professional and collegiate. The training data contains accelerometer and gyroscope readings for over 6,000 events of throws, bat swings, or neither, for dozens of players and various positions.

A threshold on the player's load was introduced in order to isolate the explosive events from other events such as walking. This ensures that the algorithm picks up only game-like throws and bat swings.
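The thresholding step can be illustrated with a minimal sketch. The article does not say how the load signal is computed or what threshold value is used, so everything here is an assumption for illustration: the function name `find_explosive_events`, the pre-computed `load` trace, the threshold of 2.0, and the `min_gap` rule that merges peaks belonging to the same movement.

```python
def find_explosive_events(load, threshold, min_gap):
    """Return sample indices of local peaks in `load` that reach `threshold`.

    A peak closer than `min_gap` samples to the previously kept peak is
    treated as part of the same movement; the higher of the two is kept.
    """
    events = []
    for i in range(1, len(load) - 1):
        is_peak = load[i - 1] < load[i] >= load[i + 1]
        if not is_peak or load[i] < threshold:
            continue  # walking and other low-intensity activity is ignored
        if events and i - events[-1] < min_gap:
            if load[i] > load[events[-1]]:
                events[-1] = i
            continue
        events.append(i)
    return events


# Hypothetical load trace: low-level movement with two explosive bursts.
load = [0.2, 0.3, 0.2, 0.4, 2.5, 3.1, 1.0, 0.3, 0.2, 0.3, 2.8, 0.4, 0.2]
events = find_explosive_events(load, threshold=2.0, min_gap=3)
print(events)  # [5, 10]
```

Raising the threshold makes the detector stricter: with `threshold=3.0` only the first burst in this toy trace would be kept.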

Around each of these events, Catapult studied features obtained from the three-dimensional accelerometer and the three-dimensional gyroscope within a two-second window centered on the event: one second before and one second after. Examples of the features include the maximum value, the mean, and the standard deviation of the accelerometer and gyroscope readings.
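As a rough illustration of this windowing, the sketch below computes the three example features for each sensor channel. The channel names, the function name `window_features`, the choice of population standard deviation, and the toy 2 Hz sampling rate are all assumptions made to keep the example small; the article does not specify them.

```python
from statistics import mean, pstdev


def window_features(signals, event_index, sample_rate_hz):
    """Max, mean, and standard deviation per channel in a two-second window.

    `signals` maps a channel name to its list of samples. The window covers
    one second before and one second after `event_index`, clipped to the
    bounds of the recording.
    """
    half = int(sample_rate_hz)  # number of samples in one second
    features = {}
    for name, samples in signals.items():
        start = max(0, event_index - half)
        stop = min(len(samples), event_index + half + 1)
        window = samples[start:stop]
        features[f"{name}_max"] = max(window)
        features[f"{name}_mean"] = mean(window)
        features[f"{name}_std"] = pstdev(window)  # population std (assumed)
    return features


# Toy recording with two of the six channels, sampled at a hypothetical 2 Hz.
signals = {
    "acc_x": [0.0, 1.0, 4.0, 1.0, 0.0, 0.0],
    "gyr_z": [0.0, 0.0, 2.0, 0.0, 0.0, 0.0],
}
features = window_features(signals, event_index=2, sample_rate_hz=2)
```

With all six channels present, the same function would yield 18 values per event, which could then serve as one row of the training set.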

Each of these features was entered for the event of interest, along with the event's classification as a throw, a bat swing, or neither, to build a training set for the algorithm.
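To make the training step concrete, here is a minimal sketch of fitting a Random Forest on a labeled feature table. It assumes scikit-learn, which the article does not name, and it uses synthetic data with three made-up feature columns in place of the real training set. The cluster centers, the number of trees, and the random seeds are arbitrary choices for illustration, not Catapult's settings.

```python
import random

from sklearn.ensemble import RandomForestClassifier

rng = random.Random(0)

# Synthetic stand-in for the real feature table: one made-up cluster center
# per class, with three hypothetical feature columns.
CENTERS = {
    "throw": (8.0, 2.0, 3.0),
    "swing": (3.0, 9.0, 6.0),
    "neither": (0.5, 0.0, 0.5),
}


def make_rows(label, n):
    """Generate n noisy feature rows around the center for `label`."""
    return [[c + rng.gauss(0, 0.3) for c in CENTERS[label]] for _ in range(n)]


X, y = [], []
for label in CENTERS:
    rows = make_rows(label, 50)
    X.extend(rows)
    y.extend([label] * len(rows))

# Each tree is fit on a bootstrap sample of the rows; the forest predicts
# by combining the votes of its trees.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

new_events = [[7.9, 2.1, 3.2], [3.1, 8.8, 5.9], [0.4, 0.1, 0.6]]
predictions = list(model.predict(new_events))
```

On real sensor data the classes would overlap far more than in this toy example, so accuracy should be measured on sessions held out from training rather than on the training rows themselves.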

Results and Discussion

After being exposed to a large number of examples covering many kinds of throws and bat swings, the algorithm was able to achieve an accuracy of over 90% in detecting throws and bat swings during a training session. Events detected with excellent accuracy include:

  • Bullpen
  • Pitching from mound to catcher
  • Quality throws during warm up
  • Long distance throws during fielding
  • Swings in the cage or during batting practice

The accuracy estimate is conservative because the algorithm is set to count only throws and swings that are hard enough to be game-like (i.e. “quality” throws and bat swings). Most of the errors come from throws at the beginning of a routine that are too soft to be counted.

After classifying a training session into swings, throws, or neither, the algorithm calculates the total load associated with each bat swing and throw. Various quantities such as the average player load or the average time between these activities, as well as the banding for each throw or bat swing, can be obtained through the interface.
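A small sketch shows how such per-session quantities might be derived once each event has been classified. The tuple layout `(time_s, label, load)`, the function name `summarize`, and the sample values are hypothetical; banding is left out because the article does not define how bands are set.

```python
from statistics import mean


def summarize(events, label):
    """Summarize all events of one type in a session.

    `events` is a list of (time_s, label, load) tuples sorted by time.
    """
    selected = [(t, load) for t, lab, load in events if lab == label]
    times = [t for t, _ in selected]
    loads = [load for _, load in selected]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "count": len(selected),
        "total_load": sum(loads),
        "average_load": mean(loads) if loads else 0.0,
        "average_gap_s": mean(gaps) if gaps else None,
    }


# Hypothetical classified session: three throws and two swings.
events = [
    (10.0, "throw", 4.0),
    (25.0, "throw", 5.0),
    (31.0, "swing", 6.5),
    (40.0, "throw", 6.0),
    (52.0, "swing", 7.5),
]
throws = summarize(events, "throw")
swings = summarize(events, "swing")
print(throws)  # count 3, total load 15.0, average load 5.0, average gap 15.0 s
```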

Catapult’s metrics for pitching and bat swings enable baseball practitioners to control the volume and intensity of key movements that lead to overuse injuries and cost professional and collegiate teams millions of dollars each year. These metrics have the potential to shape the future of pitcher and batter training periodisation for coaches who want better transparency into the training effect in a real-world setting.

Interested in finding out how Catapult can answer your sport-specific questions? Find out more about our performance analytics here.