Statistics & Data Science Master's Thesis
When
9 – 11 a.m., April 24, 2026
Where
Title: The Applications of Markerless Motion Capture Data for Analyzing Swing Results in Baseball
Abstract: Biomechanical data has gained a significant foothold in the game of baseball over the
past several years. From college baseball to MLB, organizations have been using in-game
markerless motion capture technology to better understand the movement trends of their hitters
and pitchers. These systems track hundreds of data points, providing information on the
movement of each part of the body as it throws or swings at a baseball. Importantly, they gather
this data through a network of high-speed cameras, rather than wearable sensors that could
hinder movement. As teams have integrated this data into their player development systems,
its applications for hitting have lagged behind those for pitching. Pitching tends to be less dynamic
and more focused on repeated movements than hitting, which requires more adaptability to
different game situations, pitch types, and pitch locations. Therefore, biomechanical data lends
itself more naturally to pitching, as it can identify outlier movements and point out when a
pitcher’s mechanics have strayed too far from the desired movement. Given that hitting
necessitates a different approach, applying biomechanical data as effectively to hitter
development has proved difficult.
In this thesis, we investigate the applications of markerless biomechanical data to hitting,
in an effort to close the gap between its use for pitching and its use for hitting. We target three
swing results: whiff rate, exit velocity, and bat speed. These metrics should
capture the quality of any given swing. For each target variable, a similar approach is used. A
relatively simple model is established as a baseline (logistic regression for the binary whiff result
and multiple linear regression (MLR) for bat speed and exit velocity). The model is evaluated, and a
series of modifications follows. Multicollinearity is checked and addressed through model reduction
methods. A “player” variable is included both as a categorical predictor and as part of a random
effects model. More advanced model structures are then implemented and evaluated. These include a
support vector machine (SVM) and random forest for whiff, and Lasso and XGBoost for exit velocity
and bat speed. Finally, key metric means and variances for each player are examined, along with
their correlations with the target outcome variables. Nearly 300 biomechanical output variables are
available as predictors.
The data consists of KinaTrax biomechanical motion-capture data from the 2025 University of
Arizona Baseball season, with roughly 1,500 total observations from the team’s top nine hitters.
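As an illustration of the multicollinearity step, the following is a minimal sketch of one simple reduction method, a pairwise correlation screen. It is an assumption for illustration only: the abstract does not say which reduction methods the thesis uses, and the variable names (hip_rotation, pelvis_rotation, stride_length), the toy values, and the 0.9 threshold are all hypothetical rather than taken from the KinaTrax data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_collinear(features, threshold=0.9):
    """Greedy screen: keep a predictor only if its absolute correlation
    with every previously kept predictor is below the threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy values for illustration only (hypothetical, not thesis data).
features = {
    "hip_rotation": [10.0, 12.0, 14.0, 16.0, 18.0],
    # Exactly 2 * hip_rotation + 1, so it carries no new information.
    "pelvis_rotation": [21.0, 25.0, 29.0, 33.0, 37.0],
    "stride_length": [1.2, 0.9, 1.4, 0.8, 1.1],
}

kept = drop_collinear(features)
print(kept)
```

With close to 300 predictors and roughly 1,500 observations, a screen of this kind would only be a first pass. Variance inflation factors or a penalized model such as the Lasso mentioned above are common alternatives.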
For future work, expanding the data set would be helpful. Having data from only nine
players is limiting, so increasing that number would make the results of the thesis
more applicable to a wider range of players. It would also be intriguing to examine the
biomechanical differences between players at different levels (i.e. college, minor leagues, MLB)