Owen Crandall Master's Thesis

Statistics & Data Science Master's Thesis

When

9 – 11 a.m., April 24, 2026

Title: The Applications of Markerless Motion Capture Data for Analyzing Swing Results in Baseball

Abstract: Biomechanical data has gained a significant foothold in baseball over the past
several years. From college baseball to MLB, organizations have been using in-game
markerless motion capture technology to better understand the movement patterns of their
hitters and pitchers. These systems track hundreds of data points, providing information on
the movement of each part of the body as a player throws or swings at a baseball.
Importantly, they gather this data through a network of high-speed cameras rather than
wearable sensors that could hinder movement. As teams have integrated this data into their
player development systems, its applications for hitting have lagged behind those for
pitching. Pitching tends to be less dynamic and more focused on repeated movements than
hitting, which requires more adaptability to different game situations, pitch types, and
pitch locations. Biomechanical data therefore lends itself more naturally to pitching, where
it can identify outlier movements and flag when a pitcher’s mechanics have strayed too far
from the desired movement. Because hitting necessitates a different approach, applying
biomechanical data to hitter development with equal success has proved difficult.
In this thesis, we investigate the applications of markerless biomechanical data to hitting
in an effort to bridge the gap between its pitching and hitting applications. We target
three swing results: whiff rate, exit velocity, and bat speed. Together, these metrics are
intended to capture the quality of any given swing. A similar approach is used for each
target variable. First, a relatively simple model is established as a baseline: logistic
regression for the binary whiff result and multiple linear regression (MLR) for bat speed
and exit velocity. The baseline is evaluated and then modified in a series of steps.
Multicollinearity is checked and addressed through model reduction methods. A “player”
variable is included both as a categorical variable and as part of a random effects model.
More advanced model structures are then implemented and evaluated: support vector machines
(SVM) and random forest for whiff, and Lasso and XGBoost for exit velocity and bat speed.
Finally, the per-player means and variances of key metrics are examined, along with their
correlations with the target outcome variables. Almost 300 biomechanical output variables
are available as predictors. The data consists of KinaTrax biomechanical motion-capture
data from the 2025 University of Arizona Baseball season, with roughly 1,500 total
observations from the team’s top nine hitters.
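
To illustrate the multicollinearity step, the following is a minimal sketch of one common way to check for and reduce collinear predictors, using variance inflation factors (VIF). The abstract does not say which reduction method the thesis uses, so this technique is an assumption, as are the variable names, the synthetic data, and the cutoff of 10 (a widely used rule of thumb). None of it is taken from the KinaTrax data.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

def drop_collinear(X, names, threshold=10.0):
    """Greedily drop the highest-VIF column until all VIFs are <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = variance_inflation_factors(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names

# Synthetic stand-in data: hypothetical predictor names, not real measurements.
rng = np.random.default_rng(0)
n = 1500
hip_rotation = rng.normal(size=n)
shoulder_rotation = rng.normal(size=n)
# Nearly a linear combination of the other two, so it is almost redundant.
trunk_rotation = (0.5 * hip_rotation + 0.5 * shoulder_rotation
                  + rng.normal(scale=0.05, size=n))

X = np.column_stack([hip_rotation, shoulder_rotation, trunk_rotation])
names = ["hip_rotation", "shoulder_rotation", "trunk_rotation"]

vifs = variance_inflation_factors(X)
X_reduced, kept = drop_collinear(X, names)
print(dict(zip(names, np.round(vifs, 1))))
print("kept:", kept)
```

With almost 300 predictors and roughly 1,500 observations, a greedy pass like this would be one plausible way to shrink the predictor set before fitting the baseline models; penalized methods such as the Lasso mentioned above are an alternative that handles correlated predictors differently.
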
For future work, expanding the data set would be helpful. Having data from only nine
players is somewhat limiting, so increasing that number would make the results of the
thesis more applicable to a wider range of players. It would also be worthwhile to examine
the biomechanical differences between players at different levels (i.e., college, minor
leagues, MLB).