jabrantley/Baseball_Notebooks

This is a collection of notebooks related to baseball analysis. Some focus on techniques and others on interesting baseball questions. The first several notebooks deal with predicting on-base percentage (OBP). If there is one metric that has persisted from the post-Moneyball era into today's Statcast era, it's the value of OBP. The rest are just things that came to mind and that I wanted to try. Some originate from interest in the baseball question itself, and some exist because I wanted to build or implement a specific model or technique.

NOTE: Some of these notebooks require very large Statcast files. The code is usually set to load the data from a local file by default. If that step raises an error, change the corresponding setting to True and the data will be gathered and saved locally.
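The load-or-gather behavior described in the note can be sketched roughly as follows. This is not code from the notebooks: the names `GATHER_DATA`, `gather_statcast`, and `load_or_gather` are hypothetical, the gather step is a stub standing in for a real Statcast download, and the rows are made-up values. Only the standard library is used to keep the sketch self-contained.

```python
import csv
from pathlib import Path

# Hypothetical flag name; the notebooks may call it something else.
GATHER_DATA = False


def gather_statcast():
    """Stub standing in for a real Statcast download (made-up rows)."""
    return [
        {"player": "A", "launch_speed": "101.3"},
        {"player": "B", "launch_speed": "88.7"},
    ]


def load_or_gather(path, gather=GATHER_DATA):
    """Load cached rows from `path`, or gather and cache them when `gather` is True."""
    path = Path(path)
    if gather:
        rows = gather_statcast()
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        return rows
    # Raises FileNotFoundError if the data were never saved locally.
    with path.open(newline="") as f:
        return list(csv.DictReader(f))
```

With the flag left at False, a missing file surfaces as a `FileNotFoundError`, which is the cue to rerun with the flag set to True.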

| Notebook | Description |
| --- | --- |
| 01_PredictingOBP-ML.ipynb | Predicting end-of-season OBP given early-season data. Focuses on regression and simple ML techniques. |
| 02_PredictingOBP-EmpericalBayes.ipynb | This adapts code from my empirical Bayes repo to estimate OBP using the same technique used for batting average. Does a shrunken estimate that accounts for plate appearances approximate end-of-season OBP? |
| 03_PredictingOBP-ARIMA-Forecasting-basic.ipynb | Instead of a classic train/test split using ML, we use a running OBP to try to forecast out to the end of the season. This notebook only considers using past OBP to predict future OBP, without any additional exogenous variables. |
| 04__PredictingOBP-ARIMA-Forecasting-addExog.ipynb | Similar to the previous notebook, but we introduce exogenous variables to facilitate the forecasting. |
| 05_OBP_to_SLG.ipynb | OPS is a common metric, but it is often criticized because the denominators of on-base % and slugging % are different, making the addition mathematically... eh. My question is: what is the relationship between the two? How much is 1 point of OBP worth compared to 1 point of slugging %? |
| 06_TheBook-Chapter1.ipynb | This notebook replicates some of the tables in Chapter 1 of "The Book" by Tom Tango et al. The data were mined from Baseball Savant and queried using Pandas to make the RE24 table, compute wOBA, and build other tables. |
| 07_CricketData.ipynb | A notebook that plays with some cricket data. |
| 08_EstimatingTrueExitVelocity.ipynb | Given some noisy data, how can we estimate a player's true average exit velocity, especially when the number of plate appearances varies significantly across players? We use linear models, empirical Bayes, and a hierarchical Bayesian model to address this. |
| 09_PredictingSwingAndMiss.ipynb | This is a simple notebook to see if we can predict a swing and miss from the data. In all honesty, it's not the greatest question to address, but it is still interesting. A more interesting question might involve a continuous variable, like exit velocity, or something more fundamental like the "error", which may be a combination of several parameters. For now it is just an excuse to build a multinomial logistic regression model, use a neural network, and try to implement the same multinomial logistic regression model in a Bayesian framework. |
| 10_PredictingBallInPlay.ipynb | This is similar to the previous notebook, but instead of predicting a swing and miss, we are interested in knowing whether the ball is put in play. This is also a bit naive, but it makes for a straightforward problem. It is also an excuse to build a logistic regression model in PyMC and to try to build a BART model for the same problem. |
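For readers unfamiliar with the shrinkage idea behind notebooks 02 and 08, here is a minimal sketch of a beta-binomial empirical Bayes estimate. It is not the code from those notebooks. It assumes a Beta prior fitted by a crude method of moments on observed rates (which ignores sampling noise in each rate), and the function names and input numbers are made up for illustration.

```python
from statistics import mean, pvariance


def fit_beta_prior(rates):
    """Method-of-moments fit of Beta(alpha, beta) to a list of observed rates."""
    m = mean(rates)
    v = pvariance(rates)
    if v <= 0 or v >= m * (1 - m):
        raise ValueError("variance is incompatible with a Beta distribution")
    total = m * (1 - m) / v - 1  # this is alpha + beta
    return m * total, (1 - m) * total


def shrunken_rate(successes, opportunities, alpha, beta):
    """Posterior mean of the rate: raw rate pulled toward the prior mean."""
    return (successes + alpha) / (opportunities + alpha + beta)


# Made-up league-wide rates used only to fit the prior.
alpha, beta = fit_beta_prior([0.30, 0.32, 0.34, 0.36])

# Few opportunities: the estimate stays close to the prior mean.
small_sample = shrunken_rate(5, 10, alpha, beta)
# Many opportunities: the estimate stays close to the raw rate.
large_sample = shrunken_rate(5000, 10000, alpha, beta)
```

The prior acts like `alpha + beta` extra opportunities at the league-average rate, so players with few plate appearances are shrunk hardest toward the mean.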