Combine classification algorithms to predict the winner of each game
Kyle Johnson
Being able to predict the future, even just slightly better than a coin flip, can be enormously lucrative. Without a crystal ball, the next best thing is to harness large datasets to find hidden patterns that give a slight edge across a large number of predictions. Baseball is well suited for this because virtually everything that happens is quantifiable and repeats hundreds of times per game, and thousands of games are played each year. The goal of this project is to use machine learning techniques to predict Major League Baseball games better than the Vegas bookmakers do. Predicting 70% of games correctly is of no use if Vegas also predicted those same games correctly; to have a useful model, I must create one that consistently makes money when betting against Vegas bookmakers.
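To make the "better than Vegas" requirement concrete, here is a minimal sketch of the arithmetic behind a moneyline bet. It assumes American-style odds (for example -150 or +130); the function names are hypothetical and are not taken from the project notebooks. A bet only has positive expected profit when the model's win probability exceeds the break-even probability implied by the odds.

```python
def implied_probability(moneyline: int) -> float:
    """Break-even win probability implied by American moneyline odds."""
    if moneyline < 0:
        # Favorite: risk |moneyline| to win 100.
        return -moneyline / (-moneyline + 100)
    # Underdog: risk 100 to win moneyline.
    return 100 / (moneyline + 100)


def profit_per_unit(moneyline: int) -> float:
    """Profit on a winning bet of one unit at the given moneyline."""
    if moneyline < 0:
        return 100 / -moneyline
    return moneyline / 100


def expected_profit(win_prob: float, moneyline: int) -> float:
    """Expected profit per unit staked, given an estimated win probability."""
    return win_prob * profit_per_unit(moneyline) - (1 - win_prob)


if __name__ == "__main__":
    odds = -150  # illustrative value, not real data
    print(f"break-even probability: {implied_probability(odds):.3f}")
    for p in (0.55, 0.60, 0.65):
        print(f"model says {p:.2f} -> expected profit {expected_profit(p, odds):+.3f}")
```

At -150, a pick that wins 60% of the time only breaks even, which is why raw accuracy alone says little about profitability.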
Please see the notebook titled "Summary_Start_Here" for a detailed road map through this project in order to fully understand the process.
-I was able to create a model that predicts MLB games more accurately and more profitably than the Vegas odds in a statistically significant way. I did this by querying data from several online baseball databases and then optimizing several different classification models, before combining them to vote on the outcome of each game.
-Oddly enough, it seems that always betting with the Vegas odds is a profitable strategy, but using the model created in this project is potentially almost twice as profitable.
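The idea of combining several classifiers to vote on each game can be sketched as a simple hard majority vote. This is only an illustration: the model names and predictions below are hypothetical placeholders, and this summary does not say which models were used or whether the vote was hard (labels) or soft (probabilities).

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per game.

    `predictions` is a list with one entry per model; each entry is a list
    of predicted labels, one per game, in the same game order.
    """
    voted = []
    for game_votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        # With an odd number of models and two labels there are no ties.
        voted.append(Counter(game_votes).most_common(1)[0][0])
    return voted


if __name__ == "__main__":
    # Hypothetical outputs from three already-tuned classifiers.
    model_a = ["home", "away", "home", "home"]
    model_b = ["home", "home", "away", "home"]
    model_c = ["away", "away", "home", "home"]
    print(majority_vote([model_a, model_b, model_c]))
```

Using an odd number of models avoids ties in a two-outcome problem such as home win versus away win.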
For further exploration, I would use more types of data (new and highly advanced statistics) and more games from previous seasons. I would also automate the process of gathering the necessary data for today's games and publishing a report of which games to bet on.