The Most Valuable Player (MVP) Award is an annual National Basketball Association (NBA) award, given since the 1955-56 season to the top-performing player of the regular season. I use historical per-season player statistics to predict the last five MVPs (2018-2022). My regression model correctly predicted all five! But which stats (features) allowed the model to perform so well?
The following picture shows all the steps of the workflow. I usually combine such steps into a fully automated pipeline, but since this is a side project and my free time is limited, the pipeline is split into 3 files that are executed sequentially.
- Parse selected Basketball-Reference (website) pages and save all relevant pages in HTML format.
- Basketball-Reference
- Aggregate the data from the HTML pages and upload it to my MongoDB Cloud account.
- Predict the last 5 (2018-2022) NBA MVPs with Machine Learning.
- Power BI file with three charts, all of which also appear in the 'nba_ml.ipynb' file.
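
The prediction step can be pictured roughly as follows. This is a minimal sketch, not the code from 'nba_ml.ipynb': it assumes the regression target is a player's MVP vote share, that the model is trained only on seasons before the one being predicted, and that the predicted MVP is the player with the highest predicted value. The rows, the single feature, and the helper names (`fit_line`, `predict_mvp`, `actual_mvp`) are hypothetical and exist only for illustration.

```python
from statistics import mean

# Hypothetical rows: (season, player, feature value, MVP vote share).
# The numbers are made up for illustration; they are not real NBA data.
ROWS = [
    (2016, "A", 10.0, 0.90), (2016, "B", 6.0, 0.30), (2016, "C", 3.0, 0.00),
    (2017, "D", 9.0, 0.80), (2017, "E", 7.0, 0.40), (2017, "F", 2.0, 0.00),
    (2018, "G", 5.0, 0.00), (2018, "H", 11.0, 0.95), (2018, "I", 4.0, 0.00),
]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_mvp(rows, season):
    """Fit on all earlier seasons, then return the top-scored player of `season`."""
    train = [r for r in rows if r[0] < season]
    test = [r for r in rows if r[0] == season]
    slope, intercept = fit_line([r[2] for r in train], [r[3] for r in train])
    return max(test, key=lambda r: slope * r[2] + intercept)[1]


def actual_mvp(rows, season):
    """The real winner is the player with the highest vote share that season."""
    return max((r for r in rows if r[0] == season), key=lambda r: r[3])[1]


predicted = predict_mvp(ROWS, 2018)
print(predicted, predicted == actual_mvp(ROWS, 2018))
```

Repeating `predict_mvp` for each season from 2018 to 2022 and comparing against `actual_mvp` gives a simple hit rate. Training only on earlier seasons keeps information from the predicted season out of the model.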