Model 3 - Add conf champ #6

Merged
merged 10 commits into from
Mar 16, 2018
52 changes: 46 additions & 6 deletions README.md
@@ -5,14 +5,54 @@ and generates predictions for March Madness brackets.
More about the competition:
https://www.kaggle.com/c/mens-machine-learning-competition-2018

## Feature Engineering
- recent performance
- `power_5`
- `preseason_rank`
- `champ`
validated with `ConferenceChamps_2018.csv` using data from [cbssports.com](https://www.cbssports.com/college-basketball/news/selection-sunday-show-2018-ncaa-tournament-conference-champions-and-automatic-bids/)
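
A minimal sketch of how a `champ` flag could be derived from a file laid out like `ConferenceChamps_2018.csv`. The helper names `load_champ_ids` and `champ_flag` are hypothetical, and the inline CSV is a three-row excerpt; the way `model.py` actually consumes the feature may differ.

```python
import csv
import io

# Three-row excerpt in the same layout as data/ConferenceChamps_2018.csv.
CHAMPS_CSV = """Conference,Champion,TeamName,TeamID
ACC,Virginia,virginia,1438
Big East,Villanova,villanova,1437
Big 12,Kansas,kansas,1242
"""


def load_champ_ids(csv_file):
    """Return the set of TeamID values listed as conference champions."""
    reader = csv.DictReader(csv_file)
    return {int(row["TeamID"]) for row in reader}


def champ_flag(team_id, champ_ids):
    """Return 1 if the team is listed as a conference champion, else 0."""
    return 1 if team_id in champ_ids else 0


champ_ids = load_champ_ids(io.StringIO(CHAMPS_CSV))
print(champ_flag(1438, champ_ids))  # Virginia appears in the excerpt
print(champ_flag(9999, champ_ids))  # placeholder ID that is not in the excerpt
```

To read the real file, pass an open file handle such as `open('data/ConferenceChamps_2018.csv', newline='')` instead of the `io.StringIO` excerpt.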

## Ideas for future work

##### Feature Engineering
- regular season champion, conf coach of the year, conf player of the year
https://en.wikipedia.org/wiki/2017%E2%80%9318_NCAA_Division_I_men%27s_basketball_season#Conference_winners_and_tournaments
- regular season wins
- regular season win/loss ratio, in conf and overall
https://en.wikipedia.org/wiki/2017%E2%80%9318_NCAA_Division_I_men%27s_basketball_season#Conference_standings
- regular season average points per game
- regular season average points allowed
- tenure of coach
- probability of winning without player X, based on historical play-by-play data from games won while that player sat out or played very little
- distance traveled for the game
- days since last game
- [regular season upsets](https://en.wikipedia.org/wiki/2017%E2%80%9318_NCAA_Division_I_men%27s_basketball_season#Upsets)
- historical wiki data https://en.wikipedia.org/wiki/Category:NCAA_Division_I_men%27s_basketball_seasons
- nearest distance from historical tournament finalists (small school upsets)
- binary flag for whether the team reached the championship game or Final Four in the past 4 years
- average number of rounds they advanced in the NCAA tournament over the past 4-6 years
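
Several of the regular-season ideas above (wins, win rate, average points scored and allowed) reduce to one pass over game results. The sketch below uses made-up game rows; the keys `WTeamID`, `WScore`, `LTeamID`, and `LScore` are assumed to match the Kaggle results file, and `season_summary` is a hypothetical helper. It reports win percentage rather than a win/loss ratio so that an undefeated team does not cause a division by zero.

```python
from collections import defaultdict

# Made-up game rows; the keys mirror the column names assumed for the
# Kaggle regular-season results file.
games = [
    {"WTeamID": 1438, "WScore": 70, "LTeamID": 1181, "LScore": 60},
    {"WTeamID": 1181, "WScore": 80, "LTeamID": 1242, "LScore": 75},
    {"WTeamID": 1438, "WScore": 65, "LTeamID": 1242, "LScore": 55},
]


def season_summary(games):
    """Aggregate wins, win percentage, and scoring averages per team."""
    totals = defaultdict(
        lambda: {"wins": 0, "games": 0, "pts_for": 0, "pts_against": 0}
    )
    for g in games:
        winner = totals[g["WTeamID"]]
        loser = totals[g["LTeamID"]]
        winner["wins"] += 1
        # Update both sides of the game with points scored and allowed.
        for side, scored, allowed in (
            (winner, g["WScore"], g["LScore"]),
            (loser, g["LScore"], g["WScore"]),
        ):
            side["games"] += 1
            side["pts_for"] += scored
            side["pts_against"] += allowed
    return {
        team: {
            "wins": t["wins"],
            "win_pct": t["wins"] / t["games"],
            "avg_pts_for": t["pts_for"] / t["games"],
            "avg_pts_against": t["pts_against"] / t["games"],
        }
        for team, t in totals.items()
    }


summary = season_summary(games)
print(summary[1438])
```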

##### Feature selection
- correlations / scatterplot matrices (SPLOMs) of variables
- stepwise model
- feature importance (RandomForestClassifier)
- feature impact (LogisticRegression)
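
As a rough illustration of the correlation idea only, the sketch below ranks candidate features by the absolute Pearson correlation with the game outcome. The feature values and the `won` labels are made up, `pearson` is a hand-written helper rather than a library call, and a real screen would also look at correlations between the features themselves.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


# Made-up matchup rows: each list is one candidate feature, `won` is the label.
features = {
    "preseason_rank_diff": [-10, 5, -3, 12, -7, 2],
    "power_5": [1, 0, 1, 0, 1, 1],
    "champ": [1, 0, 0, 1, 1, 0],
}
won = [1, 0, 1, 0, 1, 0]

# Strongest absolute correlation with the outcome first.
ranked = sorted(
    ((name, pearson(values, won)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: {r:+.2f}")
```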

##### Model training
- 10,000 simulations
- XGBoost
- KNN
- Keras neural net
- Ensembles
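
One reading of the "10,000 simulations" idea is a Monte Carlo run over the bracket: play the tournament many times using per-game win probabilities and count how often each team takes the title. The sketch below assumes that reading. `simulate_bracket`, the team labels, and the ratings are all hypothetical, and the rating-based `win_prob` stands in for whatever probabilities the trained model would output.

```python
import random
from collections import Counter


def simulate_bracket(teams, win_prob, n_sims=10_000, seed=0):
    """Estimate each team's title probability by repeated simulation.

    teams: bracket order, length a power of two; adjacent entries meet first.
    win_prob(a, b): probability that team a beats team b.
    """
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        field = list(teams)
        while len(field) > 1:
            # Play one round: pair up neighbours and keep the sampled winner.
            field = [
                a if rng.random() < win_prob(a, b) else b
                for a, b in zip(field[0::2], field[1::2])
            ]
        titles[field[0]] += 1
    return {team: titles[team] / n_sims for team in teams}


# Made-up ratings used only to produce plausible win probabilities.
ratings = {"A": 1700, "B": 1500, "C": 1600, "D": 1550}


def win_prob(a, b):
    return 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))


title_odds = simulate_bracket(["A", "B", "C", "D"], win_prob)
print(title_odds)
```

Fixing `seed` keeps runs reproducible, which makes it easier to compare submissions.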

## References:

##### Feature generation and selection:
- https://blog.coast.ai/this-is-how-i-used-machine-learning-to-accurately-predict-villanova-to-win-the-2016-march-madness-ba5c074f1583
- https://adeshpande3.github.io/Applying-Machine-Learning-to-March-Madness
- https://www.techrepublic.com/article/march-madness-5-data-sources-that-could-predict-the-2017-ncaa-championship/
- [Matt Harvey of CoastAI 2016 model](https://blog.coast.ai/this-is-how-i-used-machine-learning-to-accurately-predict-villanova-to-win-the-2016-march-madness-ba5c074f1583)
- [Adit Deshpande 2017 model](https://adeshpande3.github.io/Applying-Machine-Learning-to-March-Madness)
- [Ideas for data](https://www.techrepublic.com/article/march-madness-5-data-sources-that-could-predict-the-2017-ncaa-championship/)

##### Elo scores:
- https://fivethirtyeight.com/features/how-we-calculate-nba-elo-ratings/
- https://www.kaggle.com/kplauritzen/elo-ratings-in-python/notebook
- https://www.kaggle.com/lpkirwin/fivethirtyeight-elo-ratings
- [fivethirtyeight methodology](https://fivethirtyeight.com/features/how-we-calculate-nba-elo-ratings/)
- [Kaggle elo in python example](https://www.kaggle.com/kplauritzen/elo-ratings-in-python/notebook)
- [Kaggle elo example 2](https://www.kaggle.com/lpkirwin/fivethirtyeight-elo-ratings)
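
For orientation, a minimal sketch of the basic Elo update that these references build on. It leaves out adjustments such as home-court advantage and margin of victory that the linked methodology discusses, and `k=20.0` is an illustrative value, not a tuned one. The function names are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner_rating, loser_rating, k=20.0):
    """Return the (winner, loser) ratings after one game.

    The winner gains k * (1 - expected win probability); the loser drops
    by the same amount, so the total rating is conserved.
    """
    expected_win = expected_score(winner_rating, loser_rating)
    change = k * (1.0 - expected_win)
    return winner_rating + change, loser_rating - change


new_winner, new_loser = update_elo(1500.0, 1500.0)
print(new_winner, new_loser)
```

An upset moves ratings further than an expected result, because `1 - expected_win` is larger when the winner was the lower-rated team.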
33 changes: 33 additions & 0 deletions data/ConferenceChamps_2018.csv
@@ -0,0 +1,33 @@
Conference,Champion,TeamName,TeamID
AAC,Cincinnati,cincinnati,1153
ACC,Virginia,virginia,1438
America East,UMBC,umbc,1420
Atlantic 10,Davidson,davidson,1172
Atlantic Sun,Lipscomb,lipscomb,1252
Big East,Villanova,villanova,1437
Big Sky,Montana,montana,1285
Big South,Radford,radford,1347
Big Ten,Michigan,michigan,1276
Big 12,Kansas,kansas,1242
Big West,Cal State Fullerton,cal state fullerton,1168
Colonial,Charleston,charleston,1158
Conference USA,Marshall,marshall,1267
Horizon League,Wright State,wright state,1460
Ivy League,Pennsylvania,pennsylvania,1335
MAAC,Iona,iona,1233
MAC,Buffalo,buffalo,1138
MEAC,North Carolina Central,north carolina central,1300
Missouri Valley,Loyola Chicago,loyola chicago,1260
Mountain West,San Diego State,san diego state,1361
Northeast,LIU Brooklyn,liu brooklyn,1254
Ohio Valley,Murray State,murray state,1293
Pac-12,Arizona,arizona,1112
Patriot,Bucknell,bucknell,1137
SEC,Kentucky,kentucky,1246
Southern,UNC Greensboro,unc greensboro,1422
Southland,Stephen F. Austin,stephen f. austin,1372
Summit,South Dakota State,south dakota state,1355
Sun Belt,Georgia State,georgia state,1209
SWAC,Texas Southern,texas southern,1411
WAC,New Mexico State,new mexico state,1308
West Coast,Gonzaga,gonzaga,1211
6 changes: 3 additions & 3 deletions join_outputs_to_slots.py
@@ -3,9 +3,9 @@

folder = 'data'
results_folder = 'results'
predictions_file = '/submission-2.csv'
slots_output_file = '/less-readable-predictions-by-slots-2.csv'
slots_readable_file = '/readable-predictions-by-slots-2.csv'
predictions_file = '/submission-3.csv'
slots_output_file = '/less-readable-predictions-by-slots-3.csv'
slots_readable_file = '/readable-predictions-by-slots-3.csv'
# list to collect all print statements to go in a results file
readable_outputs = []

2 changes: 1 addition & 1 deletion model.py
@@ -4,7 +4,7 @@
More about the competition:
https://www.kaggle.com/c/mens-machine-learning-competition-2018

Code adapted to 2018 data from 2017 source:
Code adapted to 2018 data from 2017 source:
https://github.com/harvitronix/kaggle-march-madness-machine-learning/blob/master/mm.py
"""
import math