Authors: Timotej Zaťko, Tomáš Hoffer
In this work we aimed to predict a football player's value and playing position using the FIFA 19 dataset, which is publicly available on Kaggle. We performed a complete analysis of the player data, followed by data preprocessing and model training. We tried to use only the players' skill-based attributes (finishing, sprint speed, heading...) and physical attributes (age, height, weight...) in our models.
- Go to the project root directory
- Build the Docker image:
sh ./build.sh
- Run the Docker container:
sh ./run.sh
notebooks
- contains Jupyter notebooks with data analysis, preprocessing...
- 01_initial_analysis.ipynb - quick overview of the dataset: number of examples, columns, data types, missing values...
- 02_preprocessing_for_analysis.ipynb - conversion of date-type attributes, weight, height and the money amounts for the main analysis
- 03_analysis.ipynb - main analysis of the dataset
- 04_preprocessing_for_prediction.ipynb - dataset preprocessing for model training (for both tasks - classification of player's position and value prediction)
- 05_model_selection_classification.ipynb - comparison of some basic classification models (we also define our baseline model here)
- 06_model_selection_regression.ipynb - comparison of some basic regression models (we also define our baseline model here)
- 07_classification_NN.ipynb - classification approach using a neural network
- 09_SMOTE.ipynb - data oversampling using several methods including SMOTE and its variations
- 10_feature_selection.ipynb - comparison of several feature selection approaches
- 11_ensemble_classification.ipynb - comparison of several ensemble models for the classification task
- 12_ensemble_regression.ipynb - comparison of several ensemble models for the regression task
- 13_hyper_parameter_tuning.ipynb - hyper-parameter tuning of our best models using grid search
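
To illustrate the kind of conversion done in 02_preprocessing_for_analysis.ipynb, here is a minimal sketch. It is not the code from the notebook. It assumes that money amounts are stored as strings like "€110.5M" or "€565K", heights as feet and inches like "5'7", and weights in pounds like "159lbs". The function names are hypothetical and chosen only for this example.

```python
def parse_money(value: str) -> float:
    """Convert an assumed money string such as '€110.5M' or '€565K' to euros."""
    multipliers = {"K": 1e3, "M": 1e6}
    text = value.strip().lstrip("€")
    if text and text[-1] in multipliers:
        # Strip the suffix and scale by thousands or millions.
        return float(text[:-1]) * multipliers[text[-1]]
    return float(text)


def parse_height_cm(value: str) -> float:
    """Convert an assumed feet'inches string such as "5'7" to centimetres."""
    feet, inches = value.strip().split("'")
    return round((int(feet) * 12 + int(inches)) * 2.54, 1)


def parse_weight_kg(value: str) -> float:
    """Convert an assumed pounds string such as '159lbs' to kilograms."""
    pounds = float(value.strip().lower().replace("lbs", ""))
    return round(pounds * 0.45359237, 1)


if __name__ == "__main__":
    print(parse_money("€110.5M"))
    print(parse_height_cm("5'7"))
    print(parse_weight_kg("159lbs"))
```

With pandas, helpers like these could be applied column by column, for example with `Series.map`, before the numeric analysis.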
src
- contains helper functions and classes for preprocessing/analysis/evaluation...
data
- contains the FIFA 19 dataset
report
- LaTeX files for generating the final report; the report is written in Slovak