This is a data wrangling and analysis project whose main objective was to parse data from multiple large JSON files into a usable format for further analysis. Several tools were developed and applied to the Spanish League datasets for demonstration purposes. The resulting datasets can be used for further exploratory analysis, clustering, machine learning, and statistical modeling.
These datasets record the events of every soccer match played in the 2017-18 season of the top leagues in England, Germany, Spain, France, and Italy, as well as the 2018 World Cup and the 2016 UEFA European Championship.
The original datasets are available in JSON and CSV formats here: https://springernature.figshare.com/articles/Metadata_record_for_A_public_data_set_of_spatio-temporal_match_events_in_soccer_competitions/9711164
The accompanying paper is available here: https://www.nature.com/articles/s41597-019-0247-7
The file utils_football.py contains the data wrangling tools. Both utils_football.py and the main files are commented to make them easier to follow.
A walkthrough of the data wrangling part of the project can be found at https://marufsazed.medium.com/data-wrangling-project-with-python-eee40b460fed
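To give a feel for the kind of parsing involved, here is a minimal sketch of flattening nested event records into flat rows. It is not the code from utils_football.py: the function names (flatten_event, flatten_events), the inline sample, and the field names in it are assumptions made for illustration, so check them against the actual JSON files before reusing this.

```python
import json

# Hypothetical sample shaped like two records of an events file.
# The field names here are assumptions for illustration only.
SAMPLE_EVENTS = json.loads("""
[
  {"id": 1, "matchId": 100, "teamId": 7, "playerId": 42,
   "eventName": "Pass", "matchPeriod": "1H", "eventSec": 2.5,
   "tags": [{"id": 1801}],
   "positions": [{"x": 50, "y": 50}, {"x": 60, "y": 40}]},
  {"id": 2, "matchId": 100, "teamId": 9, "playerId": 17,
   "eventName": "Shot", "matchPeriod": "1H", "eventSec": 8.0,
   "tags": [{"id": 101}, {"id": 1801}],
   "positions": [{"x": 90, "y": 45}]}
]
""")


def flatten_event(event):
    """Turn one nested event record into a flat dict of scalar columns."""
    positions = event.get("positions", [])
    # The first position is treated as the start, the second (if any) as the end.
    start = positions[0] if positions else {}
    end = positions[1] if len(positions) > 1 else {}
    return {
        "id": event.get("id"),
        "match_id": event.get("matchId"),
        "team_id": event.get("teamId"),
        "player_id": event.get("playerId"),
        "event_name": event.get("eventName"),
        "period": event.get("matchPeriod"),
        "second": event.get("eventSec"),
        # Collapse the list of tag objects into one comma-separated string.
        "tag_ids": ",".join(str(tag["id"]) for tag in event.get("tags", [])),
        "start_x": start.get("x"),
        "start_y": start.get("y"),
        "end_x": end.get("x"),
        "end_y": end.get("y"),
    }


def flatten_events(events):
    """Flatten a list of nested event records into a list of flat rows."""
    return [flatten_event(event) for event in events]


rows = flatten_events(SAMPLE_EVENTS)
for row in rows:
    print(row)
```

Each row is a plain dict, so the list can be written out with csv.DictWriter or passed to pandas.DataFrame(rows) for further analysis. Missing end positions come out as None rather than raising an error, which keeps events with a single position in the table.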