Welcome to the Football Analytics project! This repository documents a four-phase project: scraping football data from the Transfermarkt website, storing it in a database, analyzing it statistically, and applying machine learning to it.
In the first phase, we used Python with Beautiful Soup 4 (BS4) and Selenium to scrape data for the top five European leagues: Spain, Germany, Italy, France, and England. The following data was collected:
Club Data
├── Big Five leagues, seasons 15/21
├── All Clubs
├── All players and their positions
├── All Rankings
├── All Squads
├── All total and average market values
├── All Ages and average ages
├── All Stadiums and their capacity
├── All coaches
├── All club victories and prizes
├── All club income | expenditure | OverallBalance
└── All foreign players
Players Data
├── Player Name
├── Player Full Name
├── Player ID
├── Player Shirt Number
├── Date of Birth
├── Citizenship
├── Place Of Birth
├── Caps
├── Goals
├── Other Positions
├── Foot
├── Outfitter
├── Agent
├── Contract Joined
├── Contract Expires
├── Date Of Last Contract
├── Height
├── Current Club
├── All Players Transfer Data (Season / Date / Market Value / Fee / Left / Joined)
├── All Players Stats
│ ├── Appearances
│ ├── Goals In Each Season
│ ├── Assists
│ ├── Yellow Card | Second Yellow Card | Red Card
│ ├── Minutes Played
│ └── Goals Conceded | Clean Sheets
└── ...
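The scraping step above can be sketched roughly as follows. This is a minimal illustration, not the project's scraper: the HTML snippet, the CSS class names (`items`, `player-name`, `market-value`), and the helpers `parse_value` and `parse_squad_table` are hypothetical and do not reflect Transfermarkt's real markup. In practice the HTML would come from an HTTP response or from Selenium's `driver.page_source`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fetched squad page; the real
# Transfermarkt HTML is different and changes over time.
SAMPLE_HTML = """
<table class="items">
  <tr><td class="player-name">Player A</td><td class="market-value">€10.5m</td></tr>
  <tr><td class="player-name">Player B</td><td class="market-value">€800k</td></tr>
</table>
"""


def parse_value(text):
    """Convert a string such as '€10.5m' or '€800k' to a float in euros."""
    text = text.strip().lstrip("€").lower()
    multiplier = 1.0
    if text.endswith("m"):
        multiplier, text = 1_000_000.0, text[:-1]
    elif text.endswith("k"):
        multiplier, text = 1_000.0, text[:-1]
    return float(text) * multiplier


def parse_squad_table(html):
    """Extract one dict per table row that has both a name and a value cell."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table.items tr"):
        name = tr.select_one("td.player-name")
        value = tr.select_one("td.market-value")
        if name is None or value is None:
            continue  # skip header or malformed rows
        rows.append({
            "name": name.get_text(strip=True),
            "market_value_eur": parse_value(value.get_text()),
        })
    return rows


players = parse_squad_table(SAMPLE_HTML)
```

Normalizing values such as `€10.5m` to plain numbers at scrape time keeps the later cleaning and database steps simpler.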
In the second phase, we designed an ER diagram to guide the creation of a structured MySQL database. After cleaning the data, we used SQLAlchemy to create the database that the later analysis relies on.
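As a rough sketch of how tables can be declared and created with SQLAlchemy: the `Club` and `Player` models and their columns below are hypothetical, since the project's ER diagram is not reproduced here. The example uses an in-memory SQLite database so it runs without a server; for MySQL, the engine URL would instead take a form like `mysql+pymysql://user:password@host/dbname` (the driver choice is an assumption).

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Club(Base):
    """Hypothetical club table; the real schema may differ."""
    __tablename__ = "clubs"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    players = relationship("Player", back_populates="club")


class Player(Base):
    """Hypothetical player table linked to a club by foreign key."""
    __tablename__ = "players"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    club_id = Column(Integer, ForeignKey("clubs.id"))
    club = relationship("Club", back_populates="players")


# In-memory SQLite stands in for the MySQL server (assumption for runnability).
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)  # emits CREATE TABLE for every model

with Session(engine) as session:
    club = Club(name="Example FC")
    club.players.append(Player(name="Player A"))
    session.add(club)  # the player is saved with the club via the relationship
    session.commit()
    stored = session.query(Player).one()
    stored_club_name = stored.club.name
```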
In the third phase, we used statistical analysis to answer questions about the collected data. The main questions were:
- Player Participation Analysis in the 2021-2022 Season: Distribution of Match Appearances and Percentage of Involvement
- Exploring the Relationship Between Goals Scored and Estimated Player Value: A Linear Regression Analysis Using 2021-2022 Season Data
- Analyzing the Relationship Between Goals Scored and Estimated Market Value for Strikers in the 2021-2022 Season Using Linear Regression
- Exploring Estimated Player Prices Distribution by Position for the 2021-2022 Season Data
- Goal Scoring Analysis Across Different Leagues in the 2021-2022 Season
- Player Acquisition Costs Analysis across Seasons 2017-2018 to 2021-2022 in Football Leagues
- Discrepancy Between Player Transfer Fees and Actual Values in Football Industry: A Comparative Analysis
- Identifying Players with Performance in the Top 30% but Market Value in the Bottom 40%
- Comparing the Performance Distribution of the Players Identified in the Previous Items with the Overall Player Population
- Comparing the Position Distribution of the Players Identified in the Previous Items with the Overall Player Population
- Identifying Underperforming Players in Top 5 European Leagues Based on Performance Metrics
- Performance Comparison of Experienced and Young Football Players After Transfers to New Teams
- Performance Comparison of Teams in UEFA Champions League and Domestic Leagues
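One of the questions above, finding players whose performance is in the top 30% while their market value is in the bottom 40%, comes down to two percentile cut-offs. The sketch below uses only the standard library; the records are made-up toy data, and the single `performance` score is a placeholder for whatever metric the analysis actually used.

```python
from statistics import quantiles

# Made-up toy records (name, performance score, market value in millions).
# These are NOT project data; they only illustrate the filter.
RECORDS = [
    ("P1", 1, 1), ("P2", 2, 4), ("P3", 3, 5), ("P4", 4, 6), ("P5", 5, 7),
    ("P6", 6, 8), ("P7", 7, 10), ("P8", 8, 3), ("P9", 9, 9), ("P10", 10, 2),
]


def percentile_cut(values, pct):
    """Return the cut point at percentile `pct` (1 to 99)."""
    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th one.
    return quantiles(values, n=100, method="inclusive")[pct - 1]


perf_cut = percentile_cut([perf for _, perf, _ in RECORDS], 70)
value_cut = percentile_cut([value for _, _, value in RECORDS], 40)

# Keep players at or above the 70th performance percentile (top 30%)
# and at or below the 40th market-value percentile (bottom 40%).
undervalued = [
    name
    for name, perf, value in RECORDS
    if perf >= perf_cut and value <= value_cut
]
```

With this toy data the filter keeps `P8` and `P10`, the two high performers with low market values.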
In the fourth phase, we applied machine learning techniques to three problems:
- Predicting Player Market Value
- Player Position Classification
- Player Similarity Clustering
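To illustrate the clustering idea, here is a small k-means sketch in plain Python. The source does not say which algorithm was used, so k-means is an assumption, as are the two features (goals and tackles per 90 minutes) and the toy values. Real features would normally be scaled before clustering.

```python
import math

# Made-up toy feature vectors: (goals per 90, tackles per 90).
# The first three resemble forwards, the last three defenders.
POINTS = [
    (0.8, 0.5), (0.7, 0.6), (0.9, 0.4),
    (0.05, 3.0), (0.1, 2.8), (0.0, 3.2),
]


def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Farthest-first initialisation keeps this sketch deterministic:
    # start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


labels, centroids = kmeans(POINTS, k=2)
```

Players that end up with the same label are the ones the features consider similar; in practice a library implementation such as scikit-learn's `KMeans` would be the more usual choice.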