Skip to content

Tactical and performance insights from UEFA Champions League data (2020–2022) using Snowflake SQL and Python visualizations.

Notifications You must be signed in to change notification settings

I-Mukhatov/uefa-soccer-insights-sql-python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

80 Commits
 
 
 
 
 
 
 
 

Repository files navigation

UEFA Champions League Analytics (2020 – 2022) with Snowflake SQL & Python

Project Overview

This project focuses on producing actionable soccer insights for a sports media agency by analyzing match data from the UEFA Champions League (2020-2022 seasons).
The goal is to uncover tactical patterns, team performance trends, and model prediction strengths by leveraging advanced SQL queries and visual storytelling techniques.

Key deliverables include:

  • Win probability model evaluation.
  • Possession dominance vs match results.
  • Shooting efficiency metrics across teams.
  • Tactical breakdown of duel dominance and match outcomes.
  • Data-driven summaries with clean, insightful visualizations.

All analyses were conducted using Snowflake SQL and Python (Pandas, Matplotlib, Seaborn) in a DataLab environment, with a clear focus on clean query design, data quality handling, and business-driven insights.


image

Project Objectives

  • Identify team performance trends based on possession, duels, and shots.
  • Evaluate predictive model accuracy based on actual match outcomes.
  • Develop reusable SQL views to optimize analytical workflows.
  • Visualize data to support data storytelling for media/reporting use cases.

Tools & Tech

  • SQL (Snowflake dialect)
  • Snowflake Cloud Data Warehouse
  • Python + Jupyter Notebooks for visualization of core insights

Datasets Overview

Schema name: SOCCER

Table Name(s): TBL_UEFA_2020 | TBL_UEFA_2021 | TBL_UEFA_2022

Note : All three tables have same column names and data types

Column Definition Data type
STAGE Stage of the March VARCHAR(50)
DATE When the match occurred. DATE
PENS Did the match end with penalty VARCHAR(50)
PENS_HOME_SCORE In case of penalty, score by home team VARCHAR(50)
PENS_AWAY_SCORE In case of penalty, score by away team VARCHAR(50)
TEAM_NAME_HOME Team home name VARCHAR(50)
TEAM_NAME_AWAY Team away name VARCHAR(50)
TEAM_HOME_SCORE Team home score NUMBER
TEAM_AWAY_SCORE Team away score NUMBER
POSSESSION_HOME Ball possession for the home team FLOAT
POSSESSION_AWAY Ball possession for the away team FLOAT
TOTAL_SHOTS_HOME Number of shots by the home team NUMBER
TOTAL_SHOTS_AWAY Number of shots by the away team NUMBER
SHOTS_ON_TARGET_HOME Total shot for home team FLOAT
SHOTS_ON_TARGET_AWAY Total shot for away team FLOAT
DUELS_WON_HOME duel win possession of ball - for home team NUMBER
DUELS_WON_AWAY duel win possession of ball - for away team NUMBER
PREDICTION_TEAM_HOME_WIN Probability of home team to win FLOAT
PREDICTION_DRAW Probability of draw FLOAT
PREDICTION_TEAM_AWAY_WIN Probability of away team to win FLOAT
LOCATION Stadium where the match was held VARCHAR(50)

Each table contains match-level statistics, team names, scores, possession metrics, and prediction probabilities for each UEFA match played.

Data Source: This project uses UEFA match data provided in DataCamp's Data Sources repository for DataLab projects hosted on Snowflake.


Core Insights

1. Teams With Most Games of Majority Possession (per Season)

  • Identified which team dominated possession most frequently from 2020–2022.
  • Used CASE, RANK(), and multi-table UNION ALL to compare team dominance across years.
WITH all_matches AS (
  SELECT '2020' AS SEASON, * FROM SOCCER.TBL_UEFA_2020
  UNION ALL
  SELECT '2021' AS SEASON, * FROM SOCCER.TBL_UEFA_2021
  UNION ALL
  SELECT '2022' AS SEASON, * FROM SOCCER.TBL_UEFA_2022
),
...
SELECT SEASON, TEAM_NAME, GAME_COUNT
FROM ranked
WHERE team_rank = 1;

👉 See full query

2. Duel Winners Who Still Lost the Match (2020-2022)

  • Identified teams that won more duels than their opponent, yet still lost the game — a sign of inefficiency or poor finishing.
  • Extended across all three UEFA seasons (2020–2022) for deeper trends.
  • Used a CASE statement with a multi-season UNION ALL and match filtering.
WITH all_matches AS (
    SELECT '2020' AS SEASON, * FROM SOCCER.TBL_UEFA_2020
    UNION ALL
    SELECT '2021' AS SEASON, * FROM SOCCER.TBL_UEFA_2021
    UNION ALL
    SELECT '2022' AS SEASON, * FROM SOCCER.TBL_UEFA_2022
),
...
SELECT
    SEASON,
    STAGE,
    TEAM_LOST
FROM duel_lost_games
WHERE TEAM_LOST IS NOT NULL
ORDER BY SEASON, STAGE;

👉 View full query

3. Top 5 Most Aggressive Teams by Stage

  • Identified teams that consistently played aggressively by taking the most total shots during UEFA matches.
  • Combined data across 2020–2022 seasons to ensure a broader view of performance.
  • Used RANK() and PARTITION BY STAGE to select the top 5 teams per stage.
WITH all_matches AS (
    SELECT * FROM SOCCER.TBL_UEFA_2020
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2021
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2022
),
...
SELECT STAGE, TEAM, TOTAL_SHOTS
FROM ranked_teams
WHERE shot_rank <= 5;

👉 See full query

4. Most Efficient Teams (Shots on Target per Shot Attempt)

  • Calculated which teams were most efficient in turning shot attempts into shots on target across all UEFA seasons (2020–2022).
  • Combined home and away stats using UNION ALL, then aggregated and ranked based on an efficiency ratio.
  • Used ROUND() and NULLIF() for clean, safe calculation of ratios.
WITH all_matches AS (
    SELECT * FROM SOCCER.TBL_UEFA_2020
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2021
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2022
),
...
SELECT
    TEAM,
    TOTAL_TARGETS,
    TOTAL_SHOTS,
    ROUND(TOTAL_TARGETS / NULLIF(TOTAL_SHOTS, 0), 3) AS EFFICIENCY
FROM team_total_shots
ORDER BY EFFICIENCY DESC;

👉 View full query

5. Win Probability Accuracy Tracker (2020-2022)

  • Compared predicted outcomes with actual results.
  • Evaluated prediction model overall reliability.
WITH all_matches AS (
    SELECT * FROM SOCCER.TBL_UEFA_2020
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2021
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2022
),
...
SELECT 
  COUNT(*) AS TOTAL_MATCHES,
  SUM(IS_CORRECT) AS CORRECT_PREDICTIONS,
  ROUND(SUM(IS_CORRECT) / COUNT(*), 3) AS ACCURACY
FROM accuracy_check;

👉 View full query

6. Win Probability Accuracy Tracker (By Stage)

  • Evaluated how accurately the prediction model performed across different tournament stages (2020–2022).
  • Assigned the most likely predicted result based on highest probability field.
  • Compared with actual result using final scores.
  • Accuracy is calculated per stage to understand where the model performs better.
WITH all_matches AS (
    SELECT * FROM SOCCER.TBL_UEFA_2020
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2021
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2022
),
...
SELECT 
  'ALL' AS STAGE,
  COUNT(*) AS TOTAL_MATCHES,
  SUM(IS_CORRECT) AS CORRECT_PREDICTIONS,
  ROUND(SUM(IS_CORRECT) / COUNT(*), 3) AS ACCURACY
FROM accuracy_check

ORDER BY 
  CASE WHEN STAGE = 'ALL' THEN 1 ELSE 0 END,
  STAGE;

👉 View full query

7. Win Probability Accuracy Tracker (By Team)

  • Evaluated how accurately the prediction model performed across all teams during UEFA tournaments (2020–2022).
  • Assigned the most likely predicted result for each match based on the highest probability field.
  • Compared with actual match results to determine whether the prediction was correct for each team.
  • Accuracy is calculated per team to identify which teams consistently outperformed or underperformed model expectations.
WITH all_matches AS (
    SELECT * FROM SOCCER.TBL_UEFA_2020
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2021
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2022
),
...
SELECT 
	TEAM_NAME,
	COUNT(*) AS TOTAL_MATCHES,
	SUM(IS_CORRECT) AS CORRECT_PREDICTIONS,
	ROUND(SUM(IS_CORRECT) / COUNT(*), 3) AS ACCURACY
FROM accuracy_check
GROUP BY TEAM_NAME
ORDER BY ACCURACY DESC;

👉 View full query

8. Possession vs. Result Correlation

  • Analyzed whether having majority possession led to winning across UEFA tournaments (2020–2022).
  • Assigned match outcomes and possession dominance, then compared them.
  • Calculated how often possession-dominant teams won, drew, or lost — broken down by season.
  • Provided a data-driven look at the common belief: "more possession = better results."
WITH all_matches AS (
    SELECT '2020' AS SEASON, * FROM SOCCER.TBL_UEFA_2020
    UNION ALL
    SELECT '2021' AS SEASON, * FROM SOCCER.TBL_UEFA_2021
    UNION ALL
    SELECT '2022' AS SEASON, * FROM SOCCER.TBL_UEFA_2022
),
...
SELECT 
	SEASON,
	POSSESSION_VS_RESULT,
	COUNT(*) AS MATCH_COUNT,
	ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY SEASON), 2) AS PERCENTAGE
FROM possession_to_match_winner
GROUP BY SEASON, POSSESSION_VS_RESULT
ORDER BY SEASON;

👉 View full query


Views Created

Note: Views in this project are presented for demonstration purposes only. Due to environment restrictions in DataLab, actual CREATE VIEW execution was not permitted. Logic remains fully reusable in any Snowflake-compatible environment.

VIEW_TEAM_POSSESSION_ANALYSIS

  • Identifies the team with majority possession in each match, across all UEFA seasons (2020-2022).
  • Transforms raw possession data into a match-level insight: “Who dominated possession in this game?”
  • Includes match stage, date, and team names for contextual analysis.
  • Makes other SQL queries simpler and more readable by offloading the logic.
-- View: VIEW_TEAM_POSSESSION_ANALYSIS
-- Description: Identifies the team with majority possession in each match across all seasons.
-- Data Source: TBL_UEFA_2020, TBL_UEFA_2021, TBL_UEFA_2022
-- Dependencies: Requires possession fields and team names.
CREATE VIEW VIEW_TEAM_POSSESSION_ANALYSIS AS
WITH all_matches AS (
    SELECT '2020' AS SEASON, * FROM SOCCER.TBL_UEFA_2020
    UNION ALL
    SELECT '2021' AS SEASON, * FROM SOCCER.TBL_UEFA_2021
    UNION ALL
    SELECT '2022' AS SEASON, * FROM SOCCER.TBL_UEFA_2022
)
...

👉 View full query

VIEW_DUEL_LOSSES_BY_STAGE

  • Identifies matches where a team won more duels but still lost the game, across all UEFA seasons (2020–2022).
  • Highlights potential tactical inefficiencies — teams that dominate physical play but fail to convert it into results.
  • Includes match stage, date, and team names for contextual analysis.
  • Useful for stage-wise performance breakdowns and in-depth tactical reviews.
-- View: VIEW_DUEL_LOSSES_BY_STAGE
-- Description: Captures matches where a team won more duels but still lost the match, broken
-- down by stage.
-- Data Source: TBL_UEFA_2020, TBL_UEFA_2021, TBL_UEFA_2022
-- Dependencies: Requires duels fields and team names.
CREATE VIEW VIEW_DUEL_LOSSES_BY_STAGE AS
WITH all_matches AS (
    SELECT '2020' AS SEASON, * FROM SOCCER.TBL_UEFA_2020
    UNION ALL
    SELECT '2021' AS SEASON, * FROM SOCCER.TBL_UEFA_2021
    UNION ALL
    SELECT '2022' AS SEASON, * FROM SOCCER.TBL_UEFA_2022
)
...

👉 View full query


Core Insights Visualization

1. Win Probability Accuracy (Per Stage / Per Team)

Why it matters:

  • Helps stakeholders see where the prediction model works best — or fails.
  • Makes performance patterns immediately visible.

Created Visualizations

  • 📊 Bar charts: Accuracy by stage & Accuracy by team.

Insight Summary

  • Stage matters: Predictive accuracy declines in later, more competitive rounds.
  • Team type matters: The model performed best with teams that likely followed more consistent patterns — weaker clubs may have been easier to predict.
  • Opportunity: Future improvements could involve retraining models specifically for knockout rounds or elite matchups.

👉 View code/visuals

2. Possession vs. Result Correlation

Why it matters:

  • Debunks or confirms the myth that "more possession = more wins".
  • Visual impact strengthens the insight (e.g., surprising % of losses with high possession).

Created Visualizations

  • 📊 Pie chart: Win/loss/draw breakdown when a team had possession advantage.
  • 📊 Stacked bar chart: Possession result types by season.

Insight Summary

  • Ball possession matters: Possession dominance improves winning chances but doesn’t guarantee victory.
  • Opportunity: Further team-level analysis of ball possession versus result correlation analysis is needed to answer the question whether possession dominant teams perform better.

👉 View code/visuals

3. Most Efficient Teams (Shots on Target per Shot Attempt)

Why it matters:

  • Highlights which teams are precise and clinical — or wasteful
  • Great for comparing team styles

Created Visualizations:

  • 📊 Horizontal bar chart: Top 15 Most Efficient Teams

Insight Summary

  • Shooting efficiency matters: Indicates a team's ability to convert opportunities into goals, which is crucial for winning matches.
  • Team's offensive strategy: In real-world football, a team's offensive strategy capitalizes on the vulnerabilities of an opposing defense to advance the ball and create scoring opportunities.

👉 View code/visuals


Key Skills Demonstrated

  • Advanced SQL Querying:
    Built complex multi-CTE queries, applied CASE logic, UNION ALL operations, dynamic window functions, and calculated key performance metrics.
  • Data Cleaning and Validation:
    Identified and corrected anomalies (e.g., efficiency > 100%), handled NULL values responsibly, and ensured data integrity throughout the analysis.
  • Analytical Thinking and Insight Generation:
    Moved beyond descriptive analysis to uncover hidden patterns, unexpected results, and real-world tactical implications.
  • Visual Storytelling:
    Designed clear, publication-ready charts (bar charts, pie charts, stacked bar charts) with well-labeled axes, percentage annotations, and contextual commentary.
  • Project Organization and Documentation:
    Structured notebooks and SQL files with reusable templates, detailed markdown commentary, and professional-grade README documentation.
  • Critical Thinking:
    Applied structured frameworks to generate insights systematically (common outcome, unexpected pattern, real-world meaning) - showing not just technical ability but strategic analysis skills.
  • Portfolio-Ready Communication:
    Presented each insight with a clear storyline suitable for GitHub presentation and future professional discussions.

📬 Contact

If you have questions about this project or want to collaborate, feel free to reach out!

About

Tactical and performance insights from UEFA Champions League data (2020–2022) using Snowflake SQL and Python visualizations.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published