Implementation of the paper "Player Vectors: Characterizing Soccer Players Playing Style from Match Event Streams".

raphaelsenn/playervectors


playervectors

A Python implementation of the paper "Player Vectors: Characterizing Soccer Players' Playing Style from Match Event Streams".

Install

pip install playervectors

Usage

Expected Format for df_events (SPADL Format)

The df_events DataFrame passed to PlayerVectors.fit() must follow the SPADL format and contain the following columns:

| Column Name | Description |
|---|---|
| player_id | Unique identifier for the player. |
| action_type | Type of action or event (e.g., shot, pass, cross, dribble). |
| x_start | X-coordinate where the action starts. |
| y_start | Y-coordinate where the action starts. |
| x_end | X-coordinate where the action ends. |
| y_end | Y-coordinate where the action ends. |

If your DataFrame uses different column names, pass a mapping from the expected names to your own via PlayerVectors.fit(column_names=new_column_names):

new_column_names = {
    'player_id': 'your_player_id',
    'action_type': 'your_action_type',
    'x_start': 'your_x_start',
    'y_start': 'your_y_start',
    'x_end': 'your_x_end',
    'y_end': 'your_y_end'
}
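Before calling fit(), it can help to check that the mapping actually covers every required column. The sketch below is a hypothetical helper, not part of playervectors; the example column names (playerId, type, x0, ...) are made up for illustration.

```python
# Hypothetical helper (not part of playervectors): report which required
# SPADL columns cannot be found, given your columns and an optional mapping.
REQUIRED_COLUMNS = ("player_id", "action_type", "x_start", "y_start", "x_end", "y_end")


def missing_columns(available_columns, column_names=None):
    """Return the required columns that are absent after applying the mapping."""
    # By default every required column is expected under its own name.
    mapping = {name: name for name in REQUIRED_COLUMNS}
    mapping.update(column_names or {})
    available = set(available_columns)
    return [name for name in REQUIRED_COLUMNS if mapping[name] not in available]


# Made-up column names standing in for a custom event table.
my_columns = ["playerId", "type", "x0", "y0", "x1", "y1"]
new_column_names = {
    "player_id": "playerId",
    "action_type": "type",
    "x_start": "x0",
    "y_start": "y0",
    "x_end": "x1",
    "y_end": "y1",
}

print(missing_columns(my_columns))                    # all six names are missing
print(missing_columns(my_columns, new_column_names))  # []
```

With a pandas DataFrame, list(df_events.columns) can be passed as available_columns.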

Fitting PlayerVectors

The example below builds 18-component player vectors from the actions shot, cross, dribble and pass, using 4, 4, 5 and 5 components respectively.

from playervectors import PlayerVectors


pvs = PlayerVectors(
    grid=(50, 50),
    actions=['shot', 'cross', 'dribble', 'pass'],
    components=[4, 4, 5, 5]
)

pvs.fit(
    df_events=df_events,
    minutes_played=minutes_played,
    player_names=player_names
)

| Parameter | Description |
|---|---|
| df_events | Event stream data in SPADL format. |
| minutes_played | A dictionary that maps each player_id to the total minutes that player played across all events in df_events. |
| player_names | A mapping from player_id to player_name. |
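As a minimal sketch of how the two dictionaries might be built, assume you have one record per player and match. The record layout (player_id, player_name, minutes) and the second player are assumptions for illustration; only the dictionary shapes follow the parameter descriptions above.

```python
from collections import defaultdict


def build_lookups(appearances):
    """Aggregate per-match records into minutes_played and player_names dicts."""
    minutes_played = defaultdict(int)
    player_names = {}
    for player_id, player_name, minutes in appearances:
        minutes_played[player_id] += minutes  # total minutes over all matches
        player_names[player_id] = player_name
    return dict(minutes_played), player_names


# Illustrative records: (player_id, player_name, minutes in one match).
appearances = [
    (38021, "Kevin De Bruyne", 90),
    (38021, "Kevin De Bruyne", 75),
    (1, "Example Player", 45),  # made-up player
]

minutes_played, player_names = build_lookups(appearances)
print(minutes_played)  # {38021: 165, 1: 45}
print(player_names)    # {38021: 'Kevin De Bruyne', 1: 'Example Player'}
```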

Plotting Principal Components

import matplotlib.pyplot as plt


pvs.plot_principle_components()
plt.show()

image

Output of: pvs.plot_principle_components()

Plotting Weight Distribution

import matplotlib.pyplot as plt


pvs.plot_distribution()
plt.show()

image

Output of: pvs.plot_distribution()

Plotting Weights of a Player

import matplotlib.pyplot as plt


# wy_id of Kevin De Bruyne (Central midfielder)
pvs.plot_weights(player_id=38021)
plt.show()

image

Output of: pvs.plot_weights(player_id=38021)

Building Player Vectors

1. Selecting Relevant Action Types

Let $k_t$ be the number of principal components chosen to compress heatmaps of action type $t$.

According to the paper, for $t \in \{\text{shot}, \text{cross}, \text{dribble}, \text{pass}\}$ the values $k_t = 4, 4, 5, 5$ are the minimal numbers of components needed to explain 70% of the variance in the heatmaps of action type $t$.

This parameter setting was empirically found to work well because of the high variability of players' positions in their actions (see Challenge 1 in Section 2 of the paper).

Ignoring the remaining 30% of the variance makes it possible to summarize a player's playing style by his dominant regions on the field alone, rather than modeling every position he ever occupied.
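To make the 70% criterion concrete, the sketch below picks the smallest number of components whose cumulative share of variance reaches a threshold. It measures variance with an SVD of the centered data, which is an assumption made for illustration; how the paper computes explained variance for its factorization is not described here, and the function names are hypothetical.

```python
import numpy as np


def explained_variances(X):
    """Variance captured by each SVD component of the column-centered matrix X."""
    centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values ** 2


def min_components(variances, threshold=0.7):
    """Smallest k such that the k largest variances reach the threshold share."""
    ordered = np.sort(np.asarray(variances, dtype=float))[::-1]
    cumulative_share = np.cumsum(ordered) / ordered.sum()
    k = int(np.searchsorted(cumulative_share, threshold)) + 1
    return min(k, len(ordered))  # guard against floating-point round-off


# Shares are 0.5, 0.3, 0.1, 0.1, so two components reach 70%.
print(min_components([5.0, 3.0, 1.0, 1.0], threshold=0.7))  # 2
```

In practice X would hold one flattened heatmap per row for a single action type, and the result would play the role of $k_t$.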

2. Constructing Heatmaps

2.1 Counting

image

Source: Tom Decroos and Jesse Davis, September 19th, 2019 ECMLPKDD

2.2 Normalizing

image

Source: Tom Decroos and Jesse Davis, September 19th, 2019 ECMLPKDD

2.3 Smoothing

image

Source: Tom Decroos and Jesse Davis, September 19th, 2019 ECMLPKDD
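The three heatmap steps can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the code used by playervectors: the 105 x 68 field size, the per-90-minutes scaling, the Gaussian width sigma and all function names are assumptions.

```python
import numpy as np


def count_heatmap(x, y, grid=(50, 50), field=(105.0, 68.0)):
    """2.1 Counting: number of actions starting in each grid cell."""
    counts, _, _ = np.histogram2d(
        x, y, bins=grid, range=[[0.0, field[0]], [0.0, field[1]]]
    )
    return counts


def normalize_heatmap(counts, minutes_played):
    """2.2 Normalizing: rescale counts to a per-90-minutes rate (assumed)."""
    return counts * 90.0 / minutes_played


def gaussian_kernel(sigma):
    """1D Gaussian kernel truncated at about three standard deviations."""
    radius = max(1, int(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()


def smooth_heatmap(heatmap, sigma=1.0):
    """2.3 Smoothing: separable Gaussian blur, zero-padded at the borders."""
    kernel = gaussian_kernel(sigma)
    blurred = np.apply_along_axis(np.convolve, 0, heatmap, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 1, blurred, kernel, mode="same")


# Three made-up action start locations for a player with 180 minutes played.
x = np.array([50.0, 50.0, 10.0])
y = np.array([30.0, 30.0, 5.0])
heatmap = smooth_heatmap(normalize_heatmap(count_heatmap(x, y, grid=(10, 10)), 180))
print(heatmap.shape)  # (10, 10)
```

Because the blur pads with zeros, some mass is lost for actions near the touchlines; a library routine such as scipy.ndimage.gaussian_filter offers other border modes.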

3. Compressing Heatmaps to Vectors

3.1 Reshaping

image

Source: Tom Decroos and Jesse Davis, September 19th, 2019 ECMLPKDD

3.2 Construct the matrix M

image

Source: Tom Decroos and Jesse Davis, September 19th, 2019 ECMLPKDD

3.3 Compress matrix M by applying non-negative matrix factorization (NMF)

image

Source: Tom Decroos and Jesse Davis, September 19th, 2019 ECMLPKDD
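A rough sketch of steps 3.1 to 3.3: each heatmap is flattened to a vector, the vectors become the columns of M, and M is factorized as M ≈ WH with non-negative factors. The solver below uses plain multiplicative updates as a stand-in; which NMF solver playervectors uses is not stated here, and the function names are hypothetical.

```python
import numpy as np


def build_matrix(heatmaps):
    """3.1 + 3.2: flatten each (m, n) heatmap and stack them as columns of M."""
    return np.column_stack([h.reshape(-1) for h in heatmaps])


def nmf(M, k, n_iter=500, seed=0, eps=1e-9):
    """3.3: approximate M (m*n x players) by W (m*n x k) @ H (k x players)."""
    rng = np.random.default_rng(seed)
    W = rng.random((M.shape[0], k)) + eps
    H = rng.random((k, M.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        H *= (W.T @ M) / (W.T @ W @ H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Six random stand-in heatmaps on a tiny 4 x 5 grid.
rng = np.random.default_rng(1)
heatmaps = [rng.random((4, 5)) for _ in range(6)]
M = build_matrix(heatmaps)
W, H = nmf(M, k=2)
print(M.shape, W.shape, H.shape)  # (20, 6) (20, 2) (2, 6)
```

Under this layout the columns of W act as the components, and column j of H holds the k weights of player j for this action type.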

4. Assembling Player Vectors

The player vector v of a player p is the concatenation of his compressed vectors for the relevant action types.
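As a small illustration of the concatenation, assume the compressed weights of one player are stored per action type. The weight values and the helper name below are made up.

```python
import numpy as np


def assemble_player_vector(weights_per_action, actions):
    """Concatenate the compressed vectors in a fixed action order."""
    return np.concatenate(
        [np.asarray(weights_per_action[action], dtype=float) for action in actions]
    )


actions = ["shot", "cross", "dribble", "pass"]
# Made-up weights with 4, 4, 5 and 5 components.
weights = {
    "shot": [0.1, 0.0, 0.3, 0.0],
    "cross": [0.0, 0.2, 0.0, 0.1],
    "dribble": [0.4, 0.0, 0.1, 0.0, 0.2],
    "pass": [0.3, 0.1, 0.0, 0.5, 0.0],
}

vector = assemble_player_vector(weights, actions)
print(vector.shape)  # (18,)
```

Keeping the action order fixed matters: vectors of different players are only comparable if each position refers to the same component.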

Detailed Algorithm

image

Running demo.ipynb

1. Download the dataset from Kaggle: https://www.kaggle.com/datasets/aleespinosa/soccer-match-event-dataset

2. Create a folder named data in this repository

mkdir data

3. Copy all .csv files from the dataset into the data folder

4. Run notebook demo.ipynb
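Before opening the notebook, a quick check like the following can confirm that the .csv files ended up in the right place. It is a hypothetical helper that only assumes the folder is called data, as in step 2; the names of the individual files are not assumed.

```python
from pathlib import Path


def list_csv_files(data_dir="data"):
    """Return the sorted names of all .csv files directly inside data_dir."""
    folder = Path(data_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Folder not found: {folder}")
    return sorted(path.name for path in folder.glob("*.csv"))


if __name__ == "__main__":
    try:
        print(list_csv_files("data"))
    except FileNotFoundError as error:
        print(error)
```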

About the Datasets

This dataset contains European football team stats. Only teams from the Premier League, Ligue 1, Bundesliga, Serie A and La Liga are listed.

All credit goes to Luca Pappalardo and Emmanuele Massucco.

https://www.kaggle.com/datasets/aleespinosa/soccer-match-event-dataset

Citations

@article{ecmlpkdd2019,
  title     = {Player Vectors: Characterizing Soccer Players' Playing Style from Match Event Streams},
  author    = {Tom Decroos and Jesse Davis},
  journal   = {ecmlpkdd2019},
  year      = {2019},
}
