A Python implementation of Player Vectors: Characterizing Soccer Players' Playing Style from Match Event Streams.
pip install playervectors

The df_events DataFrame used in PlayerVectors.fit() must follow the SPADL format, with the following required column names:
| Column Name | Description |
|---|---|
| player_id | Unique identifier for the player. |
| action_type | Type of action or event (e.g., shot, pass, cross, dribble). |
| x_start | X-coordinate where the action starts. |
| y_start | Y-coordinate where the action starts. |
| x_end | X-coordinate where the action ends. |
| y_end | Y-coordinate where the action ends. |
If your DataFrame uses different column names, pass a mapping via PlayerVectors.fit(column_names=new_column_names), where:
new_column_names = {
'player_id': 'your_player_id',
'action_type': 'your_action_type',
'x_start': 'your_x_start',
'y_start': 'your_y_start',
'x_end': 'your_x_end',
'y_end': 'your_y_end'
}

Building 18-component player vectors from the selected actions shot, cross, dribble and pass, with 4, 4, 5 and 5 components respectively:
from playervectors import PlayerVectors
pvs = PlayerVectors(
grid=(50, 50),
actions=['shot', 'cross', 'dribble', 'pass'],
components=[4, 4, 5, 5]
)
pvs.fit(
df_events=df_events,
minutes_played=minutes_played,
player_names=player_names
)

| Parameter | Description |
|---|---|
| df_events | Event stream data in SPADL format. |
| minutes_played | A dictionary that maps each player_id to the total minutes that player played across all events in df_events. |
| player_names | A mapping from player_id to player_name. |
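
As a rough illustration of the three inputs described in the table, the sketch below assembles a tiny df_events DataFrame with the required columns, checks that none are missing, and builds matching minutes_played and player_names dictionaries. All player IDs, names, coordinates and minute counts are hypothetical toy values, and the checks are not part of the playervectors API; they are only a sanity pass you might run before calling fit().

```python
import pandas as pd

# Required column names, taken from the table of SPADL columns above.
REQUIRED_COLUMNS = ["player_id", "action_type", "x_start", "y_start", "x_end", "y_end"]

# Hypothetical toy events; real data would come from an event-stream provider.
df_events = pd.DataFrame({
    "player_id": [1, 1, 2],
    "action_type": ["pass", "shot", "cross"],
    "x_start": [30.0, 88.0, 95.0],
    "y_start": [20.0, 34.0, 60.0],
    "x_end": [55.0, 105.0, 98.0],
    "y_end": [25.0, 34.0, 36.0],
})

# Fail early if a required column is absent.
missing = [c for c in REQUIRED_COLUMNS if c not in df_events.columns]
if missing:
    raise ValueError(f"df_events is missing required columns: {missing}")

# Hypothetical per-player metadata keyed by player_id.
minutes_played = {1: 90, 2: 45}
player_names = {1: "Player A", 2: "Player B"}

# Every player that appears in the events should have an entry in both mappings.
player_ids = set(df_events["player_id"])
without_minutes = player_ids - set(minutes_played)
without_names = player_ids - set(player_names)
```

If either without_minutes or without_names is non-empty, it is probably worth fixing the mappings before fitting, since the minutes are used to relate each player's actions to playing time.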
import matplotlib.pyplot as plt
pvs.plot_principle_components()
plt.show()

Output of: pvs.plot_principle_components()
import matplotlib.pyplot as plt
pvs.plot_distribution()
plt.show()

Output of: pvs.plot_distribution()
import matplotlib.pyplot as plt
# wy_id of Kevin De Bruyne (Central midfielder)
pvs.plot_weights(player_id=38021)
plt.show()

Output of: pvs.plot_weights(player_id=38021)
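Let $k_t$ be the number of components used to compress the heatmaps of action type $t$.
According to the paper, $k_t$ should be set to the minimal number of components needed to explain 70% of the variance in the heatmaps of action type $t$.
This parameter setting was empirically found to work well because of the high variability of players' positions in their actions (see Challenge 1 in Section 2 of the paper).
Ignoring 30% of the variance allows us to summarize a player's playing style by his dominant regions on the field only, rather than modeling every position on the field he ever occupied.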
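
One way to make the "ignore 30% of the variance" rule concrete is to look for the smallest number of components whose cumulative explained variance reaches 70%. The sketch below does this with a PCA-style computation via SVD, which is a stand-in chosen for simplicity: the paper's compression step uses non-negative matrix factorization, and this package may compute things differently. The function name min_components_for_variance and the synthetic heatmaps are assumptions made for illustration.

```python
import numpy as np


def min_components_for_variance(heatmaps, threshold=0.70):
    """Return the smallest k whose top-k principal components explain
    at least `threshold` of the total variance (PCA-style, via SVD)."""
    X = np.asarray(heatmaps, dtype=float)
    X = X - X.mean(axis=0)                  # center each grid cell across players
    s = np.linalg.svd(X, compute_uv=False)  # singular values, in descending order
    total = np.sum(s ** 2)
    if total == 0:
        raise ValueError("heatmaps have zero variance")
    explained = np.cumsum(s ** 2) / total   # cumulative explained-variance ratio
    k = int(np.searchsorted(explained, threshold)) + 1
    return min(k, len(s))


# Synthetic stand-in for flattened heatmaps: 40 players, 25 grid cells, rank 2.
rng = np.random.default_rng(0)
heatmaps = rng.random((40, 2)) @ rng.random((2, 25))

k_70 = min_components_for_variance(heatmaps, threshold=0.70)
k_99 = min_components_for_variance(heatmaps, threshold=0.99)
```

Because the synthetic data has rank 2, at most two components are needed at any threshold. On real heatmaps the count would typically differ per action type, which is why each action gets its own number of components.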
Source: Tom Decroos and Jesse Davis, September 19th, 2019 ECMLPKDD
The player vector v of a player p is the concatenation of his compressed vectors for the relevant action types.
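
The concatenation can be sketched in a few lines of NumPy. The compressed vectors below are hypothetical numbers, sized to match the 4, 4, 5 and 5 components used in the example earlier in this README; they are not output from the library.

```python
import numpy as np

# Hypothetical compressed vectors for one player, one per action type.
compressed = {
    "shot": np.array([0.10, 0.00, 0.30, 0.20]),
    "cross": np.array([0.00, 0.40, 0.10, 0.00]),
    "dribble": np.array([0.20, 0.10, 0.00, 0.30, 0.10]),
    "pass": np.array([0.50, 0.20, 0.10, 0.00, 0.40]),
}

# Concatenate in a fixed action order so vectors are comparable across players.
actions = ["shot", "cross", "dribble", "pass"]
player_vector = np.concatenate([compressed[a] for a in actions])
```

Keeping the action order fixed matters: two players' vectors can only be compared component by component if both were concatenated in the same order.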
1. Download this dataset from Kaggle
mkdir data

This dataset contains European football team stats. Only teams from the Premier League, Ligue 1, Bundesliga, Serie A and La Liga are listed.
All credit goes to Luca Pappalardo and Emmanuele Massucco.
https://www.kaggle.com/datasets/aleespinosa/soccer-match-event-dataset
@article{ecmlpkdd2019,
title = {Player Vectors: Characterizing Soccer Players'
Playing Style from Match Event Streams},
author = {Tom Decroos and Jesse Davis},
journal = {ECML PKDD},
year = {2019},
}








