This document offers a brief introduction to the functions in this library. Some functions are fairly straightforward to use, but for others you may need to look up the help entry before trying them out.
This library is very much in development. It is a compilation of code I’ve written across different projects, which is why there may be syntactical inconsistencies, such as some functions using player names and others using player IDs as a reference for players. Some of this is also code whose output I have never published, so it may need additional arguments. I have, however, tried to give enough documentation for each function that anyone trying to use the library should be well equipped with instructions and a basic understanding of how the functions work.
Any input and feedback are welcome. If you’re on GitHub, head to github.com/thecomeonman/CodaBonito; otherwise, get in touch on Twitter.
Install R from https://cran.r-project.org
Open R and run these commands in the console:
install.packages("devtools");
library(devtools);
install_github("thecomeonman/CodaBonito");
And you’re ready to run the examples below!
I have added some fake data along with the package to be able to better explain the usage of these functions.
dtPlayerMetrics
- aggregated data for players is typically in a format
similar to this, with some extra details about the team they play for,
their age, etc.
PlayerName | TeamName | playerId | Metric1 | Metric2 | Metric3 | Metric4 | Metric5 | Metric6 | Metric7 |
---|---|---|---|---|---|---|---|---|---|
gjn xfv | jsw | 1 | 2.229299 | 0.5955696 | 1.0000000 | 0.8763470 | 0.7329688 | 3.158645 | 0.0000013 |
yqp bfe | rzu | 2 | 3.097161 | 0.9443782 | 0.0029271 | 0.8706489 | 0.8115634 | 2.880184 | 0.0805346 |
rjs mrx | svk | 3 | 3.132211 | 0.1286577 | 0.0021049 | 0.9959918 | 1.0587961 | 6.371049 | 0.2331633 |
jtw fqd | rdz | 4 | 2.440632 | 0.5247019 | 0.9977317 | 0.4593465 | 1.4070274 | 4.061111 | 0.0000364 |
gja jvi | bhj | 5 | 3.325477 | 0.9318757 | 0.0000363 | 0.9999948 | 2.2463044 | 7.112237 | 0.0271460 |
mol euq | yza | 6 | 2.483550 | 0.4419821 | 1.0000000 | 0.9936560 | 1.3460208 | 2.276407 | 0.0845131 |
dtMetricCategorisation
- some metadata about the metrics.
- variableLabel is the name that will be displayed in charts for that metric
- variableCategory is the grouping of variables used in some visualisations, like fNormalisedValueChart
- HighValueIsBad is marked true for variables where a high value is bad. Variables such as fouls and goals conceded would be true.
variable | variableLabel | variableCategory | HighValueIsBad | suffix |
---|---|---|---|---|
Metric1 | Metric 1 | Offense | FALSE | |
Metric2 | Metric 2 | Offense | FALSE | |
Metric3 | Metric 3 | Defense | FALSE | % |
Metric4 | Metric 4 | Offense | TRUE | |
Metric5 | Metric 5 | Defense | FALSE | |
Metric7 | Metric 7 | Defense | FALSE | % |
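One way to read HighValueIsBad is as a flag that reverses the direction of a ranking before it is plotted. The sketch below uses base R only. fDirectionalPercentile is a hypothetical helper written for this illustration, not a function from the package, and the package's charts may handle the flag differently.

```r
# Hypothetical helper (not part of CodaBonito): percentile rank of each value,
# arranged so that a higher percentile always means "better".
fDirectionalPercentile = function(vnValues, bHighValueIsBad = FALSE) {
    # Negating the values reverses the ordering for "high is bad" metrics
    if (bHighValueIsBad) {
        vnValues = -vnValues
    }
    # ecdf() gives the share of observations less than or equal to each value
    ecdf(vnValues)(vnValues)
}

# Made-up fouls per game for four players
vnFouls = c(1, 2, 3, 4)
print(fDirectionalPercentile(vnFouls, bHighValueIsBad = TRUE))
```

With the flag set, the player with the fewest fouls ends up with the highest percentile.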
dtPasses
- passing data.
- x, y denote the start coordinates of the pass
- endX, endY denote the end coordinates of the pass
- passLength is the length of the pass
- passAngle is the angle of the pass in radians (180 degrees = pi radians), where 0 is along the pitch from defense to offense
- Success is 1 for a successful pass, 0 for a failed pass
playerId | x | y | endX | endY | passLength | passAngle | Success | recipientPlayerId |
---|---|---|---|---|---|---|---|---|
1 | 8.187907 | 56.49550 | 10.14677 | 65.23493 | 8.956269 | 2.9552227 | 1 | 3 |
1 | 4.806998 | 23.82662 | 32.60867 | 74.70806 | 57.981493 | 1.3998821 | 1 | 3 |
1 | 6.449829 | 28.04108 | 47.98204 | 41.56566 | 43.678820 | 0.5959662 | 1 | 8 |
1 | 8.502368 | 25.26964 | 39.34758 | 50.24656 | 39.689717 | 1.0575427 | 1 | 8 |
1 | 14.719737 | 53.69758 | 64.79699 | 11.99930 | 65.165003 | -1.3106447 | 1 | 2 |
1 | 11.117120 | 49.90731 | 25.42474 | 8.75858 | 43.565195 | -2.1075148 | 1 | 8 |
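In the sample rows, passLength matches the straight-line distance between the start and end coordinates. The base R sketch below checks this for the first row. fRadiansToDegrees is a hypothetical helper added only to illustrate the radians convention and is not part of the package.

```r
# First row of the dtPasses sample shown above
nX = 8.187907
nY = 56.49550
nEndX = 10.14677
nEndY = 65.23493

# Euclidean distance between the start and end coordinates
nPassLength = sqrt((nEndX - nX)^2 + (nEndY - nY)^2)
print(nPassLength)

# Hypothetical helper: radians to degrees, using 180 degrees = pi radians
fRadiansToDegrees = function(nRadians) {
    nRadians * 180 / pi
}
print(fRadiansToDegrees(2.9552227))
```

The printed length should agree with the passLength column for that row up to the rounding of the displayed coordinates.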
dtFormation
- Coordinates as per the formation
playerId | x | y |
---|---|---|
1 | 15 | 40 |
2 | 35 | 20 |
3 | 35 | 60 |
8 | 60 | 40 |
9 | 90 | 40 |
dtPlayerLabels
- Player labels
playerId | playerName |
---|---|
1 | asd qwe |
2 | qwe rty |
3 | ghj zxc |
8 | fgh rty |
9 | cvb dfg |
lTrackingData$dtTrackingData
- Tracking data
Tag | Player | Frame | X | Y | Time_s | VelocityX | VelocityY |
---|---|---|---|---|---|---|---|
Away | AwayPlayer1 | 0 | 80.69826 | 55.22010 | 0.0 | 0.000000 | 0.0000000 |
Away | AwayPlayer1 | 1 | 80.35326 | 55.11603 | 0.2 | -1.724991 | -0.5203465 |
Away | AwayPlayer1 | 2 | 80.00826 | 55.01196 | 0.4 | -1.724991 | -0.5203465 |
Away | AwayPlayer1 | 3 | 79.66327 | 54.90789 | 0.6 | -1.724991 | -0.5203465 |
Away | AwayPlayer1 | 4 | 79.31827 | 54.80382 | 0.8 | -1.724991 | -0.5203465 |
Away | AwayPlayer1 | 5 | 78.97327 | 54.69975 | 1.0 | -1.724991 | -0.5203465 |
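The velocity columns in the sample rows are consistent with a simple finite difference of position over time. The sketch below reproduces that arithmetic in base R for the first three frames. It is an illustration of how such columns can be derived, not necessarily how the package computes them.

```r
# First three frames of the tracking sample shown above
dtSample = data.frame(
    Frame = 0:2,
    X = c(80.69826, 80.35326, 80.00826),
    Y = c(55.22010, 55.11603, 55.01196),
    Time_s = c(0.0, 0.2, 0.4)
)

# Change in position divided by change in time; the first frame has no
# previous frame to compare against, so its velocity is set to zero
dtSample$VelocityX = c(0, diff(dtSample$X) / diff(dtSample$Time_s))
dtSample$VelocityY = c(0, diff(dtSample$Y) / diff(dtSample$Time_s))

print(dtSample)
```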
## Visualisations
fAddPitchLines draws pitch markings with further customisations available
pPitch = ggplot() + geom_pitch()
print(pPitch)
You can add whatever stats you want on top of it, as you would with any regular ggplot2 plot
# adding passing data on top now
pPitch = pPitch +
geom_point(
data = dtPasses,
aes(x = x , y = y)
)
print(pPitch)
If you aren’t interested in having the axis markings, etc., use theme_pitch
pPitch = pPitch +
theme_pitch()
print(pPitch)
pStripChart = fStripChart (
dtPlayerMetrics,
vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
dtMetricCategorisation,
iPlayerId = 2,
cTitle = 'Sample',
vnExpand = c(-0.3, -0.03, 1.2, 1.3)
)
print(pStripChart)
pBeeswarmChart = fBeeswarmChart (
dtPlayerMetrics,
vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
dtMetricCategorisation,
iPlayerId = 2,
cTitle = 'Sample'
)
print(pBeeswarmChart)
pPercentileBarChart = fPercentileBarChart(
dtDataset = dtPlayerMetrics,
vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
dtMetricCategorisation,
iPlayerId = 2,
cTitle = 'Sample'
)
print(pPercentileBarChart)
Percentiles can be a little misleading if the underlying numbers aren’t uniformly distributed. You can add annotations for an indicator of the absolute spread of the values and where this particular player’s values fall within that spread.
pPercentileBarChart = fPercentileBarChart(
dtDataset = dtPlayerMetrics,
vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
dtMetricCategorisation,
iPlayerId = 2,
cTitle = 'Sample',
# vnQuantileMarkers = c(0.01, 0.25, 0.5, 0.75, 0.99),
bAddAbsoluteIndicator = T
)
print(pPercentileBarChart)
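To see why percentiles alone can mislead, consider a small made-up group with one extreme value. The base R sketch below is independent of the package and only illustrates the point about non-uniform distributions.

```r
# Made-up metric values for five players; the last one is an extreme outlier
vnMetric = c(1, 2, 3, 4, 100)

# Percentile of each player within this group
vnPercentile = ecdf(vnMetric)(vnMetric)

# Gaps between neighbouring players in percentile terms
print(diff(vnPercentile))

# Gaps between the same neighbouring players in absolute terms
print(diff(vnMetric))
```

Every neighbouring pair is the same distance apart in percentile terms, even though the last gap is far larger in absolute terms than the others. An absolute indicator makes that difference visible.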
I disapprove of radar charts. They are a bad visualisation, prone to misinterpretation. They seem to be the accepted norm for comparing players though, which is why I had to sell out and include an implementation in the package. I’ve added a warning recommending that you use one of the other visualisations instead, as those are better structured.
pRadarPercentileChart = fRadarPercentileChart (
dtPlayerMetrics = dtPlayerMetrics,
vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
dtMetricCategorisation = dtMetricCategorisation,
iPlayerId = 2,
cTitle = 'Sample'
)
#> Warning in fRadarPercentileChart(dtPlayerMetrics = dtPlayerMetrics,
#> vcColumnsToIndex = c("playerId", : Radar charts can be misleading. Use
#> fPercentileBarChart instead.
print(pRadarPercentileChart)
pPlotSonar = fPlotSonar(
dtPassesToPlot = dtPasses,
iBlocksInFirstRing = 4,
iNbrRings = 8,
nZoomFactor = NULL,
nXLimit = 120,
nYLimit = 80,
bAddPitchBackground = F,
cTitle = NULL
)
print(pPlotSonar)
# Sonar broken up by pitch area
dtPassesByPitchArea = dtPasses[,
list(
playerId,
passLength,
passAngle,
x,
y,
Success,
xBucket = (
ifelse(
x %/% 20 == 120 %/% 20,
( x %/% 20 ) - 1,
x %/% 20
) * 20
) + 10,
yBucket = (
ifelse(
y %/% 20 == 80 %/% 20,
( y %/% 20 ) - 1,
y %/% 20
) * 20
) + 10
)
]
pPlotSonarVariation1 = fPlotSonar(
dtPassesToPlot = dtPassesByPitchArea,
iBlocksInFirstRing = 4,
iNbrRings = 8,
nZoomFactor = NULL,
nXLimit = 120,
nYLimit = 80,
bAddPitchBackground = T,
cTitle = 'Sample by Area of Pitch'
)
print(pPlotSonarVariation1)
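The xBucket and yBucket expressions above assign each pass to the centre of a 20 x 20 cell, and fold coordinates lying exactly on the far boundary back into the last cell. The sketch below isolates that arithmetic in base R. fBucketCentre is a hypothetical helper for illustration and not a package function.

```r
# Hypothetical helper mirroring the xBucket / yBucket arithmetic above
fBucketCentre = function(vnCoordinate, nBucketSize = 20, nLimit = 120) {
    # Index of the cell each coordinate falls in, counting from zero
    vnBucket = vnCoordinate %/% nBucketSize
    # A coordinate exactly on the far boundary would otherwise land in a
    # cell beyond the pitch, so it is moved back into the last cell
    vnBucket = ifelse(
        vnBucket == nLimit %/% nBucketSize,
        vnBucket - 1,
        vnBucket
    )
    # Centre of the cell
    vnBucket * nBucketSize + nBucketSize / 2
}

print(fBucketCentre(c(0, 35, 119.9, 120)))
print(fBucketCentre(80, nLimit = 80))
```

A different cell size only needs a different nBucketSize, as long as the pitch limit is a multiple of it.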
# Sonar broken up player, placed at their median passing location
dtPassesByPlayer = merge(
dtPasses,
merge(
dtPasses[,
list(
xBucket = median(x),
yBucket = median(y)
),
list(
playerId
)
],
dtPlayerLabels[,
list(
playerId,
bucketLabel = playerName
)
],
c(
'playerId'
)
),
c(
'playerId'
)
)
pPlotSonarVariation2 = fPlotSonar (
dtPassesToPlot = dtPassesByPlayer,
iBlocksInFirstRing = 4,
iNbrRings = 8,
nYLimit = 80,
nXLimit = 120,
bAddPitchBackground = T,
cTitle = 'Sample By Median Position On Pitch'
)
print(pPlotSonarVariation2)
# Sonar broken up player, placed at the location dictated by their role
# in the formations
dtPassesByPlayerFormation = merge(
dtPasses,
merge(
dtFormation[,
list(
xBucket = x,
yBucket = y,
playerId
)
],
dtPlayerLabels[,
list(
playerId,
bucketLabel = playerName
)
],
c(
'playerId'
)
),
'playerId'
)
pPlotSonarVariation3 = fPlotSonar(
dtPassesToPlot = dtPassesByPlayerFormation,
iBlocksInFirstRing = 4,
iNbrRings = 8,
nXLimit = 120,
nYLimit = 80,
bAddPitchBackground = T,
cTitle = 'Sample By Formation'
)
print(pPlotSonarVariation3)
pPassNetworkChart = fPassNetworkChart(
dtPasses,
dtPlayerLabels
)
print(pPassNetworkChart)
pXgBuildUpComparison = fXgBuildUpComparison(
dtXg,
dtTeamLabels
)
print(pXgBuildUpComparison)
Tracking data support is a work in progress. It uses the same data structure as https://github.com/metrica-sports/sample-data, which you can parse with fParseTrackingDataBothTeams
You can draw Voronois
pVoronoi = fDrawVoronoiFromTable(
lTrackingData$dtTrackingData[Frame == min(Frame)],
nXLimit = 120,
nYLimit = 80
)
print(pVoronoi)
And if you have multiple frames -
voronoiOutput = fDrawVoronoiFromTable(
lTrackingData$dtTrackingData,
nXLimit = 120,
nYLimit = 80,
UseOneFrameEvery = 1,
DelayBetweenFrames = 5
)
if ( !interactive() ) {
qwe = suppressWarnings(
file.remove('./README_files/figure-markdown_strict/Voronoi.gif')
)
rm(qwe)
qwe = file.copy(
voronoiOutput,
'./README_files/figure-markdown_strict/Voronoi.gif'
)
rm(qwe)
}
The Friends of Tracking pitch control model -
lPitchControl = fGetPitchControlProbabilities (
lData = lTrackingData,
viTrackingFrame = lTrackingData$dtTrackingData[, unique(Frame)[5]],
nYLimit = 80,
nXLimit = 120,
iGridCellsX = 120 / 3
)
pPlotPitchControl = fPlotPitchControl(
lPitchControl
)
print(pPlotPitchControl)
Some of my other experiments with tracking data are here - https://github.com/thecomeonman/MakingFriendsWithTrackingData
## Logic and Algorithms
A function to calculate earth mover’s distance. It offers more flexibility and transparency than emdist::emd.
Any distance matrix can be used to calculate EMD, but emdist::emd insists on getting the raw distributions, with only up to four dimensions. fEMDDetailed only requires a distance matrix between each combination of observations in the two datasets, irrespective of the nature of the data.
# Two random datasets of three dimensions
a = data.table(matrix(runif(21), ncol = 3))
b = data.table(matrix(runif(30), ncol = 3))
# adding serial numbers to each observation
a[, SNO := .I]
b[, SNO := .I]
# evaluating distance between all combinations of data in the two datasets
a[, k := 'k']
b[, k := 'k']
dtDistances = merge(a,b,'k',allow.cartesian = T)
dtDistances[,
Distance := (
(( V1.x - V1.y) ^ 2) +
(( V2.x - V2.y) ^ 2) +
(( V3.x - V3.y) ^ 2)
) ^ 0.5
]
# getting EMD between these datasets
lprec = fEMDDetailed(
SNO1 = dtDistances[, SNO.x],
SNO2 = dtDistances[, SNO.y],
Distance = dtDistances[, Distance]
)
print(fGetEMDFromDetailedEMD(lprec))
#> [1] 0.4185668
# This value should be the same as that computed by emdist package's emd function.
# EMD needs the weightage of each point, which is assigned as equal in our
# function, so giving 1/N weightage to each data point
# emdist::emd(
# as.matrix(
# a[, list(1/.N, V1,V2,V3)]
# ),
# as.matrix(
# b[, list(1/.N, V1,V2,V3)]
# )
# )
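For intuition about what the number means, there is a simple special case: for two one-dimensional samples of equal size with equal weights, the cheapest transport plan pairs the sorted values, so the EMD is the mean absolute difference between them. The base R sketch below covers only that special case. fEMD1D is a hypothetical helper, unrelated to how fEMDDetailed works internally.

```r
# Hypothetical helper: EMD between two equally sized, equally weighted
# one-dimensional samples
fEMD1D = function(vnA, vnB) {
    stopifnot(length(vnA) == length(vnB))
    # In one dimension the cheapest plan matches the sorted values in order
    mean(abs(sort(vnA) - sort(vnB)))
}

# Every point has to move 5 units, so the result is 5
print(fEMD1D(c(0, 1, 3), c(5, 6, 8)))
```

Datasets of different sizes or in more dimensions, like the ones above, need the general transport formulation.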
On the topic of transparency, one of the things I find very useful is that you can now see how much distance is being contributed by each observation.
dtDistances[, EMDWeightage := get.variables(lprec)]
ggplot(dtDistances) +
geom_point(
data = dtDistances,
aes(
x = factor(SNO.x),
y = factor(SNO.y),
size = Distance,
color = EMDWeightage
)
) +
scale_colour_continuous(
low = 'black',
high = 'red'
) +
coord_fixed() +
xlab('SNO.x') +
ylab('SNO.y')
You will find fJsonToListOfTables, fJsonToTabular, and fParseTrackingData useful. They aren’t glamorous enough to be demoed here, but the documentation should help you use those functions.