thecomeonman/CodaBonito

This document offers a brief introduction to the functions in this library. Some functions are fairly straightforward to use, but for others you may need to look up the help entry before trying them out.

Disclaimer / Call for Inputs

This library is very much in development. It is a compilation of code I’ve written across different projects, which is why there may be syntactical inconsistencies, such as some functions using player names and others using player IDs as a reference for players. Some of this is also code that I have never published the output of, so it may need additional arguments. I have, however, tried to give enough documentation for each function so that anyone trying to use the library should be well equipped with instructions and a basic understanding of how the functions work.

Any input and feedback are welcome. If you’re on GitHub, head to github.com/thecomeonman/CodaBonito; otherwise, get in touch on Twitter.

How to get started

Install R from https://cran.r-project.org

Open R and run this command in the console - install.packages("devtools"); library(devtools); install_github("thecomeonman/CodaBonito");

And you’re ready to run the examples below!

Data

I have added some fake data along with the package to be able to better explain the usage of these functions.

dtPlayerMetrics - aggregated data for players is typically in a format similar to this, with some extra details about the team they play for, their age, etc.

PlayerName TeamName playerId Metric1 Metric2 Metric3 Metric4 Metric5 Metric6 Metric7
gjn xfv jsw 1 2.229299 0.5955696 1.0000000 0.8763470 0.7329688 3.158645 0.0000013
yqp bfe rzu 2 3.097161 0.9443782 0.0029271 0.8706489 0.8115634 2.880184 0.0805346
rjs mrx svk 3 3.132211 0.1286577 0.0021049 0.9959918 1.0587961 6.371049 0.2331633
jtw fqd rdz 4 2.440632 0.5247019 0.9977317 0.4593465 1.4070274 4.061111 0.0000364
gja jvi bhj 5 3.325477 0.9318757 0.0000363 0.9999948 2.2463044 7.112237 0.0271460
mol euq yza 6 2.483550 0.4419821 1.0000000 0.9936560 1.3460208 2.276407 0.0845131

dtMetricCategorisation - some metadata about the metrics.

- variableLabel is the name that will be displayed in charts for that metric
- variableCategory is the grouping of variables used in some visualisations, like fNormalisedValueChart
- HighValueIsBad is marked true for variables where a high value is bad. Variables such as fouls and goals conceded would be true.

| variable | variableLabel | variableCategory | HighValueIsBad | suffix |
|---|---|---|---|---|
| Metric1 | Metric 1 | Offense | FALSE | |
| Metric2 | Metric 2 | Offense | FALSE | |
| Metric3 | Metric 3 | Defense | FALSE | % |
| Metric4 | Metric 4 | Offense | TRUE | |
| Metric5 | Metric 5 | Defense | FALSE | |
| Metric7 | Metric 7 | Defense | FALSE | % |
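
As a rough illustration of what HighValueIsBad implies, here is a minimal base R sketch that ranks the Metric4 values from the sample rows above so that lower values score higher. fPercentileSketch is a hypothetical helper written for this example, not a function in the package, and the package may well compute its percentiles differently.

```r
# Hypothetical helper: percentile rank in (0, 1], flipped when a high value is bad
fPercentileSketch = function(vnValues, bHighValueIsBad = FALSE) {
   if (bHighValueIsBad) {
      # negate so that the lowest raw value becomes the highest ranked
      vnValues = -vnValues
   }
   # proportion of observations less than or equal to each value
   ecdf(vnValues)(vnValues)
}

# Metric4 for the six sample players above; HighValueIsBad is TRUE for Metric4
vnMetric4 = c(0.8763470, 0.8706489, 0.9959918, 0.4593465, 0.9999948, 0.9936560)
vnPercentile = fPercentileSketch(vnMetric4, bHighValueIsBad = TRUE)

print(round(vnPercentile, 2))
```

Player 4 has the lowest Metric4 and therefore gets the top rank, while player 5 has the highest value and gets the bottom rank.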

dtPasses - passing data.

- x, y denote the start coordinates of the pass
- endX, endY denote the end coordinates of the pass
- passLength is the length of the pass
- passAngle is the angle of the pass in radians (180 degrees = pi radians), where 0 is along the pitch from defense to offense
- Success is 1 for a successful pass, 0 for a failed pass

playerId x y endX endY passLength passAngle Success recipientPlayerId
1 8.187907 56.49550 10.14677 65.23493 8.956269 2.9552227 1 3
1 4.806998 23.82662 32.60867 74.70806 57.981493 1.3998821 1 3
1 6.449829 28.04108 47.98204 41.56566 43.678820 0.5959662 1 8
1 8.502368 25.26964 39.34758 50.24656 39.689717 1.0575427 1 8
1 14.719737 53.69758 64.79699 11.99930 65.165003 -1.3106447 1 2
1 11.117120 49.90731 25.42474 8.75858 43.565195 -2.1075148 1 8
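
For your own data, the length and angle columns can be derived from the start and end coordinates. The sketch below uses the first sample row. The Euclidean length reproduces that row's passLength. The angle uses atan2, with counter-clockwise rotation as positive, which is an assumed convention. Since the sample data is fake, this angle does not reproduce the passAngle values shown above.

```r
# First sample pass from the table above
nX = 8.187907
nY = 56.49550
nEndX = 10.14677
nEndY = 65.23493

# Euclidean length of the pass
nPassLength = sqrt((nEndX - nX) ^ 2 + (nEndY - nY) ^ 2)

# Angle in radians, with 0 pointing along the pitch from defense to offense (+x).
# Assumption: counter-clockwise is positive, which the package may not share.
nPassAngle = atan2(nEndY - nY, nEndX - nX)

# 180 degrees = pi radians
nPassAngleDegrees = nPassAngle * 180 / pi

print(c(nPassLength, nPassAngle, nPassAngleDegrees))
```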

dtFormation - Coordinates as per the formation

playerId x y
1 15 40
2 35 20
3 35 60
8 60 40
9 90 40

dtPlayerLabels - Player labels

playerId playerName
1 asd qwe
2 qwe rty
3 ghj zxc
8 fgh rty
9 cvb dfg

lTrackingData$dtTrackingData - Tracking data

Tag Player Frame X Y Time_s VelocityX VelocityY
Away AwayPlayer1 0 80.69826 55.22010 0.0 0.000000 0.0000000
Away AwayPlayer1 1 80.35326 55.11603 0.2 -1.724991 -0.5203465
Away AwayPlayer1 2 80.00826 55.01196 0.4 -1.724991 -0.5203465
Away AwayPlayer1 3 79.66327 54.90789 0.6 -1.724991 -0.5203465
Away AwayPlayer1 4 79.31827 54.80382 0.8 -1.724991 -0.5203465
Away AwayPlayer1 5 78.97327 54.69975 1.0 -1.724991 -0.5203465
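
The velocity columns in the sample rows look like frame-to-frame differences in position divided by the time between frames. The base R sketch below reproduces them for the first few frames. This is an observation about the sample rows, not a statement of how the package derives these columns.

```r
# X, Y and Time_s for the first four sample frames of AwayPlayer1
vnX = c(80.69826, 80.35326, 80.00826, 79.66327)
vnY = c(55.22010, 55.11603, 55.01196, 54.90789)
vnTime = c(0.0, 0.2, 0.4, 0.6)

# change in position divided by change in time; the first frame has no
# previous frame, so it is set to 0 as in the sample rows
vnVelocityX = c(0, diff(vnX) / diff(vnTime))
vnVelocityY = c(0, diff(vnY) / diff(vnTime))

print(cbind(vnTime, vnVelocityX, vnVelocityY))
```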

## Visualisations

geom_pitch

geom_pitch draws the pitch markings, with further customisations available.

pPitch = ggplot() + geom_pitch()

print(pPitch)

You can add whatever stats you want on top of it, as you would with any regular ggplot2 plot.

# adding passing data on top now
pPitch = pPitch +
   geom_point(
      data = dtPasses,
      aes(x = x , y = y)
   )

print(pPitch)

theme_pitch

If you aren’t interested in having the axis markings, etc., use theme_pitch

pPitch = pPitch +
   theme_pitch()

print(pPitch)

fStripChart

pStripChart = fStripChart (
   dtPlayerMetrics,
   vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
   dtMetricCategorisation,
   iPlayerId = 2,
   cTitle = 'Sample',
   vnExpand = c(-0.3, -0.03, 1.2, 1.3)
)

print(pStripChart)

### fBeeswarmChart

pBeeswarmChart = fBeeswarmChart (
   dtPlayerMetrics,
   vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
   dtMetricCategorisation,
   iPlayerId = 2,
   cTitle = 'Sample'
)

print(pBeeswarmChart)

fPercentileBarChart

pPercentileBarChart = fPercentileBarChart(
   dtDataset = dtPlayerMetrics,
   vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
   dtMetricCategorisation,
   iPlayerId = 2,
   cTitle = 'Sample'
)
print(pPercentileBarChart)

fPercentileBarChart with AbsoluteIndicator

Percentiles can be a little misleading if the underlying numbers aren’t uniformly distributed. You can add annotations for an indicator of the absolute spread of the values and where this particular player’s values fall within that spread.

pPercentileBarChart = fPercentileBarChart(
   dtDataset = dtPlayerMetrics,
   vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
   dtMetricCategorisation,
   iPlayerId = 2,
   cTitle = 'Sample',
   # vnQuantileMarkers = c(0.01, 0.25, 0.5, 0.75, 0.99),
   bAddAbsoluteIndicator = T
)

print(pPercentileBarChart)
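
To see why percentiles alone can mislead, here is a small base R sketch with made-up values. It compares the percentile rank with a simple min-max position within the absolute range. The min-max scaling is only one possible absolute indicator, chosen for illustration. The one drawn by bAddAbsoluteIndicator may be computed differently.

```r
# Made-up values for five players: four bunched together and one outlier
vnValues = c(1, 2, 3, 4, 100)

# percentile rank of each value within the group
vnPercentile = ecdf(vnValues)(vnValues)

# position of each value within the absolute range, from 0 (min) to 1 (max)
vnAbsolutePosition = (vnValues - min(vnValues)) / (max(vnValues) - min(vnValues))

print(data.frame(vnValues, vnPercentile, vnAbsolutePosition))
```

The fourth player sits at the 80th percentile but is only about 3% of the way from the lowest value to the highest.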

fRadarPercentileChart

I disapprove of radar charts. They are a bad visualisation, prone to misinterpretation. They seem to be the accepted norm for comparing players, though, which is why I had to sell out and include an implementation in the package. I’ve added a warning telling you to use one of the other visualisations instead, as those are better structured.

pRadarPercentileChart = fRadarPercentileChart (
   dtPlayerMetrics = dtPlayerMetrics,
   vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
   dtMetricCategorisation = dtMetricCategorisation,
   iPlayerId = 2,
   cTitle = 'Sample'
)
#> Warning in fRadarPercentileChart(dtPlayerMetrics = dtPlayerMetrics,
#> vcColumnsToIndex = c("playerId", : Radar charts can be misleading. Use
#> fPercentileBarChart instead.
print(pRadarPercentileChart)

fPlotSonar

pPlotSonar = fPlotSonar(
   dtPassesToPlot = dtPasses,
   iBlocksInFirstRing = 4,
   iNbrRings = 8,
   nZoomFactor = NULL,
   nXLimit = 120,
   nYLimit = 80,
   bAddPitchBackground = F,
   cTitle = NULL
)
print(pPlotSonar)

# Sonar broken up by pitch area
dtPassesByPitchArea = dtPasses[,
   list(
      playerId,
      passLength,
      passAngle,
      x,
      y,
      Success,
      xBucket = (
         ifelse(
            x %/% 20 == 120 %/% 20,
            ( x %/% 20 ) - 1,
            x %/% 20
         ) * 20
      ) + 10,
      yBucket = (
         ifelse(
            y %/% 20 == 80 %/% 20,
            ( y %/% 20 ) - 1,
            y %/% 20
         ) * 20
      ) + 10
   )
]

pPlotSonarVariation1 = fPlotSonar(
   dtPassesToPlot = dtPassesByPitchArea,
   iBlocksInFirstRing = 4,
   iNbrRings = 8,
   nZoomFactor = NULL,
   nXLimit = 120,
   nYLimit = 80,
   bAddPitchBackground = T,
   cTitle = 'Sample by Area of Pitch'
)
print(pPlotSonarVariation1)
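
The xBucket and yBucket expressions above split the pitch into 20 x 20 areas and place each pass at the centre of its area. Coordinates exactly on the far edge are folded back into the last area. The sketch below isolates that logic in base R. fBucketCentre is a hypothetical helper written for this example, not a function in the package.

```r
# Hypothetical helper: centre of the fixed-width bucket that a coordinate falls in
fBucketCentre = function(vnCoordinate, nBucketSize, nLimit) {
   viBucket = vnCoordinate %/% nBucketSize
   # a coordinate exactly on the far edge would start a new bucket outside
   # the pitch, so it is folded back into the last bucket
   viBucket = ifelse(viBucket == nLimit %/% nBucketSize, viBucket - 1, viBucket)
   (viBucket * nBucketSize) + (nBucketSize / 2)
}

# along the length of the pitch: 10 50 110 110
print(fBucketCentre(c(0, 45, 119, 120), nBucketSize = 20, nLimit = 120))

# along the width of the pitch: 10 30 70
print(fBucketCentre(c(0, 39.9, 80), nBucketSize = 20, nLimit = 80))
```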

# Sonar broken up by player, placed at their median passing location
dtPassesByPlayer = merge(
   dtPasses,
   merge(
      dtPasses[,
         list(
            xBucket = median(x),
            yBucket = median(y)
         ),
         list(
            playerId
         )
      ],
      dtPlayerLabels[,
         list(
            playerId,
            bucketLabel = playerName
         )
      ],
      c(
         'playerId'
      )
   ),
   c(
      'playerId'
   )
)

pPlotSonarVariation2 = fPlotSonar (
   dtPassesToPlot = dtPassesByPlayer,
   iBlocksInFirstRing = 4,
   iNbrRings = 8,
   nYLimit = 80,
   nXLimit = 120,
   bAddPitchBackground = T,
   cTitle = 'Sample By Median Position On Pitch'
)
print(pPlotSonarVariation2)

# Sonar broken up by player, placed at the location dictated by their role
# in the formation

dtPassesByPlayerFormation = merge(
   dtPasses,
   merge(
      dtFormation[,
         list(
            xBucket = x,
            yBucket = y,
            playerId
         )
      ],
      dtPlayerLabels[,
         list(
            playerId,
            bucketLabel = playerName
         )
      ],
      c(
         'playerId'
      )
   ),
   'playerId'
)
pPlotSonarVariation3 = fPlotSonar(
   dtPassesToPlot = dtPassesByPlayerFormation,
   iBlocksInFirstRing = 4,
   iNbrRings = 8,
   nXLimit = 120,
   nYLimit = 80,
   bAddPitchBackground = T,
   cTitle = 'Sample By Formation'
)
print(pPlotSonarVariation3)

fPassNetworkChart

pPassNetworkChart = fPassNetworkChart(
   dtPasses,
   dtPlayerLabels
)
print(pPassNetworkChart)
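
A pass network is usually built from counts of passes between each passer and recipient. The base R sketch below computes those counts for the six sample passes shown in the Data section. It illustrates the kind of summary involved and is not necessarily what fPassNetworkChart does internally.

```r
# passer and recipient for the six sample passes shown in the Data section
dfPasses = data.frame(
   playerId = c(1, 1, 1, 1, 1, 1),
   recipientPlayerId = c(3, 3, 8, 8, 2, 8),
   Success = c(1, 1, 1, 1, 1, 1)
)

# one row per passer-recipient pair with the number of successful passes
dfEdges = aggregate(
   Success ~ playerId + recipientPlayerId,
   data = dfPasses,
   FUN = sum
)
names(dfEdges)[names(dfEdges) == 'Success'] = 'nSuccessfulPasses'

print(dfEdges)
```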

fXgBuildUpComparison

pXgBuildUpComparison = fXgBuildUpComparison(
   dtXg,
   dtTeamLabels
)
print(pXgBuildUpComparison)

fDrawVoronoi

This is a work in progress. It uses the same data structure as https://github.com/metrica-sports/sample-data, which you can parse with fParseTrackingDataBothTeams.

You can draw a Voronoi diagram for a single frame -

pVoronoi = fDrawVoronoiFromTable(
   lTrackingData$dtTrackingData[Frame == min(Frame)],
   nXLimit = 120,
   nYLimit = 80
)

print(pVoronoi)
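
If Voronoi diagrams are new to you, the idea is that every location on the pitch is assigned to whichever player is nearest to it. The base R sketch below applies that rule to a coarse grid with two made-up player positions. It only illustrates the concept and is not how fDrawVoronoiFromTable is implemented.

```r
# Made-up positions for two players on a 120 x 80 pitch
dfPlayers = data.frame(
   Player = c('HomePlayer1', 'AwayPlayer1'),
   X = c(30, 90),
   Y = c(40, 40),
   stringsAsFactors = FALSE
)

# coarse grid of locations covering the pitch
dfGrid = expand.grid(X = seq(5, 115, 10), Y = seq(5, 75, 10))

# distance from every grid location (rows) to every player (columns)
mDistance = sqrt(
   outer(dfGrid$X, dfPlayers$X, '-') ^ 2 +
   outer(dfGrid$Y, dfPlayers$Y, '-') ^ 2
)

# each location belongs to the Voronoi cell of its nearest player
dfGrid$Owner = dfPlayers$Player[apply(mDistance, 1, which.min)]

print(table(dfGrid$Owner))
```

With the two players mirrored around the halfway line, each of them ends up owning half of the grid.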

And if you have multiple frames -

voronoiOutput = fDrawVoronoiFromTable(
   lTrackingData$dtTrackingData,
   nXLimit = 120,
   nYLimit = 80,
   UseOneFrameEvery = 1,
   DelayBetweenFrames = 5
)


if ( !interactive() ) {

   qwe = suppressWarnings(
      file.remove('./README_files/figure-markdown_strict/Voronoi.gif')
   )
   rm(qwe)

   qwe = file.copy(
      voronoiOutput,
      './README_files/figure-markdown_strict/Voronoi.gif'
   )

   rm(qwe)

}

The Friends of Tracking pitch control model -

lPitchControl = fGetPitchControlProbabilities (
    lData = lTrackingData,
    viTrackingFrame = lTrackingData$dtTrackingData[, unique(Frame)[5]],
    nYLimit = 80,
    nXLimit = 120,
    iGridCellsX = 120 / 3
)
    
pPlotPitchControl = fPlotPitchControl(
    lPitchControl
)

print(pPlotPitchControl)

Some of my other experiments with tracking data are here - https://github.com/thecomeonman/MakingFriendsWithTrackingData

## Logic and Algorithms

fEMDDetailed

A function to calculate the earth mover’s distance (EMD). It offers more flexibility and transparency than emdist::emd.

Any distance matrix can be used to calculate EMD, but emdist::emd insists on getting the raw distributions, with only up to four dimensions. fEMDDetailed only requires the distance between each combination of observations in the two datasets, irrespective of the nature of the data.

# Two random datasets of three dimensions each
a = data.table(matrix(runif(21), ncol = 3))
b = data.table(matrix(runif(30), ncol = 3))

# adding serial numbers to each observation
a[, SNO := .I]
b[, SNO := .I]

# evaluating distance between all combinations of data in the two datasets
a[, k := 'k']
b[, k := 'k']
dtDistances = merge(a,b,'k',allow.cartesian = T)
dtDistances[,
   Distance := (
      (( V1.x - V1.y) ^ 2) +
      (( V2.x - V2.y) ^ 2) +
      (( V3.x - V3.y) ^ 2)
   ) ^ 0.5
]

# getting EMD between these two datasets
lprec = fEMDDetailed(
   SNO1 = dtDistances[, SNO.x],
   SNO2 = dtDistances[, SNO.y],
   Distance = dtDistances[, Distance]
)

print(fGetEMDFromDetailedEMD(lprec))
#> [1] 0.4185668

# This value should be the same as that computed by emdist package's emd function.
# EMD needs the weightage of each point, which is assigned as equal in our
# function, so giving 1/N weightage to each data point
# emdist::emd(
#    as.matrix(
#       a[, list(1/.N, V1,V2,V3)]
#    ),
#    as.matrix(
#       b[, list(1/.N, V1,V2,V3)]
#    )
# )
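
To make the idea of EMD from a distance matrix concrete, here is a brute-force base R sketch for two tiny datasets of equal size with equal weights. In that special case the cheapest transport plan is a one-to-one matching, so trying every matching and keeping the lowest average distance gives the EMD. The data is made up, fPermutations is a hypothetical helper, and this is only practical for a handful of observations. It is not a description of how fEMDDetailed works.

```r
# Two made-up one-dimensional datasets of equal size
vnA = c(0, 1, 3)
vnB = c(5, 0.5, 2)

# distance matrix between every observation in vnA (rows) and vnB (columns)
mDistance = abs(outer(vnA, vnB, '-'))

# Hypothetical helper: all orderings of a small vector
fPermutations = function(v) {
   if (length(v) <= 1) return(list(v))
   lOut = list()
   for (i in seq_along(v)) {
      for (vRest in fPermutations(v[-i])) {
         lOut[[length(lOut) + 1]] = c(v[i], vRest)
      }
   }
   lOut
}

# average distance of each one-to-one matching; the cheapest one is the EMD
vnCost = sapply(
   fPermutations(seq_along(vnB)),
   function(viMatch) mean(mDistance[cbind(seq_along(vnA), viMatch)])
)
nEMD = min(vnCost)

print(nEMD)
```

In one dimension the result can be cross-checked against mean(abs(sort(vnA) - sort(vnB))), since matching the sorted values is optimal there.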

On the topic of transparency, one of the things I find very useful is that you can now see how much distance is being contributed by each observation.

dtDistances[, EMDWeightage := get.variables(lprec)]
ggplot(dtDistances) +
   geom_point(
      data = dtDistances,
      aes(
         x = factor(SNO.x),
         y = factor(SNO.y),
         size = Distance,
         color = EMDWeightage
      )
   ) +
   scale_colour_continuous(
      low = 'black',
      high = 'red'
   ) +
   coord_fixed() +
   xlab('SNO.x') +
   ylab('SNO.y')

Data Parsing

You will find fJsonToListOfTables, fJsonToTabular, and fParseTrackingData useful. They aren’t glamorous enough to be demoed here, but the documentation should help you use those functions.

About

Functions to aid football / soccer analysis
