Football, also known as soccer, is played throughout the world. The English Premier League is widely regarded as one of the strongest and most competitive club football leagues, and it attracts players and fans from all over the world. The dataset I have selected covers every Premier League fixture from the 2000-2001 season through the 2017-2018 season, 6,840 games in total (18 seasons of 380 games each). Because the Premier League uses a relegation and promotion system, the set of teams changes from season to season, so some teams have no data for the years they spent outside the league. This provides an added challenge when working with this dataset.
The dataset contains the result of each fixture (game) played, but not the league table that those results produce or the season totals for any feature. Consequently, I intend to answer the following questions:
- Is there a home field advantage?
- How does the average number of goals scored at home per season compare to the average number of goals scored away?
- Is there a difference between the number of home wins and away wins in each season?
- Can the league table for each season be created based on the results in the raw dataset?
- Which team accumulated the most points over the seasons?
- Does the points cutoff for finishing in the top four or the bottom four change each year?
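
Several of these questions reduce to aggregating fixture results, so a minimal sketch may help show what that involves. It is an illustration, not the analysis itself: the record layout `(home_team, away_team, home_goals, away_goals)`, the team names, the scores, and the function names `home_away_summary` and `league_table` are all hypothetical and are not taken from the dataset. It assumes the usual scoring of 3 points for a win and 1 for a draw, with ties in the table broken by goal difference and then goals scored.

```python
from collections import defaultdict

# Hypothetical fixtures for one tiny "season":
# (home_team, away_team, home_goals, away_goals).
# Names and scores are invented for illustration only.
fixtures = [
    ("A", "B", 2, 0),
    ("B", "C", 1, 1),
    ("C", "A", 0, 3),
    ("B", "A", 2, 1),
    ("C", "B", 0, 0),
    ("A", "C", 1, 1),
]


def home_away_summary(fixtures):
    """Count home wins, away wins, and draws, and average goals per game."""
    games = len(fixtures)
    return {
        "home_wins": sum(1 for _, _, hg, ag in fixtures if hg > ag),
        "away_wins": sum(1 for _, _, hg, ag in fixtures if hg < ag),
        "draws": sum(1 for _, _, hg, ag in fixtures if hg == ag),
        "avg_home_goals": sum(hg for _, _, hg, _ in fixtures) / games,
        "avg_away_goals": sum(ag for _, _, _, ag in fixtures) / games,
    }


def league_table(fixtures):
    """Build a table of (team, played, points, goal_difference), best first."""
    table = defaultdict(lambda: {"played": 0, "points": 0, "gf": 0, "ga": 0})
    for home, away, hg, ag in fixtures:
        # Update both sides of the fixture from their own point of view.
        for team, scored, conceded in ((home, hg, ag), (away, ag, hg)):
            row = table[team]
            row["played"] += 1
            row["gf"] += scored
            row["ga"] += conceded
            if scored > conceded:
                row["points"] += 3  # assumed: 3 points for a win
            elif scored == conceded:
                row["points"] += 1  # assumed: 1 point for a draw

    # Assumed ordering: points, then goal difference, then goals scored.
    ranked = sorted(
        table.items(),
        key=lambda kv: (kv[1]["points"], kv[1]["gf"] - kv[1]["ga"], kv[1]["gf"]),
        reverse=True,
    )
    return [
        (team, row["played"], row["points"], row["gf"] - row["ga"])
        for team, row in ranked
    ]


summary = home_away_summary(fixtures)
table = league_table(fixtures)
```

Running the same two functions on each season's fixtures separately would give per-season home and away figures and one table per season. The points held by the fourth-placed and fourth-from-bottom rows of each table could then be compared across seasons.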