This project analyzes the history of ICC Men's Cricket World Cup matches from 1975 to 2023, focusing on data preparation, visualization, and sentiment analysis.
The project is divided into four tasks:
- GitHub Repository Management: Maintain a well-structured GitHub repository with meaningful commits and branches.
- Data Preparation: Process and clean World Cup data from CSV files.
- Sentiment Analysis: Use a Hugging Face model to classify the sentiment of 2023 World Cup final match commentaries.
- Interactive Dashboard: Create an insightful dashboard using Plotly Dash to showcase key findings.
- The data consists of detailed statistics from all ICC Men's Cricket World Cup matches (1975–2023).
- Key attributes include match dates, venues, scores, notable players, and commentary excerpts.
- Proper version control practices with meaningful commits and pull requests.
- Combine 13 datasets into a unified DataFrame.
- Remove duplicates and handle missing values.
- Add derived columns like `match_status` and `winning_team`.
- Split nested lists into separate columns for detailed analysis.
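
The preparation steps above could be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual notebook code: the input column names (`team_1`, `team_2`, `winner`, `scores`), the output names `team_1_score` and `team_2_score`, and the meaning given to `match_status` and `winning_team` are all assumptions. Small in-memory frames stand in for the 13 CSV files.

```python
import pandas as pd


def prepare_matches(frames):
    """Combine per-tournament frames and apply the cleaning steps."""
    # 1. Stack all frames into one DataFrame with a fresh index.
    df = pd.concat(frames, ignore_index=True)

    # 2. Split the nested [team_1_score, team_2_score] list into two columns.
    #    Done before de-duplication because lists are unhashable.
    scores = pd.DataFrame(
        df["scores"].tolist(),
        index=df.index,
        columns=["team_1_score", "team_2_score"],
    )
    df = pd.concat([df.drop(columns="scores"), scores], axis=1)

    # 3. Remove exact duplicate rows.
    df = df.drop_duplicates().reset_index(drop=True)

    # 4. Derived columns (assumed semantics): a missing winner means
    #    the match had no result.
    has_winner = df["winner"].notna()
    df["match_status"] = has_winner.map({True: "completed", False: "no result"})
    df["winning_team"] = df["winner"].fillna("None")
    return df


# Hypothetical stand-ins for two of the CSV files (placeholder values only).
frame_a = pd.DataFrame({
    "team_1": ["Team A", "Team C"],
    "team_2": ["Team B", "Team D"],
    "winner": ["Team A", "Team D"],
    "scores": [[250, 200], [180, 181]],
})
frame_b = pd.DataFrame({
    "team_1": ["Team A", "Team A", "Team B"],
    "team_2": ["Team C", "Team C", "Team D"],
    "winner": ["Team C", "Team C", None],  # duplicated row and a missing winner
    "scores": [[220, 221], [220, 221], [90, 0]],
})

matches = prepare_matches([frame_a, frame_b])
```

With real files, the list passed to `prepare_matches` would instead be built by calling `pd.read_csv` on each CSV path.
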
- Analyze sentiment in the 2023 World Cup commentary using a Hugging Face model.
- Add a `sentiment` column to the dataset.
- Visualize sentiment distribution.
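
One way the sentiment step might look is sketched below. The README does not name the Hugging Face model, so this example assumes the default model behind `pipeline("sentiment-analysis")` from the `transformers` library; the `commentary` column name and the helper names are also hypothetical.

```python
import pandas as pd


def build_classifier():
    """Return a Hugging Face sentiment pipeline (default model, an assumption)."""
    # Imported lazily because loading transformers and fetching a model is slow.
    from transformers import pipeline
    return pipeline("sentiment-analysis")


def add_sentiment(df, classifier, text_column="commentary"):
    """Return a copy of df with a `sentiment` column holding predicted labels."""
    # Missing commentary is replaced by an empty string before classification.
    texts = df[text_column].fillna("").astype(str).tolist()
    # The pipeline returns one {"label": ..., "score": ...} dict per input text.
    results = classifier(texts)
    out = df.copy()
    out["sentiment"] = [result["label"] for result in results]
    return out


def sentiment_distribution(df):
    """Count how many commentary rows fall under each sentiment label."""
    return df["sentiment"].value_counts().to_dict()
```

Calling `add_sentiment(commentary_df, build_classifier())` downloads the model on first use. The counts from `sentiment_distribution` can then be passed to a bar or pie chart to show the distribution.
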
- Develop an interactive Plotly Dash dashboard with multiple chart types.
- Highlight trends and insights from the data.
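
A dashboard along these lines could start from the sketch below. It is only a skeleton under stated assumptions: it shows a single bar chart of wins per team, assumes the prepared data has a `winning_team` column, and the names `wins_per_team` and `build_app` are hypothetical rather than taken from the project's `dashboard.py`.

```python
import pandas as pd


def wins_per_team(matches):
    """Aggregate the prepared data into one row per team with its win count."""
    counts = matches["winning_team"].value_counts()
    return counts.rename_axis("team").reset_index(name="wins")


def build_app(matches):
    """Build a Dash app with one bar chart; more charts can be added to the layout."""
    # Imported here so the aggregation helper can be used without Dash installed.
    import plotly.express as px
    from dash import Dash, dcc, html

    fig = px.bar(wins_per_team(matches), x="team", y="wins", title="Wins per team")
    app = Dash(__name__)
    app.layout = html.Div([
        html.H1("ICC Men's Cricket World Cup, 1975 to 2023"),
        dcc.Graph(figure=fig),
    ])
    return app
```

A script built this way could end with `build_app(matches).run(debug=True)` to serve the dashboard locally. Older Dash releases use `run_server` instead of `run`.
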
- Run the Jupyter notebooks for data preparation and sentiment analysis.
- Start the Plotly Dash dashboard: `python dashboard.py`