The IPL is the biggest sports festival in India. It features leading international cricketers from different countries alongside domestic Indian players. The dataset consists of three CSV files covering the ball-by-ball summary of IPL matches, venue details, and the overall match summary. The matches and deliveries files contain the summary of each match and the ball-by-ball details, respectively.
Our aim in this project is to analyse the dataset using SQL queries. The tools and technologies used were:
- Python
- PySpark library
- SQL
- Databricks Notebook (platform to write and run our code)
We could learn more about the dataset by running as many queries as we like, but for the purposes of this study we limit ourselves to 9 queries.
Steps:
- Load Dataset
- Filter and clean the data
- Join Dataset
- Convert dataset into Table
- Perform SQL queries using SQLContext
- Use SQLite to create a database for our table
- Create a class Database with methods that implement our queries using the SQLite database
- Create objects of the class Database and call its methods to run our SQL queries through it
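
The last three steps can be sketched roughly as follows. This is a minimal illustration using Python's built-in `sqlite3` module, not the project's actual code: the column names (`match_id`, `winner`, `batsman`, `total_runs`), the method names, and the sample rows are all hypothetical placeholders, and the real CSV files may use different columns.

```python
import sqlite3


class Database:
    """Thin wrapper around an SQLite connection (hypothetical design)."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the sketch self-contained; a file path would persist data.
        self.conn = sqlite3.connect(path)

    def create_tables(self):
        # Assumed, simplified schemas for the matches and deliveries tables.
        self.conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS matches (
                match_id INTEGER PRIMARY KEY,
                winner   TEXT
            );
            CREATE TABLE IF NOT EXISTS deliveries (
                match_id   INTEGER,
                batsman    TEXT,
                total_runs INTEGER
            );
            """
        )

    def insert_rows(self, table, rows):
        # Table names cannot be bound as parameters, so restrict them explicitly.
        if table not in ("matches", "deliveries"):
            raise ValueError(f"unknown table: {table}")
        placeholders = ", ".join("?" * len(rows[0]))
        self.conn.executemany(
            f"INSERT INTO {table} VALUES ({placeholders})", rows
        )
        self.conn.commit()

    def runs_per_match(self):
        # Example query: join the two tables and total the runs in each match.
        cur = self.conn.execute(
            """
            SELECT m.match_id, m.winner, SUM(d.total_runs) AS runs
            FROM matches AS m
            JOIN deliveries AS d ON d.match_id = m.match_id
            GROUP BY m.match_id, m.winner
            ORDER BY m.match_id
            """
        )
        return cur.fetchall()


# Placeholder rows for illustration only; these are not real IPL data.
db = Database()
db.create_tables()
db.insert_rows("matches", [(1, "Team A"), (2, "Team B")])
db.insert_rows(
    "deliveries",
    [(1, "Player X", 4), (1, "Player Y", 1), (2, "Player X", 6)],
)
result = db.runs_per_match()
print(result)
```

Each of the 9 queries could be added as one more method on the class in the same way, so that the calling code only deals with method names and returned rows.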