This repository contains simple scripts to clean, restructure, and assess data downloaded from the Central Pollution Control Board (CPCB).
You can find a video of the results generated by the code here: https://www.linkedin.com/posts/nirwan2410_happylearning-datacleaning-pythonscripting-activity-7142179409670701057-9fOZ?utm_source=share&utm_medium=member_desktop
There are two primary conditions for this code to work:
- Six Variables' Necessity: The code needs a consistent data structure to pick the right values from the right place. You may download data for multiple stations from the official CPCB CAAQMS portal (https://app.cpcbccr.com/ccr/#/login, https://app.cpcbccr.com/ccr/#/caaqm-dashboard-all/caaqm-landing), but the data must be downloaded for only 6 variables, i.e. pollutant concentrations or meteorological parameters. Refer to the Sample Data folder for Bangalore to see what the data must look like.
- Required Library Versions: These are the versions the code was developed on. It may work on other versions, but that is not confirmed. pandas 2.0.3, numpy 1.25.2, seaborn 0.12.2, matplotlib 3.7.2.
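To see how an environment compares with the versions listed above, a small check like the following may help. It is only a sketch and not part of the repository: the helper name `compare_versions` is hypothetical, and it relies solely on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Versions listed in this README as the ones the code was developed on.
TESTED_VERSIONS = {
    "pandas": "2.0.3",
    "numpy": "1.25.2",
    "seaborn": "0.12.2",
    "matplotlib": "3.7.2",
}


def compare_versions(tested=TESTED_VERSIONS):
    """Return {package: (tested_version, installed_version_or_None)}.

    Hypothetical helper for illustration; it only reports differences
    and does not install or change anything.
    """
    report = {}
    for name, wanted in tested.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions().items():
    status = "matches" if installed == wanted else "differs"
    print(f"{name}: developed on {wanted}, installed {installed} ({status})")
```

A "differs" line does not necessarily mean the code will fail; it only flags a version that the code was not developed on.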
The code generates the following results:
- Excel sheet for individual pollutants/meteorological parameters for every station for all years. The folders are named based on the pollutant/meteorological parameters.
- Excel sheet for individual pollutants/meteorological parameters for each year with all stations aligned one after the other. The mean and standard deviation are also included in the results.
- Daily Data folder that contains the daily pollutants/meteorological parameters for each year. Heat maps are also generated, where the x-axis shows the day of the year and the y-axis shows the year. The color of the cell signifies the pollutant concentration.
- Weekly Data folder that contains the weekly pollutants/meteorological parameters for each year. Heat maps are also generated, where the x-axis shows the week of the year and the y-axis shows the year. The color of the cell signifies the pollutant concentration.
- A text file is generated that contains the mean and standard deviation.
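The daily and weekly heat maps described above could be produced roughly as follows. This is a minimal sketch under stated assumptions, not the repository's actual implementation: the function names `year_by_period` and `plot_heatmap` are hypothetical, the input is synthetic 6-hourly data rather than a CPCB download, and weekly grouping uses ISO years and ISO week numbers, which may differ from what the scripts do.

```python
import numpy as np
import pandas as pd


def year_by_period(series, period="day"):
    """Average a timestamped series per day or week and lay it out as a
    table with one row per year and one column per day/week of the year."""
    idx = series.index
    if period == "day":
        years = np.asarray(idx.year)
        slots = np.asarray(idx.dayofyear)
    elif period == "week":
        # Assumption: ISO calendar, so the first days of January can
        # belong to the last ISO week of the previous ISO year.
        iso = idx.isocalendar()
        years = iso["year"].astype(int).to_numpy()
        slots = iso["week"].astype(int).to_numpy()
    else:
        raise ValueError("period must be 'day' or 'week'")
    table = series.groupby([years, slots]).mean().unstack()
    table.index.name = "year"
    table.columns.name = period
    return table


def plot_heatmap(table, label, path):
    """Save a heat map: x-axis = day/week of year, y-axis = year,
    cell color = mean value. Plotting libraries are imported lazily."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(12, 3))
    sns.heatmap(table, ax=ax, cmap="viridis", cbar_kws={"label": label})
    ax.set_xlabel(f"{table.columns.name} of year")
    ax.set_ylabel("year")
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)


# Synthetic stand-in for one station and one variable: a reading every
# 6 hours whose value equals the day of the year, so results are easy to check.
timestamps = pd.date_range(
    "2021-01-01", "2022-12-31 18:00", freq=pd.Timedelta(hours=6)
)
readings = pd.Series(
    np.asarray(timestamps.dayofyear, dtype=float), index=timestamps
)

daily = year_by_period(readings, "day")
weekly = year_by_period(readings, "week")
print(daily.shape, weekly.shape)
```

Calling `plot_heatmap(daily, "concentration", "daily_heatmap.png")` would then write the image to disk; the label and file name here are placeholders. Cells with no data (for example day 366 in non-leap years) stay empty in the table and are left blank in the heat map.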