Delhi Air Quality Index calculation, EDA and Regression models
Dataset Link: https://www.kaggle.com/datasets/deepaksirohiwal/delhi-air-quality
• Developed an algorithm to calculate the Air Quality Index (AQI) for hourly pollutant values and assigned corresponding labels based on established guidelines.
• Constructed a predictive model using the Random Forest Regressor algorithm to accurately forecast the AQI, leveraging insights gained from exploratory data analysis (EDA) and comprehensive dataset preprocessing techniques.
• Evaluated the performance of the predictive model on previously unseen data, achieving an impressive accuracy rate of 82% and demonstrating its robustness and reliability.
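The AQI calculation described above can be sketched as a piecewise-linear interpolation between concentration breakpoints, followed by a label lookup. This is a minimal sketch, not the project's actual code: the breakpoint table below is an assumed US EPA-style table for PM2.5 (in µg/m³), and the names `PM25_BREAKPOINTS`, `AQI_LABELS`, `sub_index`, `overall_aqi` and `aqi_label` are hypothetical.

```python
# Assumed US EPA-style PM2.5 breakpoints: (C_lo, C_hi, I_lo, I_hi).
# Illustrative only; the project's own breakpoint table may differ.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]

# Upper AQI bound of each category and its label (assumed EPA-style names).
AQI_LABELS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def sub_index(conc, breakpoints):
    """Linearly interpolate a pollutant concentration to an AQI sub-index."""
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    c = round(conc, 1)  # breakpoints are defined to one decimal place
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= c <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    return breakpoints[-1][3]  # above the top band: clamp to the maximum index


def overall_aqi(sub_indices):
    """The overall AQI is taken as the largest pollutant sub-index."""
    return max(sub_indices)


def aqi_label(aqi):
    """Map an AQI value to its category label."""
    for upper, label in AQI_LABELS:
        if aqi <= upper:
            return label
    return AQI_LABELS[-1][1]
```

For an hourly record, one would compute `sub_index` for each pollutant with its own breakpoint table, pass the results to `overall_aqi`, and label the result with `aqi_label`.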
The majority of the data points have AQI values between 0 and 150, which means that living conditions range from Good to Unhealthy for Sensitive Groups.
The concentration of CO in the atmosphere is higher on weekdays than on weekends in winter and autumn, because far fewer trains and cars run on weekends to take people to and from work.
Summer and Monsoon have similar concentrations of CO during the weekdays and the weekends.
Compared to the other seasons, the monsoon is the least polluted, which makes sense because much of the pollutant matter is washed out of the atmosphere and carried into the ground by the rain.
The air quality index peaks during the 3:00 PM hour of the day. One of the primary reasons is the increase in vehicular traffic until 3:00 PM; from then on, the air quality index keeps falling as traffic goes down.
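The seasonal weekday/weekend comparison above amounts to grouping CO readings by season and day type and averaging each group. The following is a minimal sketch using only the standard library; the month-to-season mapping, the function name `mean_co_by_season_and_daytype` and the sample readings are all assumptions made for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Assumed month-to-season mapping for Delhi; the original analysis may differ.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Summer", 4: "Summer", 5: "Summer",
    6: "Monsoon", 7: "Monsoon", 8: "Monsoon", 9: "Monsoon",
    10: "Autumn", 11: "Autumn",
}


def mean_co_by_season_and_daytype(rows):
    """Average CO per (season, 'weekday' | 'weekend') group.

    rows: iterable of (timestamp, co_concentration) pairs.
    """
    groups = defaultdict(list)
    for ts, co in rows:
        season = SEASON_BY_MONTH[ts.month]
        # datetime.weekday(): Monday is 0, so 5 and 6 are Saturday and Sunday
        day_type = "weekend" if ts.weekday() >= 5 else "weekday"
        groups[(season, day_type)].append(co)
    return {key: mean(values) for key, values in groups.items()}


# Made-up readings, for illustration only.
rows = [
    (datetime(2021, 1, 4, 9), 2000.0),   # Monday, winter
    (datetime(2021, 1, 5, 9), 2200.0),   # Tuesday, winter
    (datetime(2021, 1, 9, 9), 1500.0),   # Saturday, winter
    (datetime(2021, 7, 5, 9), 600.0),    # Monday, monsoon
    (datetime(2021, 7, 10, 9), 620.0),   # Saturday, monsoon
]
result = mean_co_by_season_and_daytype(rows)
```

The same grouping could be done with a dataframe library; plain dictionaries are used here only to keep the sketch dependency-free.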
NO2 and SO2 are not directly related to CO. However, the presence of NO2 and SO2 in the air can indirectly affect the levels of CO in the atmosphere.
The model correctly predicted 2907 of the 3643 labels in the test dataset, with an error scope of 16.
Accuracy Percentage : 79.79687071095252
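The accuracy figure above follows from the counts: 2907 / 3643 × 100 ≈ 79.80. One plausible way to obtain such counts for a regression model is to treat a prediction as correct when it falls within a fixed tolerance of the true AQI. The sketch below assumes that "error scope of 16" means an absolute tolerance of 16 AQI points, which the original text does not confirm; `tolerance_accuracy` is a hypothetical helper name.

```python
def tolerance_accuracy(y_true, y_pred, tolerance=16):
    """Count predictions within `tolerance` of the true value.

    Returns (number_correct, accuracy_percentage).
    Assumption: "error scope" is an absolute tolerance in AQI points.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return correct, 100.0 * correct / len(y_true)


# Toy values for illustration: absolute errors are 10, 20, 10 and 16.
correct, pct = tolerance_accuracy([100, 150, 200, 50], [110, 170, 190, 66])

# Reproducing the reported percentage from the reported counts.
reported_pct = 100.0 * 2907 / 3643
```

In the toy example, three of the four predictions fall within the tolerance, so the accuracy is 75%.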