Predicting Air Quality Index Using Cutting-Edge Deep Learning Algorithms for State-of-the-Art Results
- Tools :
- Python 3 in a Colab notebook
- Libraries :
- Pandas (1.5.3) — For handling structured data
- NumPy (1.23.5) — For linear algebra and mathematics
- Scikit Learn (1.2.2) — For machine learning
This project uses the Air Quality dataset, which reports the weather and the level of pollution each hour for five years at the US embassy in Beijing, China. The data includes the date-time, the PM2.5 concentration (the pollution measure), and weather information: dew point, temperature, pressure, wind direction, wind speed, and the cumulative number of hours of snow and rain.
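As a minimal sketch of working with data of this shape, the snippet below loads a few hourly records into pandas, builds a date-time index, renames the columns, and fills missing PM2.5 readings. The raw column names (`pm2.5`, `DEWP`, `TEMP`, `PRES`, `cbwd`, `Iws`, `Is`, `Ir`), the readable replacement names, the sample values, the helper name `load_air_quality`, and the choice to fill NA with 0 are all assumptions made for illustration, not details confirmed by this project.

```python
import io

import pandas as pd

# Hypothetical raw sample; column names and values are assumed for illustration.
RAW_CSV = """No,year,month,day,hour,pm2.5,DEWP,TEMP,PRES,cbwd,Iws,Is,Ir
1,2010,1,2,0,129,-16,-4.0,1020.0,SE,1.79,0,0
2,2010,1,2,1,NA,-15,-4.0,1020.0,SE,2.68,0,0
3,2010,1,2,2,159,-11,-5.0,1021.0,SE,3.57,0,0
"""


def load_air_quality(source):
    """Load hourly records, index them by date-time, and clean the columns."""
    df = pd.read_csv(source)
    # Combine the separate year/month/day/hour columns into one timestamp index.
    timestamps = pd.to_datetime(df[["year", "month", "day", "hour"]])
    df.index = pd.DatetimeIndex(timestamps, name="date")
    df = df.drop(columns=["No", "year", "month", "day", "hour"])
    # Give each column a readable name (assumed names).
    df.columns = ["pollution", "dew", "temp", "press",
                  "wind_dir", "wind_speed", "snow", "rain"]
    # Replace missing pollution readings; 0 is an assumed fill value.
    df["pollution"] = df["pollution"].fillna(0)
    return df


df = load_air_quality(io.StringIO(RAW_CSV))
print(df)
```

Passing a file path instead of the `io.StringIO` object would read the same layout from disk.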
- Replaced NA values
- Parsed the date-time fields into the pandas DataFrame index
- Specified clear names for each column
- Used matplotlib for plotting
- Scaled features using Min-Max scaling
- Computed the correlation matrix
- Normalized the data
- Transformed the dataset into a supervised learning problem
- Defined a 2-layer GRU and 2-layer LSTM architecture
- Used 64 neurons in the first GRU layer, followed by 32 in the second GRU layer, 32 in the first LSTM layer, and 16 neurons in the last LSTM layer
- Added 20% dropout after every layer
- Added an attention layer to focus on the most informative data points, with the aim of improving accuracy
- Split the data into train and test sets
- Fitted the model with epochs = 40 and batch_size = 32
- Made predictions
- Plotted a line graph of actual vs. predicted values using plotly (5.15.0)
- Calculated RMSE, MAE, R² and MSE values
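
The scaling, supervised-learning framing, and train/test split steps above can be sketched as follows. This is a minimal NumPy illustration rather than the project's actual code: the helper names `min_max_scale` and `make_windows`, the window length of 3 hours, the 80/20 chronological split, and the synthetic data are all assumptions. `min_max_scale` applies the same formula as scikit-learn's `MinMaxScaler` with its default range of 0 to 1.

```python
import numpy as np


def min_max_scale(data):
    """Rescale each column to the range [0, 1]."""
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (data - lo) / span


def make_windows(data, n_in, target_col=0):
    """Frame a multivariate series as (previous n_in steps) -> (next target)."""
    X, y = [], []
    for t in range(n_in, len(data)):
        X.append(data[t - n_in:t])     # all features for the previous n_in hours
        y.append(data[t, target_col])  # target value at the current hour
    return np.array(X), np.array(y)


rng = np.random.default_rng(0)
series = rng.random((100, 8)) * 50.0   # synthetic stand-in: 100 hours, 8 features

scaled = min_max_scale(series)
X, y = make_windows(scaled, n_in=3)

# Chronological split: earlier samples for training, later ones for testing.
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

print(X_train.shape, X_test.shape)
```

The resulting 3-D arrays have the layout (samples, time steps, features) that recurrent layers such as GRU and LSTM commonly take as input.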
The chart shows that our model predicts air pollution levels more accurately than previous methods: it achieves lower RMSE, MSE, and MAE values, as well as a higher R-squared value.
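
For reference, the four metrics named above can be computed directly from arrays of actual and predicted values. The sketch below uses plain NumPy with the standard definitions; the function name `regression_metrics` and the sample values are illustrative assumptions, not results from this project. scikit-learn's `sklearn.metrics` module provides equivalent functions.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R-squared for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)                      # mean squared error
    rmse = np.sqrt(mse)                          # root mean squared error
    mae = np.mean(np.abs(err))                   # mean absolute error
    ss_res = np.sum(err ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                   # coefficient of determination
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}


# Made-up values, used only to exercise the function.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Lower MSE, RMSE, and MAE mean smaller errors, while an R-squared closer to 1 means the predictions explain more of the variance in the actual values.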
The graph shows the actual (in red) and predicted (in green) pollution levels. The two lines align closely, which indicates that our 'Attention mechanism with GRU+LSTM' hybrid deep learning model tracks the observed values well.
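
To illustrate what the attention step in such a model does, the sketch below applies a simple dot-product attention pooling to a sequence of recurrent-layer outputs: each time step receives a score, the scores are normalized with a softmax, and the weighted sum becomes a single context vector. This is a generic NumPy illustration under assumed shapes (24 time steps and 16 hidden units, matching the size of the last LSTM layer) with random stand-in values. The helper name `attention_pool` is hypothetical, and the project's actual attention layer, framework, and placement in the network may differ.

```python
import numpy as np


def attention_pool(hidden_states, query):
    """Weight each time step by its similarity to `query` and sum.

    hidden_states: array of shape (time_steps, units)
    query: array of shape (units,)
    """
    # Scaled dot-product score for every time step.
    scores = hidden_states @ query / np.sqrt(hidden_states.shape[1])
    scores = scores - scores.max()               # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hidden_states            # weighted sum, shape (units,)
    return context, weights


rng = np.random.default_rng(42)
hidden = rng.normal(size=(24, 16))   # stand-in for the last LSTM layer's outputs
query = rng.normal(size=16)          # stands in for a learned parameter vector

context, weights = attention_pool(hidden, query)
print(weights.round(3))
print(context.shape)
```

Time steps with larger weights contribute more to the context vector, which is how an attention layer lets the model focus on the most relevant hours in the input window.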