This project provides a comprehensive pipeline to forecast the total number of Electric Vehicles (EVs) in each county over time using historical data. The workflow includes data preprocessing, feature engineering, model training with hyperparameter tuning, and model evaluation.
🔗 Live Demo: Streamlit App – EV Vehicle Prediction
Using the Electric_Vehicle_Population_By_County.csv dataset, this project:
- Cleans and processes time-series EV data at the county level.
- Engineers features to capture temporal trends and growth patterns.
- Trains a RandomForestRegressor to predict EV totals.
- Evaluates model performance and visualizes feature importance.
- Forecasts EV adoption for any given county over the next 3 years.
- Saves and tests the trained model with joblib.
- Deploys an interactive forecasting tool via Streamlit (app.py).
You can interact with the model using the built-in Streamlit dashboard.
streamlit run app.py
- Handled missing values in County and State
- Converted vehicle count columns to numeric
- Capped outliers in Percent Electric Vehicles
- Converted Date column to datetime
- Identified top/bottom counties by EV adoption
- Visualized stacked vehicle distributions
- Calculated total counts for BEVs, PHEVs, EVs, and Non-EVs
- Lag features (1 to 3 months)
- Rolling 3-month EV average
- Percent change over 1 and 3 months
- Cumulative EVs per county
- 6-month rolling slope for growth trend
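The features listed above can be sketched with pandas group-wise operations. This is a minimal illustration, not the notebook's actual code: the function name and the output column names (`ev_lag1`, `ev_roll_mean_3`, and so on) are hypothetical, and fitting the 6-month slope to the cumulative series is an assumption, since the source does not say which series the slope is computed on.

```python
import numpy as np
import pandas as pd

EV_COL = "Electric Vehicle (EV) Total"


def add_ev_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add per-county lag, rolling, percent-change, cumulative, and slope features."""
    df = df.sort_values(["County", "Date"]).copy()
    ev = df.groupby("County")[EV_COL]

    # Lag features: EV totals from 1, 2, and 3 months earlier (per county).
    for lag in (1, 2, 3):
        df[f"ev_lag{lag}"] = ev.shift(lag)

    # Rolling 3-month mean of the EV total within each county.
    df["ev_roll_mean_3"] = ev.transform(lambda s: s.rolling(3).mean())

    # Percent change over 1 and 3 months (as fractions, e.g. 0.2 == 20%).
    df["ev_pct_change_1"] = ev.pct_change(1)
    df["ev_pct_change_3"] = ev.pct_change(3)

    # Cumulative EVs per county.
    df["ev_cumulative"] = ev.cumsum()

    # 6-month rolling slope: least-squares line through each 6-value window.
    def slope(window: np.ndarray) -> float:
        return np.polyfit(np.arange(len(window)), window, 1)[0]

    df["ev_growth_slope"] = df.groupby("County")["ev_cumulative"].transform(
        lambda s: s.rolling(6).apply(slope, raw=True)
    )
    return df
```

The first rows of each county contain NaN for the lag and rolling columns, so those rows would typically be dropped before training.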
- Model: RandomForestRegressor
- Hyperparameter Tuning: RandomizedSearchCV (30 iterations, 3-fold CV)
- Best Parameters: { 'n_estimators': 200, 'min_samples_split': 4, 'min_samples_leaf': 1, 'max_features': None, 'max_depth': 15 }
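A tuning setup along these lines could look like the sketch below. Only the 30 iterations, the 3-fold CV, and the reported best parameters come from the project; the search space, the `r2` scoring choice, the random seed, and the helper name `tune_random_forest` are assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical search space; the source reports only the winning values.
PARAM_DISTRIBUTIONS = {
    "n_estimators": [100, 150, 200, 250],
    "max_depth": [None, 5, 10, 15],
    "min_samples_split": [2, 4, 6, 8],
    "min_samples_leaf": [1, 2, 3],
    "max_features": ["sqrt", "log2", None],
}


def tune_random_forest(X, y, n_iter=30, cv=3, random_state=42):
    """Run a randomized hyperparameter search and return the fitted search object."""
    search = RandomizedSearchCV(
        estimator=RandomForestRegressor(random_state=random_state),
        param_distributions=PARAM_DISTRIBUTIONS,
        n_iter=n_iter,        # number of sampled parameter combinations
        cv=cv,                # number of cross-validation folds
        scoring="r2",         # assumed metric; not stated in the source
        random_state=random_state,
    )
    search.fit(X, y)
    return search
```

After fitting, `search.best_params_` holds the selected combination and `search.best_estimator_` is the model refit on all of the training data.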
The dataset should include columns such as:
- Date
- County, State
- Electric Vehicle (EV) Total
- Battery Electric Vehicles (BEVs), Plug-In Hybrid Electric Vehicles (PHEVs)
- Non-Electric Vehicle Total, Total Vehicles
- Percent Electric Vehicles
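Assuming the columns listed above, the cleaning steps described earlier (missing values, numeric conversion, outlier capping, date parsing) might be sketched as follows. The `"Unknown"` placeholder, the comma stripping, and the IQR capping rule are assumptions; the source does not say how missing values were filled or how outliers were capped, and `clean_ev_data` is a hypothetical name.

```python
import pandas as pd

COUNT_COLUMNS = [
    "Battery Electric Vehicles (BEVs)",
    "Plug-In Hybrid Electric Vehicles (PHEVs)",
    "Electric Vehicle (EV) Total",
    "Non-Electric Vehicle Total",
    "Total Vehicles",
]


def clean_ev_data(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw county-level EV table."""
    df = df.copy()

    # Parse dates; unparseable values become NaT and those rows are dropped.
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    df = df.dropna(subset=["Date"])

    # Fill missing location fields with a placeholder (assumed strategy).
    df[["County", "State"]] = df[["County", "State"]].fillna("Unknown")

    # Counts may arrive as strings such as "1,234"; strip separators first.
    for col in COUNT_COLUMNS:
        as_text = df[col].astype(str).str.replace(",", "", regex=False)
        df[col] = pd.to_numeric(as_text, errors="coerce")

    # Cap outliers at the 1.5 * IQR fences (assumed capping rule).
    pct = pd.to_numeric(df["Percent Electric Vehicles"], errors="coerce")
    q1, q3 = pct.quantile(0.25), pct.quantile(0.75)
    iqr = q3 - q1
    df["Percent Electric Vehicles"] = pct.clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df
```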
- Model: RandomForestRegressor
- Tuning: RandomizedSearchCV with cross-validation
Metric | Value |
---|---|
MAE | 132.76 |
RMSE | 200.45 |
R² Score | 0.89 |
These results indicate strong model performance: an R² of 0.89 means the model explains about 89% of the variance in EV totals, and the errors are relatively low compared to the scale of EV counts.
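For reference, the three metrics in the table can be computed from a set of actual and predicted values as in the short sketch below. The helper name `regression_report` is hypothetical and the snippet is not taken from the notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred) -> dict:
    """Return MAE, RMSE, and R² for a set of predictions."""
    mae = float(mean_absolute_error(y_true, y_pred))
    # RMSE is the square root of the mean squared error.
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    r2 = float(r2_score(y_true, y_pred))
    return {"MAE": mae, "RMSE": rmse, "R2": r2}
```

MAE and RMSE are in the same units as the target (EV counts), while R² is unitless, which is why the first two need to be read against the typical size of the county totals.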
Date County Predicted_EV_Total
0 2025-08-01 Kings 14527
1 2025-09-01 Kings 14862
2 2025-10-01 Kings 15230
3 2025-11-01 Kings 15575
4 2025-12-01 Kings 15940
5 2026-01-01 Kings 16294
- Model saved to: forecasting_ev_model.pkl
- Successfully reloaded and tested.
To avoid retraining:
from joblib import load
model = load('forecasting_ev_model.pkl')
Actual EVs: 1025.00
Predicted EVs: 998.23
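A save-and-reload check like the one reported above could be sketched as follows. The helper `save_and_verify` is hypothetical; it only relies on joblib's `dump` and `load` and on the model exposing `predict`.

```python
import numpy as np
from joblib import dump, load


def save_and_verify(model, X_check, path):
    """Persist a fitted model, reload it, and confirm the predictions match."""
    dump(model, path)
    reloaded = load(path)
    # The reloaded model should reproduce the original predictions.
    same = bool(np.allclose(model.predict(X_check), reloaded.predict(X_check)))
    return reloaded, same
```

For example, `reloaded, ok = save_and_verify(model, X_test[:5], "forecasting_ev_model.pkl")`, where `model` and `X_test` stand for the fitted regressor and a held-out feature matrix.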
Forecasts next 36 months of EV growth for a selected county (e.g., Kings).
Includes:
- Monthly predicted EV counts
- Cumulative EV count trendline
- Comparison between historical and forecasted values
- Forecasted next 3 years for the top 5 counties (based on cumulative EV adoption)
- Combined historical and future trendlines
- Visual comparison of growth rates across counties
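Multi-month forecasts with lag features are usually produced recursively: each predicted month is fed back in as the newest lag for the next step. The sketch below illustrates that idea under simplifying assumptions. It uses only three lags and their mean as features, whereas the project's model uses a richer feature set, and `recursive_forecast` and `predict_one` are hypothetical names.

```python
from typing import Callable, List, Sequence

import pandas as pd


def recursive_forecast(
    history: Sequence[float],
    last_date: str,
    predict_one: Callable[[List[float]], float],
    horizon: int = 36,
) -> pd.DataFrame:
    """Forecast `horizon` months ahead, feeding each prediction back as a lag."""
    values = list(history)
    if len(values) < 3:
        raise ValueError("need at least 3 historical values to build lag features")

    # Month-start dates beginning with the month after `last_date`.
    start = pd.Timestamp(last_date) + pd.offsets.MonthBegin(1)
    dates = pd.date_range(start, periods=horizon, freq="MS")

    predictions = []
    for _ in range(horizon):
        lag1, lag2, lag3 = values[-1], values[-2], values[-3]
        features = [lag1, lag2, lag3, (lag1 + lag2 + lag3) / 3]
        next_value = float(predict_one(features))
        predictions.append(next_value)
        values.append(next_value)  # the prediction becomes the next step's lag1

    return pd.DataFrame({"Date": dates, "Predicted_EV_Total": predictions})
```

With a fitted scikit-learn model trained on the same four features, `predict_one` could be `lambda f: model.predict([f])[0]`. Because errors compound from step to step, long horizons such as 36 months should be read as trend indications rather than precise counts.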
The Streamlit app supports:
- Selecting up to 3 counties
- Side-by-side EV growth comparison
- Growth % summaries

Stacked column chart comparing:
- BEV (Battery Electric Vehicles)
- PHEV (Plug-in Hybrids)
- EV (total)
- Non-EVs
It highlights the share of EVs in the overall vehicle population.

- Line plot showing the RandomForest model's predictions vs actual EV counts across sample indices.
- Close overlap indicates strong model accuracy.

Bar plot displaying the importance scores of engineered features like:
- Lag values
- Rolling averages
- Percent changes
Used to assess the model's key drivers of prediction.
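The scores behind such a bar plot come from the fitted forest's `feature_importances_` attribute. A minimal sketch of ranking them is shown below; the helper name and any feature names passed to it are hypothetical.

```python
import pandas as pd


def rank_feature_importance(model, feature_names) -> pd.Series:
    """Return the model's impurity-based importances, largest first."""
    importances = pd.Series(model.feature_importances_, index=list(feature_names))
    return importances.sort_values(ascending=False)
```

Calling `.plot.barh()` on the returned Series would produce a bar chart similar to the one described, assuming matplotlib is installed.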

Historical vs 36-month forecast for Kings County showing monthly EV growth trends.

Chart showing cumulative EV adoption over time, including projected growth for the next 3 years.

Visualization of historical and projected cumulative EV growth for the top 5 counties:
- Fairfax
- Honolulu
- Los Angeles
- Orange
- Santa Clara
The interactive dashboard provides actionable insights into EV adoption trends through dynamic visualizations and comparative analysis. Below are key components demonstrated through the app's outputs:

Features:
- County Selection: Analyze specific counties (e.g., Ada) with adjustable forecast horizons (12–60 months).
Model Metrics:
- MAE: 0.1
- RMSE: 0.3
- MAPE: 4.8%
Advanced Options:
- Seasonality analysis
- Monthly breakdowns
- Historical vs. forecasted comparisons
Example Insights:
- Ada County shows a projected increase from 1.5 to 2.0 EVs/month (31.1% growth rate).

Tracks granular monthly EV counts (e.g., 1.2 to 2.0 EVs/month in Ada).

Visualizes long-term EV accumulation (e.g., ~150 EVs by 2027 in Ada).

Tabular preview of forecasted values (e.g., consistent 2 EVs/month for Ada in 2026–2027).

Features:
- Compare up to 3 counties (e.g., Ada vs. Alameda).
- Metrics: Cumulative counts, monthly adoption rates, or growth percentages.

Side-by-side historical and forecasted trends.
Highlighted Metrics:
- Autauga: 104.7% growth (1.9 → 3.9 EVs/month)
- Alameda: 7.3% growth despite a 52.3% cumulative decline
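The app's exact growth formula is not stated here; a common definition is the percent change from the starting value to the ending value, sketched below with a hypothetical helper.

```python
def growth_pct(start: float, end: float) -> float:
    """Percent change from `start` to `end`."""
    if start == 0:
        raise ValueError("growth is undefined when the starting value is 0")
    return (end - start) / start * 100.0
```

Applied to the rounded Autauga endpoints (1.9 to 3.9 EVs/month), this gives about 105.3%, close to the reported 104.7%. The gap would be consistent with the app computing growth from unrounded values, though that is an assumption.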

Bar charts comparing county-level growth percentages.

County | Historical EVs | Forecasted EVs | Growth Rate |
---|---|---|---|
Ada | 90 | 72 | 31.1% |
Alameda | 302 | 144 | 7.3% |
File | Description |
---|---|
Electric_Vehicle_Population_By_County.csv | Raw EV dataset |
preprocessed_ev_data.csv | Cleaned and feature-engineered data |
forecasting_ev_model.pkl | Trained RandomForest regression model |
ev_forecasting.ipynb | Full pipeline notebook with forecasting |
README.md | Project overview and instructions |
- Clone the Repository
  git clone https://github.com/your-username/ev-forecasting.git
  cd ev-forecasting
- Install Dependencies
  Make sure Python ≥ 3.7 is installed, then install the required packages (streamlit and notebook are needed for the last two steps):
  pip install pandas numpy matplotlib seaborn scikit-learn joblib streamlit notebook
- Run the Notebook
  Ensure the dataset Electric_Vehicle_Population_By_County.csv is in the working directory and run:
  jupyter notebook ev_forecasting.ipynb
- Run the Streamlit App
  To launch the interactive forecaster:
  streamlit run app.py
Tool | Purpose |
---|---|
ev_forecasting.ipynb | Explore full data pipeline, modeling, and evaluation |
app.py | Interactive forecasting tool for end-users |
- Integrate demographic data like population, income, or GDP by county.
- Try gradient boosting models like XGBoost or LightGBM.
- Explore deep learning with LSTM for sequential forecasting.
- Deploy via Docker or to Streamlit Cloud for public access.
This project is open-source and licensed under the MIT License.
Prepared for the AICTE Internship Cycle 2 by S4F