This project aims to classify driving behavior into three categories: normal, aggressive, or slow. The classification is based on key metrics gathered from a vehicle’s On-Board Diagnostics (OBD) system, including:
• Acceleration
• Speed
• Instantaneous Engine Power (calculated based on fuel consumption)
• Engine RPM
The process begins with a sample of “normal” driving data, which serves as a baseline. Test driving data is then collected, and all key metrics are Z-score normalized. This standardization helps detect deviations from the normal driving pattern and classify each driving sample accordingly.
Once the driving behavior is classified, two machine learning models are trained to predict driving styles:
1. Logistic Regression
2. Neural Network
Both models are evaluated on the classified dataset. Their performances are visualized through confusion matrices, as well as other relevant plots, showing each model’s accuracy and capability in classifying driving styles. The results provide a clear comparison of how well each model predicts normal, aggressive, and slow driving behaviors.
One of the most valuable lessons I learned during this project is the importance of establishing a labeled dataset before training any machine learning models.
Since there wasn’t an existing dataset for driving behavior, I had to label the data myself. I did this by creating a script in python (main.py); however, this process of manual classification allowed me to have a deeper understanding of the driving patterns and behaviors that the models would later predict.
Without properly labeled data, even the most advanced machine learning algorithms would struggle to produce meaningful results. This is why a rather long portion of my time spent on this project was dedicated to developing a script that would accurately represent the data.
Another key takeaway was the significance of Z-score normalization, especially when dealing with data that comes in different units and magnitudes.
The metrics from the OBD system, such as speed, acceleration, and engine RPM, are measured on very different scales. Without normalization, these differences would cause models to overemphasize certain metrics while underweighting others, leading to biased classifications.
Z-score normalization helps by converting all features into the same scale, making the model more robust and capable of generalizing across different types of driving behavior.
This step was crucial in ensuring that all metrics were treated equally (before assigning weights) in the classification process.
Throughout the development of this project, I gained insight into the differences between Logistic Regression and Neural Networks for multi-class classification.
Logistic Regression is a (relatively) simple but effective model that works well for linearly separable data. In this program, it provided interpretable results and performed well in classifying driving behavior into one of the three available classes.
On the other hand, Neural Networks, while more complex, demonstrated greater flexibility and the ability to capture non-linear relationships between the metrics. The downside of Neural Networks is their need for more computational resources and tuning to avoid overfitting.
Logistic Regression was easier to implement and interpret but lacked the same predictive power in more complex scenarios. Both models have their strengths, and choosing one of them over the other depends on the complexity of the problem at hand and the amount of data available.





