A bike-sharing system is a transportation service that provides individuals with bikes for shared use on a short-term basis, either for a fee or for free. Over the last few decades, bike-sharing systems have grown significantly in popularity all over the world, because they are an environmentally sustainable, convenient, and economical way of improving urban mobility. They also help promote healthier habits among their users and reduce fuel consumption.
Rental bikes have now been introduced in many cities to improve mobility and comfort. It is important to make rental bikes available and accessible to the public at the right time, as this reduces waiting time. Providing the city with a stable supply of rental bikes therefore becomes a major concern, and the crucial part is predicting the number of bikes required at each hour.
The dataset contains weather information (Temperature, Humidity, Windspeed, Visibility, Dewpoint, Solar radiation, Snowfall, Rainfall), the number of bikes rented per hour and date information.
Date - year-month-day
Rented Bike count - Count of bikes rented at each hour
Hour - Hour of the day
Temperature - Temperature in Celsius
Humidity - %
Windspeed - m/s
Visibility - 10m
Dew point temperature - Celsius
Solar radiation - MJ/m2
Rainfall - mm
Snowfall - cm
Seasons - Winter, Spring, Summer, Autumn
Holiday - Holiday/No holiday
Functional Day - NoFunc (non-functional hours), Fun (functional hours)
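The attribute list above can be turned into typed records when the data is loaded. The following is a minimal, dependency-free sketch: the short column headers (`rented_bike_count`, `functional_day`, and so on) and the sample rows are hypothetical and made up for illustration, so they will need to be adapted to the headers of the real file. Only a subset of the attributes is shown.

```python
import csv
import io
from datetime import date

# Made-up sample rows for illustration only; the short column headers
# are hypothetical and may not match the real file.
SAMPLE_CSV = """date,rented_bike_count,hour,temperature,humidity,seasons,holiday,functional_day
2018-06-01,850,18,24.5,45,Summer,No holiday,Fun
2018-06-01,60,4,17.0,70,Summer,No holiday,Fun
2018-12-25,0,9,-2.5,55,Winter,Holiday,NoFunc
"""


def parse_row(row):
    """Convert one CSV row (all strings) into typed Python values."""
    return {
        "date": date.fromisoformat(row["date"]),  # year-month-day
        "rented_bike_count": int(row["rented_bike_count"]),
        "hour": int(row["hour"]),  # hour of the day, 0-23
        "temperature": float(row["temperature"]),  # Celsius
        "humidity": float(row["humidity"]),  # percent
        "seasons": row["seasons"],
        "is_holiday": row["holiday"] == "Holiday",
        "is_functional": row["functional_day"] == "Fun",
    }


def load_rows(text):
    """Parse CSV text with a header line into a list of typed records."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(text))]


rows = load_rows(SAMPLE_CSV)
```

Converting the two-valued columns (Holiday, Functional Day) to booleans at load time is one possible choice; keeping them as strings until the encoding step would work equally well.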
- Initial preparations (loading the dependencies and the data)
- EDA
- Clean-Up
  - Handling null values
  - Handling duplicate values
  - Removing outliers
- Feature engineering
  - Feature encoding
  - Checking correlation for feature removal
  - Removing multicollinearity
  - Obtaining correlation between dependent and independent variables
- Preprocessing of the data
  - Target feature conditioning
  - Creating train and test datasets
  - Feature scaling
- Model implementation
  - Linear Regression
  - Ridge Regression
  - Lasso Regression
  - Random forest regression
  - XGBoost
- Model explainability
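The preprocessing steps in the outline above (target feature conditioning, creating train and test datasets, feature scaling) could look roughly like the sketch below. It rests on several assumptions that the outline does not confirm: that "target feature conditioning" means a square-root transform of the rented bike count, that the split is a random 75/25 split, and that scaling is standardisation fitted on the training set only. All function names and sample values are hypothetical.

```python
import math
import random
import statistics


def condition_target(counts):
    """Square-root transform (an assumption) to reduce right skew in counts."""
    return [math.sqrt(c) for c in counts]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split them into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Return (mean, std) learned from the training values only."""
    return statistics.fmean(values), statistics.pstdev(values)


def scale(values, mean, std):
    """Standardise values with statistics learned from the training set."""
    return [(v - mean) / std for v in values]


# Made-up (temperature, rented count) pairs, for illustration only.
data = [(-3.0, 100), (5.0, 225), (12.0, 400), (18.0, 625),
        (22.0, 900), (25.0, 841), (28.0, 784), (31.0, 529)]

train, test = train_test_split(data)
mean, std = fit_scaler([t for t, _ in train])
X_train = scale([t for t, _ in train], mean, std)
X_test = scale([t for t, _ in test], mean, std)  # reuses the train statistics
y_train = condition_target([c for _, c in train])
y_test = condition_target([c for _, c in test])
```

Fitting the scaler on the training rows and reusing its mean and standard deviation for the test rows keeps information about the test set from influencing the transformation.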
EDA insights:
- The most bikes are rented in summer and the fewest in winter.
- Over 96% of the bikes are rented on days marked as No holiday.
- Most bikes are rented when the temperature is between 15 and 30 degrees Celsius.
- Most bikes are rented when there is no snowfall or rainfall.
- The majority of bikes are rented when humidity is between 30% and 70%.
- The highest number of bike rentals occurs in the 18th hour (6 pm) and the lowest in the 4th hour (4 am).
- Most bike rentals are made when visibility is high.
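Insights such as the busiest and quietest hour come from aggregating the rented bike count per group. The following is a small sketch of that aggregation by hour, using made-up `(hour, count)` records rather than the real data; the function name `rentals_by_hour` is hypothetical.

```python
from collections import defaultdict


def rentals_by_hour(records):
    """Sum the rented bike count for each hour of the day."""
    totals = defaultdict(int)
    for hour, count in records:
        totals[hour] += count
    return dict(totals)


# Made-up (hour, rented count) records, for illustration only.
records = [(4, 30), (8, 600), (18, 900), (4, 20), (18, 1100), (8, 700)]

totals = rentals_by_hour(records)
peak_hour = max(totals, key=totals.get)      # 18 for this sample
quietest_hour = min(totals, key=totals.get)  # 4 for this sample
```

The same pattern applies to the other insights by grouping on season, holiday flag, or a binned weather variable instead of the hour.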
Challenges faced:
- Removing Outliers.
- Encoding the categorical columns.
- Removing Multicollinearity from the dataset.
- Choosing a model explainability technique.
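Two of the challenges listed above, removing outliers and removing multicollinearity, can be illustrated with a short sketch. It assumes an interquartile-range (IQR) rule for outliers and a pairwise Pearson correlation check for multicollinearity; the project may have used different techniques or thresholds. The sample values and the 0.8 threshold are made up for illustration.

```python
import math
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) limits: k * IQR below Q1 and above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR limits."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up values, for illustration only.
wind_speed = [0.5, 1.0, 1.2, 1.5, 1.8, 2.0, 2.3, 9.0]
temperature = [0.0, 5.0, 10.0, 15.0, 20.0]
dew_point = [-5.0, 0.0, 5.0, 10.0, 15.0]

cleaned = remove_outliers(wind_speed)  # drops the 9.0 reading
r = pearson(temperature, dew_point)
# Hypothetical threshold: treat a pair as collinear above |r| = 0.8.
drop_candidate = abs(r) > 0.8
```

When a pair of features is flagged this way, one of the two is a candidate for removal before fitting the linear models, which are the most sensitive to correlated inputs.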