The Bike-Sharing-Count-Prediction project aims to predict the total number of bike rentals per hour based on various features such as weather conditions, time (hour, weekday, month), and user types (casual or registered users). The project applies both linear regression and polynomial regression techniques to build predictive models and utilizes regularization techniques such as Ridge, Lasso, and ElasticNet to improve model performance.
The dataset used for this project is the Bike Sharing Dataset, which contains hourly bike rental data for two years (2011 and 2012) in the Capital Bikeshare system, Washington D.C.
To run the project locally, follow these steps:
-
Clone the repository:
git clone https://github.com/your-username/Bike-Sharing-Count-Prediction.git
-
Navigate to the project directory:
cd Bike-Sharing-Count-Prediction
-
Install the required dependencies:
pip install -- upgrade -r requirements.txt
The dataset consists of the following features:
season
: Season (1: Winter, 2: Spring, 3: Summer, 4: Fall)yr
: Year (0: 2011, 1: 2012)mnth
: Month (1 to 12)hr
: Hour of the day (0 to 23)weekday
: Day of the week (0: Sunday to 6: Saturday)workingday
: Whether it is a working day or not (1: Yes, 0: No)holiday
: Whether it is a holiday or not (1: Yes, 0: No)weathersit
: Weather situation (1: Clear, 2: Mist, 3: Light snow/rain, 4: Heavy rain/snow)temp
: Normalized temperature in Celsius (temperature * 41)atemp
: Normalized feeling temperature in Celsius (feeling temperature * 50)hum
: Normalized humiditywindspeed
: Normalized wind speedcasual
: Count of casual users (non-registered users)registered
: Count of registered userscnt
: Total bike rental count (target variable)
The project begins by analyzing the distribution of bike-sharing data through various visualizations:
- Hourly Counts: The count of bike rentals by the hour.
- Distribution of Counts: The distribution of total, casual, and registered user counts.
- Weekday Counts: Counts grouped by the weekday, with holiday and working day indicators.
Visualizations were created using Seaborn and Matplotlib to understand the patterns in the data.
- One-Hot Encoding: Categorical features such as
season
,weekday
,workingday
, etc., were encoded usingOneHotEncoder
. - Scaling: Continuous features like
temp
,atemp
,humidity
, andwindspeed
were scaled usingMinMaxScaler
.
The final feature set used for modeling is a combination of scaled continuous variables and one-hot-encoded categorical variables.
A Linear Regression model was implemented using the scaled and encoded feature set. The steps included:
- Splitting the dataset into training and testing sets.
- Fitting the model on the training data.
- Predicting on the test set and evaluating using Mean Squared Error (MSE) and R-squared score.
A Polynomial Regression model was implemented to capture the non-linear relationship between the features and target. Polynomial features were generated using PolynomialFeatures
from sklearn
.
Regularization techniques were applied to prevent overfitting:
- Ridge Regression: Adds L2 penalty to the cost function.
- Lasso Regression: Adds L1 penalty to the cost function.
- ElasticNet: A combination of L1 and L2 penalties.
Each model was evaluated based on MSE and R-squared scores.
The project evaluated several models and techniques:
-
Linear Regression:
- R-squared Score:
0.68
- R-squared Score:
-
Polynomial Regression:
- R-squared Score:
0.92
- R-squared Score:
-
Ridge, Lasso, and ElasticNet Regularization:
- Improved generalization performance with reduced MSE.
Contributions are welcome! If you want to contribute to the project, please follow these steps:
- Fork the repository.
- Create a new branch (
git checkout -b feature-branch
). - Commit your changes (
git commit -m 'Add new feature'
). - Push to the branch (
git push origin feature-branch
). - Create a pull request.
This project is licensed under the MIT License. See the LICENSE file for more details.