chenliseu/Predict-Bluebike-Demand

Predict-Bluebike-Demand

Bluebikes is a public bike-sharing system in the Boston metropolitan area. As of this writing, the system has deployed more than 400 stations with a fleet of over 4,000 bikes. Customers can rent a bike at any of the docking stations and pay through a smartphone app, either as a member with unlimited monthly access or as a casual customer. In this project, I am particularly interested in predicting hourly Bluebikes rentals using the publicly available Bluebikes system data. More specifically, I use Bluebikes trip history data from January 2019 to March 2022. The dataset has been transformed into hourly demand based on trip start times.
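The transformation from individual trips to hourly demand can be sketched as follows. This is a minimal illustration with toy data, not the project's actual preprocessing code; the column name `starttime` and the zero-filling of empty hours are assumptions.

```python
import pandas as pd

# Toy trip records. "starttime" is an assumed column name.
trips = pd.DataFrame({
    "starttime": pd.to_datetime([
        "2019-01-01 08:05:00",
        "2019-01-01 08:40:00",
        "2019-01-01 09:15:00",
        "2019-01-01 11:59:00",
    ])
})

# Truncate each trip start to the hour, then count trips per hour.
hourly = (
    trips.assign(hour=trips["starttime"].dt.floor("h"))
    .groupby("hour")
    .size()
    .rename("demand")
)

# Assumption: hours with no trips count as zero demand, so the
# series covers every hour between the first and last trip.
full_index = pd.date_range(hourly.index.min(), hourly.index.max(), freq="h")
hourly = hourly.reindex(full_index, fill_value=0)

print(hourly)
```

Filling empty hours with zero keeps the series aligned with hourly weather records, which exist whether or not any trip started in that hour.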

To make the predictions more informative, the hourly Bluebikes demand data is combined with hourly weather observations from January 2019 to March 2022, provided by the National Centers for Environmental Information, using the NOAA climate data record for Boston station WBAN:14739 (Logan Airport). The weather data is merged with the demand data by date and hour. Holiday and weekday indicators, derived from the federal holiday calendar, have also been added to the dataset.
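A merge of this kind might look like the sketch below. The column names (`hour`, `demand`, `temp_f`) and values are made up for illustration, and using pandas' `USFederalHolidayCalendar` as the holiday source is an assumption; the project may build its holiday list differently.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Toy hourly demand and weather tables (hypothetical columns and values).
demand = pd.DataFrame({
    "hour": pd.to_datetime(
        ["2019-07-04 09:00", "2019-07-05 09:00", "2019-07-06 09:00"]
    ),
    "demand": [120, 340, 210],
})
weather = pd.DataFrame({
    "hour": pd.to_datetime(["2019-07-04 09:00", "2019-07-05 09:00"]),
    "temp_f": [78.0, 81.0],
})

# Left join keeps every demand hour, even when a weather reading is missing.
merged = demand.merge(weather, on="hour", how="left")

# Flag federal holidays by comparing each row's calendar date to the calendar.
holidays = USFederalHolidayCalendar().holidays(
    start="2019-01-01", end="2022-03-31"
)
merged["is_holiday"] = merged["hour"].dt.normalize().isin(holidays)

# Monday=0 ... Sunday=6, so values below 5 are weekdays.
merged["is_weekday"] = merged["hour"].dt.dayofweek < 5

print(merged)
```

A left join leaves missing weather readings as `NaN` rather than dropping the demand row, so any gaps in the weather record stay visible and can be handled explicitly.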

I am also interested in understanding how riding behavior has changed since the pandemic. I will explore questions such as: How has bike rental volume shifted from 2019 to 2022? Has rental behavior changed because of COVID-19 quarantine and work-from-home policies? Have the locations of the highest-demand stations moved over these three years?

The final hourly Bluebikes dataset has 28,046 records and 14 columns. The goal is to predict hourly demand using several regression methods. The dataset was split randomly, treating observations as time-independent, with a 60/40 train/test ratio. Six algorithms are applied to the dataset: linear regression, Lasso regression, support vector regression, random forest, XGBoost, and a neural network. Model performance is measured using root mean squared error (RMSE), the R-squared score (R^2), and lift charts. Along the way, we investigate how model performance changes as we move to more complex algorithms. The project ends with conclusions and directions for future improvement.
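The evaluation setup described above can be sketched with scikit-learn. This is an illustration on synthetic data under stated assumptions, not the project's code or results: the two features, the data-generating formula, the hyperparameters, and the choice to show only two of the six algorithms are all made up for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the hourly dataset (hypothetical features).
rng = np.random.default_rng(0)
n = 500
hour_of_day = rng.integers(0, 24, size=n)
temp = rng.uniform(20, 90, size=n)
X = np.column_stack([hour_of_day, temp])
y = 50 + 3 * temp + 40 * np.sin(np.pi * hour_of_day / 24) + rng.normal(0, 10, size=n)

# Random 60/40 split: rows are shuffled, so temporal order is ignored.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Two of the six algorithms, from simple to more complex.
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))  # same units as demand
    r2 = float(r2_score(y_test, pred))  # 1.0 would be a perfect fit
    results[name] = (rmse, r2)
    print(f"{name}: RMSE={rmse:.2f}, R^2={r2:.3f}")
```

Scoring every model on the same held-out 40% makes the RMSE and R^2 values directly comparable across algorithms.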
