In this project, I followed the CRISP-DM process to analyze the Seattle Airbnb homes data and answer the following three questions:
- What are the busiest times of the year to visit Seattle? By how much do prices spike?
- Which neighborhoods are more crowded and expensive?
- Which features help in predicting price?
There are three Jupyter notebook files:
- Q1- The busiest times of the year and the highest prices.ipynb
- Q2- The most crowded and expensive neighborhoods.ipynb
- Q3- The most important features in predicting price.ipynb
These notebooks contain the answers to the three questions above. Each one follows the same data science process steps: gather, assess, clean, analyze, model, and visualize.
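As a rough illustration of the clean and analyze steps for the first question, the sketch below summarizes bookings and prices by month. It is not the notebook code. The column names (`date`, `available`, `price`), the `t`/`f` availability flags, the toy rows, and the helper name `monthly_summary` are all assumptions made for the example, and treating an unavailable night as a booked night is a simplification.

```python
import pandas as pd

# Toy stand-in for a calendar table; rows and column names are assumptions.
calendar = pd.DataFrame({
    "listing_id": [1, 1, 2, 2, 3, 3],
    "date": ["2016-01-05", "2016-07-10", "2016-01-05",
             "2016-07-10", "2016-01-06", "2016-07-11"],
    "available": ["t", "f", "t", "t", "f", "f"],
    "price": ["$80.00", None, "$100.00", "$1,150.00", None, None],
})


def monthly_summary(cal):
    """Return the share of booked nights and the mean listed price per month."""
    cal = cal.copy()
    # Clean: parse dates and strip "$" and "," from the price strings.
    cal["date"] = pd.to_datetime(cal["date"])
    cal["price"] = pd.to_numeric(
        cal["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
    )
    # Assumption: a night that is not available counts as booked.
    cal["booked"] = cal["available"].eq("f")
    # Analyze: aggregate by calendar month (missing prices are ignored by mean).
    return (
        cal.groupby(cal["date"].dt.month)
        .agg(booked_share=("booked", "mean"), mean_price=("price", "mean"))
        .rename_axis("month")
    )


summary = monthly_summary(calendar)
print(summary)
```

Sorting `summary` by `booked_share` or `mean_price` would then give a ranking of months, which is the kind of comparison the first question asks for.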
The data is the Seattle Airbnb Open Data file, which was downloaded from here.
The main findings are:
- December, March, and October are the busiest months of the year, while January, February, and July are the least busy.
- Prices increase in the summer and decrease in the winter.
- Magnolia, Queen Anne, Downtown, and Cascade are more expensive.
- Delridge, Northgate, Rainier Valley, and Lake City are cheaper.
- Capitol Hill, Downtown, and Central Area are more crowded.
- Interbay, Seward Park, Magnolia, and Lake City are less crowded.
- The number of bedrooms and bathrooms, the weekly cost of the listing, and the cost of the cleaning service are the most important factors in determining a listing's price.
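One common way to rank features for a price model is to fit a linear regression on standardized features and compare the absolute coefficients. The sketch below shows that idea with Scikit-learn. It is only an illustration: the model used in the notebooks is not stated here, the helper `rank_features` is hypothetical, and the data is synthetic, with a made-up price formula chosen so that the expected ranking is known.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def rank_features(X, y):
    """Fit a linear model on standardized features and rank them by |coefficient|."""
    # Standardizing puts all features on the same scale, so coefficient
    # magnitudes are comparable across features.
    X_scaled = StandardScaler().fit_transform(X)
    model = LinearRegression().fit(X_scaled, y)
    importance = pd.Series(np.abs(model.coef_), index=X.columns)
    return importance.sort_values(ascending=False)


# Synthetic listings (not the Seattle data); column names are assumptions.
rng = np.random.default_rng(0)
listings = pd.DataFrame({
    "bedrooms": rng.integers(1, 5, size=200),
    "bathrooms": rng.integers(1, 3, size=200),
    "minimum_nights": rng.integers(1, 10, size=200),
})
# Made-up price relation: driven by bedrooms and bathrooms plus small noise.
price = (
    40 * listings["bedrooms"]
    + 15 * listings["bathrooms"]
    + rng.normal(0, 5, size=200)
)

ranking = rank_features(listings, price)
print(ranking)
```

Coefficient magnitudes are only a rough guide when features are correlated with each other, so a ranking like this is best read alongside a check of the model's fit on held-out data.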
A non-technical, in-depth analysis of the project can be found in the article: Are accommodation expensive in Seattle?
- Data analysis and machine learning libraries: Pandas, NumPy, Scikit-learn
- Python 2D plotting library: Matplotlib
The sites below were very useful for completing the project:
- GeeksforGeeks: https://www.geeksforgeeks.org
- Scikit-learn: https://scikit-learn.org