Open this notebook with the Jupyter online viewer, Jupyter nbviewer.
Language: Python > 3.0
Data: The data for this project come from Airbnb. Specifically, a directory named data contains the subdirectories April, March, and February. These hold information from the Airbnb platform, such as id, zipcode, transit, Bedrooms, Beds, Review_scores_rating, Number_of_reviews, Neighbourhood, etc.
This project consists of 2 main queries.
The first query uses simple, common functions such as groupby, sum, and count, and plots the results in a variety of charts in order to make assumptions and draw conclusions from the data. Also, query 2 consists of 11 subqueries that are written as comments in the notebook.
The second query experiments with vectorization (TF-IDF, BoW) and cosine similarity. It isolates the columns id, name, and description from the given Airbnb files and processes their text. More specifically, Q.2 consists of:
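As a rough illustration of the kind of aggregation described above, here is a minimal sketch that assumes the listings are loaded into a pandas DataFrame. The column names come from the data description; the rows, the neighbourhood labels, and the output column names (`n_listings`, `total_reviews`, `total_beds`) are invented for the example and are not taken from the notebook.

```python
import pandas as pd

# Toy rows standing in for the real listings; all values are invented.
listings = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Neighbourhood": ["A", "A", "B", "B"],
    "Beds": [1, 2, 3, 1],
    "Number_of_reviews": [10, 5, 0, 7],
})

# Group by neighbourhood, then count listings and sum reviews and beds.
summary = listings.groupby("Neighbourhood").agg(
    n_listings=("id", "count"),
    total_reviews=("Number_of_reviews", "sum"),
    total_beds=("Beds", "sum"),
)
print(summary)
```

A table like `summary` could then be passed to a plotting call such as `summary.plot(kind="bar")` to produce one of the charts.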
Subqueries:
- TF-IDF, BoW vectorization: vectorizing the three isolated columns
- Cosine similarity: computing the similarity between every TF-IDF vector and all the others
- A function that finds and returns the most similar entries for a given id
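The three subqueries can be sketched in pure Python as follows. This is only an illustration under stated assumptions: the notebook itself may well rely on a library such as scikit-learn, the toy listings are invented, and the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`) as well as the smoothed IDF formula are choices made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy listings (id, name, description); not real Airbnb data.
listings = [
    (101, "Sunny studio", "Bright studio near the metro station"),
    (102, "Cozy studio", "Cozy studio close to the metro"),
    (103, "Family house", "Large house with garden and parking"),
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]  # bag-of-words counts
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts for term in c)
    # Smoothed inverse document frequency (an assumption of this sketch).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(listing_id, ids, vectors, top_n=2):
    """Return the top_n (id, score) pairs most similar to listing_id."""
    target = vectors[ids.index(listing_id)]
    scores = [
        (other_id, cosine(target, vec))
        for other_id, vec in zip(ids, vectors)
        if other_id != listing_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

ids = [listing_id for listing_id, _, _ in listings]
texts = [f"{name} {description}" for _, name, description in listings]
vectors = tfidf_vectors(texts)
print(most_similar(101, ids, vectors, top_n=1))
```

Storing each vector as a dict keeps the example short. For the full dataset, a sparse matrix representation would scale better than comparing every pair of dicts.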
© Konstantinos Nikoletos & Myrto Iglezou