Edipool/recommendation_system

Project Description

This project is an updated version of a recommender system built for a fictional social network. The system has been refactored into logical modules. The connection to the remote database has been removed: all tables have been downloaded and are stored in the dataset module. Data is now tracked with DVC, a Dockerfile has been added to build an image and run a container, docstrings have been added to all functions, and the approach to forming recommendations has been reworked. The previous version had unstable recommendation logic. This version also adds text processing with TF-IDF, which improved the hitrate@5 metric from 0.45 to 0.53.

The recommendation system implements the "Content Approach". The content approach is more suitable here than the "Collaborative Approach" because of the large amount of data and for reasons of speed. A content-based recommender suggests items similar to those the user has already viewed. In this case, it recommends posts that the user has not yet viewed but that are similar to posts they have already viewed.

The recommendation system is implemented as a microservice that accepts requests and returns responses. Recommendation quality is measured with the hitrate@5 metric, the percentage of recommendations that the user has viewed.
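The description of hitrate@5 above is brief, so the following is a minimal sketch of one common reading of the metric, not the project's own evaluation code. It assumes that a request counts as a hit when at least one of the top 5 recommended posts is among the posts the user actually viewed. The function name `hitrate_at_k` and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def hitrate_at_k(
    recommendations: Dict[int, List[int]],
    interactions: Dict[int, Set[int]],
    k: int = 5,
) -> float:
    """Share of users with at least one viewed post among their top-k recommendations.

    Assumption: this "at least one hit per user" reading is illustrative;
    the project may compute the metric differently.
    """
    if not recommendations:
        return 0.0
    hits = 0
    for user_id, recommended in recommendations.items():
        viewed = interactions.get(user_id, set())
        # Only the first k recommended posts are considered.
        if any(post_id in viewed for post_id in recommended[:k]):
            hits += 1
    return hits / len(recommendations)


# Toy data (hypothetical): user id -> ranked recommended post ids.
recs = {
    1: [10, 11, 12, 13, 14],
    2: [20, 21, 22, 23, 24],
    3: [30, 31, 32, 33, 34],
    4: [40, 41, 42, 43, 44],
}
# Toy data (hypothetical): user id -> post ids the user actually viewed.
viewed = {1: {12, 99}, 2: {98}, 3: {34}, 4: set()}

score = hitrate_at_k(recs, viewed)  # users 1 and 3 have a hit
```

With this reading, a score of 0.53 would mean that roughly 53% of requests contained at least one relevant post among the five returned.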

Project Structure

The dataset module contains the datasets, which are tracked with DVC. The df_feed_data.csv file contains information on viewed posts. The df_text_plus_clustering.csv file contains the post texts, the text clustering results, and new features derived from the clustering. The df_post_text.csv file contains the post texts. The df_user_data.csv file contains information on users. The dists_df.csv file contains information on text clusters. The clusters are stored in a separate dataframe because clustering takes time and resources. If you want to cluster the texts yourself, uncomment cell 11 in the Experiments_with_the_recommendation system.app file. Files with the .csv.dvc extension are DVC tracking files and are not directly involved in the code's operation.

The models module contains the models, which are tracked with DVC. The catboost_model file is a CatBoost classifier model. The model_tree_classifier file is a Decision Tree Classifier model. Files with the .dvc extension are DVC tracking files and are not directly involved in the code's operation.

The src module contains the project's source code. The app.py file launches the microservice, processes incoming requests, and generates responses. It also contains the microservice's recommendation logic. The get_model.py file loads the CatBoost classifier model. The load_table.py file loads the partially processed tables and the tables that need further processing for each specific request. The schema.py file validates input data.
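To illustrate what validating the input data involves, here is a dependency-free sketch that checks the three query parameters used by the service (id, time, limit). It is not the contents of schema.py, which relies on Pydantic. The class name `RecommendationQuery`, the function `parse_query`, the default limit of 5, and the rule that limit must be positive are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RecommendationQuery:
    """Validated request parameters (hypothetical stand-in for a Pydantic model)."""

    id: int
    time: datetime
    limit: int


def parse_query(raw: dict) -> RecommendationQuery:
    """Convert raw string parameters into typed values or raise ValueError."""
    try:
        user_id = int(raw["id"])
        # Timestamp format taken from the example request in this README.
        time = datetime.strptime(raw["time"], "%Y-%m-%d %H:%M:%S")
        # Assumption: limit is optional and defaults to 5.
        limit = int(raw.get("limit", 5))
    except KeyError as exc:
        raise ValueError(f"missing query parameter: {exc.args[0]}") from exc
    if limit < 1:
        # Assumption: a non-positive limit is rejected.
        raise ValueError("limit must be a positive integer")
    return RecommendationQuery(id=user_id, time=time, limit=limit)


query = parse_query({"id": "767", "time": "2022-12-12 21:10:16", "limit": "5"})
```

A malformed value such as `{"id": "abc", "time": "2022-12-12 21:10:16"}` raises `ValueError` instead of reaching the recommendation logic.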

Running the code

Run the Docker container:

docker run --rm -p 8000:8000 recommendation_system

Send a request from Postman:

http://localhost:8000/post/recommendations?id=767&time=2022-12-12 21:10:16&limit=5

In Postman, you can change the user id, the time (between 2021-10-01 06:06:44 and 2021-12-29 23:39:35), and the limit.
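The time parameter contains a space and colons. Postman encodes these automatically, but a script has to do it explicitly. As a small sketch, the request URL can be built with Python's standard library. The helper name `build_request_url` is hypothetical; the host, path, and parameter names are taken from the example request above.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the example request above.
BASE_URL = "http://localhost:8000/post/recommendations"


def build_request_url(user_id: int, time: str, limit: int = 5) -> str:
    """Build the recommendation request URL with percent-encoded parameters."""
    params = {"id": user_id, "time": time, "limit": limit}
    # quote_via=quote encodes the space as %20 rather than "+".
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


url = build_request_url(767, "2022-12-12 21:10:16", 5)
print(url)
```

The resulting string can be pasted into a browser or passed to any HTTP client while the container is running.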

Main libraries used

Python, FastAPI, CatBoost, scikit-learn, pandas, NumPy, Pydantic, Docker, MLflow
