Anime recommendation using Machine Learning with a Content-Based Filtering technique
Because of GitHub's maximum file size limit, the dataset and binaries are not included in this repository.
By default, this repository doesn't contain any dataset. You can download and generate it by following these steps:
- Create a data directory in the parent directory
|--- anime_recommendation
|--- data
|--- ...
- Create the binary, dataset and raw subdirectories inside data
|--- anime_recommendation
|--- data
|--- binary
|--- dataset
|--- raw
|--- ...
- Open the Kaggle dataset: Here
- Download only anime_with_synopsis.csv and anime.csv
- Put the downloaded files in the raw directory
|--- anime_recommendation
|--- data
|--- binary
|--- dataset
|--- raw
|--- anime_with_synopsis.csv
|--- anime.csv
|--- ...
- Run this script:
python generate_data.py
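The directory steps above can also be scripted. This is a minimal sketch, assuming the three subdirectories live directly under data; the function name `create_data_dirs` and the `root` argument are hypothetical and not part of the repository.

```python
from pathlib import Path


def create_data_dirs(root):
    """Create data/binary, data/dataset and data/raw under `root`.

    `root` is an assumption: pass the directory that should contain `data`.
    """
    data = Path(root) / "data"
    created = []
    for name in ("binary", "dataset", "raw"):
        sub = data / name
        # parents=True also creates `data`; exist_ok=True makes reruns safe.
        sub.mkdir(parents=True, exist_ok=True)
        created.append(sub)
    return created


# Example call (not executed here): create_data_dirs("..")
```

After running it, place the two downloaded CSV files in the raw directory by hand, as described in the steps.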
By default, this app uses MySQL as the database to store the datasets. If you don't want to use MySQL, you can edit the code to work with the CSV data instead.
If you do want to use MySQL, run these scripts:
python grant_sql.py
to create a new user and grant that user privileges
python generate_sql.py
to convert the CSV files to SQL
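For the CSV route, one possible starting point is to read the files straight into memory. This is only a sketch using the standard library; the helper name `load_rows` is hypothetical, and which parts of the app need to call it depends on code not shown here.

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Example call (not executed here): load_rows("data/raw/anime.csv")
```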
This app generates binary data using NLP (Natural Language Processing) with the TfidfVectorizer and LabelBinarizer techniques. All of it can be generated by running python generate_data.py
After that, the binaries are fed to K Nearest Neighbors to create the Machine Learning models.
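The pipeline described above can be sketched with scikit-learn roughly as follows. This is an illustration, not the code in generate_data.py: the toy rows, the choice to apply TF-IDF to the synopsis and LabelBinarizer to the type, the cosine metric, and the names `build_features` and `recommend` are all assumptions.

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import LabelBinarizer

# Hypothetical toy rows; the real data comes from the downloaded CSV files.
titles = ["Space Saga", "Star Voyage", "Cafe Days", "Forest Spirits"]
synopses = [
    "pilots fight a war in space with giant robots",
    "a crew of pilots explores space in a starship",
    "friends run a small cafe in a quiet town",
    "a girl meets spirits in an old forest",
]
types = ["TV", "TV", "Movie", "OVA"]


def build_features(synopses, types):
    """Combine TF-IDF text features with one-hot encoded type labels."""
    text_features = TfidfVectorizer(stop_words="english").fit_transform(synopses)
    type_features = csr_matrix(LabelBinarizer().fit_transform(types))
    return hstack([text_features, type_features]).tocsr()


features = build_features(synopses, types)
# Brute-force cosine search is an assumed setting that works on sparse input.
model = NearestNeighbors(metric="cosine", algorithm="brute").fit(features)


def recommend(title, k=2):
    """Return up to k titles closest to `title`, excluding the title itself."""
    index = titles.index(title)
    n_neighbors = min(k + 1, len(titles))
    _, indices = model.kneighbors(features[[index]], n_neighbors=n_neighbors)
    return [titles[i] for i in indices[0] if i != index][:k]
```

Content-based filtering here means recommendations come only from item attributes (text and labels), not from other users' ratings.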
By default, this app uses FastAPI to create an API. To run the API, run this file:
python main.py
Then open http://127.0.0.1/api in the browser,
or http://127.0.0.1/docs
to see the documentation.
- Upgrade to Elasticsearch to make search queries faster (partially done)
- Update anime database