Instead of full vectors search (linear time) we are going to use k-NN which is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until function evaluation. Since this algorithm relies on distance for classification, if the features represent different physical units or come in vastly different scales then normalizing the training data can improve its accuracy dramatically
pip install -r requirements.txt
- Run Elastic OpenDistro cluster by doing
docker-compose -f opendistro-docker/docker-compose.yml up
- Create KNN index by using
scripts/create_index.sh
- Index Game Of Thrones data set by running
python index_data.py
- Do the KNN search for similarity in vectors by
python search_data.py
. Replace test message with anything you want