This is my report for my Huawei NLP course project. I want to research how BERT can be useful for product search on e-commerce sites. Traditionally, product search relies on term-matching algorithms built into Elasticsearch or Solr, but these systems may fail when the query and the product description use different terms for the same meaning. Semantic search based on BERT models could help in this case. The main question is how domain adaptation of a BERT model improves search relevance.
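To illustrate the idea, here is a minimal sketch of semantic scoring with a bi-encoder, assuming the sentence-transformers library; the checkpoint name is one of those evaluated below, and the query and product titles are made-up examples:

```python
# Minimal sketch: score product titles against a query by cosine
# similarity of their BERT-based embeddings (bi-encoder setup).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("distilroberta-base-msmarco-v2")

query = "rain shower head"  # made-up example query
products = [
    "Delta Single-Function Hand Shower in Chrome",
    "Rain Bird Easy to Install Automatic Sprinkler System",
]

# Encode the query and the product titles independently into dense vectors.
query_emb = model.encode(query, convert_to_tensor=True)
product_embs = model.encode(products, convert_to_tensor=True)

# Higher cosine similarity = more semantically relevant, even without
# exact term overlap between query and title.
scores = util.cos_sim(query_emb, product_embs)[0].tolist()
for product, score in sorted(zip(products, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {product}")
```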
My initial plan is to use the Home Depot dataset for fine-tuning/training and testing (https://www.kaggle.com/c/home-depot-product-search-relevance).
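To make the plan concrete, here is a fine-tuning sketch under my assumptions about the Kaggle files: train.csv has search_term, product_title, and a relevance score between 1.0 and 3.0, and CosineSimilarityLoss from sentence-transformers expects labels in [0, 1]:

```python
# Sketch: turn Home Depot (query, product, relevance) rows into
# training pairs for domain adaptation of a pretrained bi-encoder.
# Assumes train.csv from the Kaggle competition with columns
# search_term, product_title, relevance (human score, 1.0 to 3.0).
import pandas as pd
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

df = pd.read_csv("train.csv", encoding="latin-1")  # the file is not UTF-8

train_examples = [
    InputExample(
        texts=[row.search_term, row.product_title],
        # Rescale relevance from [1, 3] to [0, 1] for cosine-similarity labels.
        label=(row.relevance - 1.0) / 2.0,
    )
    for row in df.itertuples()
]

# Domain adaptation: fine-tune a pretrained checkpoint on the pairs.
model = SentenceTransformer("distilroberta-base-msmarco-v2")
train_loader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(train_loader, train_loss)], epochs=1, warmup_steps=100)
```

Rescaling the 1-3 relevance score to [0, 1] is my own choice here; SoftmaxLoss would instead require discretizing the score into classes.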
- Install the python package using pip, and run the command with the default config:

  eval_encoder
- You can override any config option on the command line, and use the multirun feature to run many experiments at once (see the Hydra sketch after this list):

  eval_encoder -m models.senttrans.loss="SoftmaxLoss,CosineSimilarityLoss" \
    models.senttrans.base_model="distilroberta-base-msmarco-v2,distilbert-base-nli-stsb-quora-ranking,sentence-transformers/LaBSE" \
    hydra.sweep.dir="eval_runs_results"
- To collect a single report from all runs, use the command:

  ./collect_results.sh path/to/eval_runs_results
- To set up a development environment, run the bash script:

  ./start_development.sh

  It will create a virtualenv for you and install all dependencies.
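The override and multirun syntax above is standard Hydra behavior: every field of the YAML config can be set from the command line, and `-m` sweeps over comma-separated values. As a hypothetical sketch of how such an entry point could be wired up (names are illustrative; the actual eval_encoder implementation may differ):

```python
# Hypothetical sketch of a Hydra entry point like eval_encoder.
# Any config field (e.g. models.senttrans.loss) becomes a command-line
# override, and `-m` runs one job per value in a comma-separated sweep.
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # Hydra resolves defaults, applies overrides, and gives each run
    # (or each sweep job) its own working/output directory.
    print(OmegaConf.to_yaml(cfg))
    base_model = cfg.models.senttrans.base_model
    loss_name = cfg.models.senttrans.loss
    # ... load base_model, fine-tune with loss_name, evaluate, save report ...


if __name__ == "__main__":
    main()
```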