Dumbris/semantic-search-domain-adaptation

Intro

I want to research how useful BERT is for product search on e-commerce sites. Product search traditionally relies on the term-matching algorithms built into Elasticsearch or Solr. These systems can fail when queries and product descriptions use different terms for the same meaning. Semantic search based on BERT models could help in this case. The main question is how much domain adaptation of a BERT model improves search relevance.
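To make the vocabulary-mismatch problem concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not code from this repository: `term_overlap` is a crude stand-in for a term-matching ranker (not the scoring Elasticsearch or Solr actually use), the query and product strings are invented, and the two vectors are hand-written toy values rather than the output of any BERT model.

```python
import math


def term_overlap(query: str, doc: str) -> float:
    """Jaccard overlap of lowercase token sets (crude stand-in for term matching)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical query and product title that mean the same thing
# but share no tokens.
query = "couch cover"
product = "sofa slipcover"
lexical_score = term_overlap(query, product)

# Hand-written toy vectors, NOT produced by a real encoder. A sentence
# encoder would be expected to place the two texts close together.
query_vec = [0.9, 0.1, 0.3]
product_vec = [0.8, 0.2, 0.4]
semantic_score = cosine(query_vec, product_vec)

print(f"lexical={lexical_score:.2f} semantic={semantic_score:.2f}")
```

The lexical score is zero because the token sets are disjoint, while nearby embedding vectors still yield a high cosine similarity. Whether a given encoder actually places such pairs close together is the kind of question the experiments here are meant to answer.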

Preliminary results

My report for the Huawei NLP course project.

Dataset

My initial plan is to use the Home Depot dataset for fine-tuning/training and testing (https://www.kaggle.com/c/home-depot-product-search-relevance).

Running experiments

  • Install the Python package using pip, then run the command with the default config:

    eval_encoder
    
  • You can override any config option on the command line and use the multirun feature to run many experiments:

    eval_encoder -m models.senttrans.loss="SoftmaxLoss,CosineSimilarityLoss" \
      models.senttrans.base_model="distilroberta-base-msmarco-v2,distilbert-base-nli-stsb-quora-ranking,sentence-transformers/LaBSE" \
      hydra.sweep.dir="eval_runs_results"
    
  • To collect a single report from all runs, use the command:

    ./collect_results.sh path/to/eval_runs_results
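The multirun command above passes comma-separated values for two options. Hydra's default sweeper launches one job per combination of those values, so two losses and three base models give six runs. The sketch below only enumerates that grid in plain Python to show what the sweep covers. `expand_sweep` is a hypothetical helper written for this illustration; it does not call Hydra, and the job order it prints is not guaranteed to match Hydra's.

```python
from itertools import product

# Override values copied from the multirun command above.
sweep = {
    "models.senttrans.loss": "SoftmaxLoss,CosineSimilarityLoss",
    "models.senttrans.base_model": (
        "distilroberta-base-msmarco-v2,"
        "distilbert-base-nli-stsb-quora-ranking,"
        "sentence-transformers/LaBSE"
    ),
}


def expand_sweep(overrides: dict) -> list:
    """Return one dict of overrides per combination (Cartesian product)."""
    keys = list(overrides)
    choices = [overrides[key].split(",") for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*choices)]


jobs = expand_sweep(sweep)
for index, job in enumerate(jobs):
    print(index, job)
```

Each printed dictionary corresponds to one experiment whose results would end up under the sweep directory before being merged into a single report.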
    

Local development

  • Run the bash script:
    ./start_development.sh
    

It will create a virtualenv for you and install all dependencies.
