Towards an open source stack for e-commerce search


maximilianwerk/osc-chorus

 
 


Chorus

Towards an open source tool stack for e-commerce search

Chorus makes deploying powerful ecommerce search easier by shifting the buy vs build decision in favour of build, so you can own your search! It deals with three issues:

  1. Starting from Scratch is Time Consuming. Downloading an open source search engine isn't enough; it's like getting the parts of a Lego model, only without the directions ;-). We need a better baseline to quickly get started.

  2. Integration of Tooling is Hard. Search isn't just the index; it's also the analytics tooling, the relevance tooling, and the operational monitoring that goes into it. Every team incurs the penalty of starting from scratch when integrating the ecosystem of options.

  3. Sharing Knowledge is a Must! It isn't enough to just have conference talks; we need sample code and sample data in order to share knowledge about improving ecommerce search. Chorus is that public environment that you can use to share your next great idea!

Explore a demo, watch a video about the project, or try it out by running the quickstart.sh script. Go deeper by watching the nine-part series Meet Pete.

Want to stay up-to-date with the community? Visit https://querqy.org/ to learn more, and join the E-Commerce Search Slack group for tips, tricks and news on what's new in the Chorus ecosystem.

What Runs Where

We host a complete demonstration environment online for you to play with; see the links below. Please note that the demo store isn't always available.

Relevant usernames and passwords are listed in the TECHNICAL_DETAILS.md file.

Working with macOS? Pop open all the tuning-related web pages with one terminal command:

open http://localhost:4000 http://localhost:8983 http://localhost:9000 http://localhost:3000 http://localhost:7979
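On other platforms the same idea can be sketched with a small helper. This is only an illustration: the function names `chorus_urls` and `open_chorus_pages` are made up for this example, and the use of `xdg-open` assumes a typical Linux desktop.

```shell
# Print the local Chorus web UI addresses, one per line.
# The ports are the ones used in the macOS command above.
chorus_urls() {
  printf 'http://localhost:%s\n' 4000 8983 9000 3000 7979
}

# Open every URL with the platform's opener.
# `open` is used on macOS; `xdg-open` is an assumption for everything else.
open_chorus_pages() {
  local opener url
  case "$(uname -s)" in
    Darwin) opener=open ;;
    *)      opener=xdg-open ;;
  esac
  if ! command -v "$opener" >/dev/null 2>&1; then
    echo "$opener not found; visit these URLs manually:" >&2
    chorus_urls >&2
    return 1
  fi
  for url in $(chorus_urls); do
    "$opener" "$url"
  done
}
```

After sourcing the snippet, run `open_chorus_pages` once the stack is up. If no opener is installed it prints the URLs so you can visit them by hand.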

5 Minutes to Run Chorus!

We are trying to strike a balance between making the setup process as easy and foolproof as possible and not hiding too much of the interactions between the projects that make up Chorus.

If you are impatient, we have a quick start script, ./quickstart.sh, that sets you up. However, I recommend you go through Kata 0: Setting up Chorus.

Structured Learning using Chorus

After that, you can learn about how to use all the tools in Chorus to improve search by following these Katas:

  1. First Kata: Let's Optimize a Query
  2. Second Kata: How to Sell Search Keywords and Terms
  3. Kata 003: Observability in Chorus
  4. Fourth Kata: How to Gather Human Judgements
  5. Fifth Kata: Some days you just KNOW what the right products are for a query!
  6. Sixth Kata: Adding Custom Rules to Querqy!
  7. Seventh Kata: Name your Algorithms using ParamSets
  8. Eighth Kata: Previewing Rules Before Users See Them
  9. Ninth Kata: Organize your queries using Tags
  10. Tenth Kata: Establishing a Baseline Metric for Relevancy
  11. Eleventh Kata: Establishing a Baseline Metric for Relevancy
  12. Twelfth Kata: Vector Search

There is also a closely related video series called Meet Pete.

Useful Commands for Chorus

To start your environment, but still run each command to set up the integrations manually, run:

docker-compose up --build -d
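Because `-d` returns as soon as the containers are started, the services may not be answering yet when you run the next manual step. A minimal sketch of a readiness check follows. The helper name `wait_for_url` is hypothetical, and it assumes `curl` is installed and that Solr is the service listening on port 8983 (Solr's default port).

```shell
# Poll a URL until it answers or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  local url="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: no progress output, -f: HTTP status >= 400 counts as failure,
    # -o /dev/null: discard the body, --max-time: never hang on one try
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up waiting for $url" >&2
  return 1
}
```

For example, `wait_for_url http://localhost:8983 60 2` would wait up to roughly two minutes before giving up. Adjust the attempts and delay to suit your machine.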

The quickstart command will launch a Solr cluster, load the configsets and product data for the ecommerce index, and launch the SMUI user interface:

./quickstart.sh

Interested in dense vectors? Make sure Docker has at least 10 GB of RAM and then run:

./quickstart.sh --with-vector-search

If you want to add in the offline lab environment based on Quepid, then tack on the --with-offline-lab parameter:

./quickstart.sh --with-offline-lab

To include the observability features, run:

./quickstart.sh --with-observability

To see what is happening in the Chorus stack you can tail the logs for all the components via:

docker-compose logs -tf

If you want to narrow down to just one component of the Chorus stack do:

docker-compose ps                       # list out the names of the components
docker-compose logs -tf solr1 solr2     # tail solr1 and solr2 only
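When the logs are noisy, it can help to look only at lines that suggest a problem. The sketch below is one way to do that. The function names and the keyword pattern are assumptions chosen for illustration, not something Chorus defines.

```shell
# Keep only log lines that look like problems (case-insensitive match).
# The keyword list is a guess at useful terms, not taken from Chorus.
filter_problems() {
  grep -iE 'error|exception|warn'
}

# Show recent problem lines for the given services, e.g. solr1 solr2.
# --tail limits how much history is read; --no-color keeps the output grep-friendly.
recent_problems() {
  docker-compose logs -t --no-color --tail=200 "$@" | filter_problems
}
```

Calling `recent_problems solr1 solr2` scans the last 200 log lines of each service. With no arguments it scans every component.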

To destroy your environment (including any volumes created, like the MySQL database), just run:

docker-compose down -v

or

./quickstart.sh --shutdown

If Docker is giving you a hard time then some options are:

docker system prune                     # removes orphaned images, networks, etc.
docker system prune -a --volumes        # removes all images; clears out your Docker disk space if it is full.

You may also have to increase the resources given to Docker, up to 4 GB RAM and 2 GB Swap space.
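To see how much memory Docker currently has, you can ask the daemon directly. The following is a rough sketch: the function names are hypothetical, the 4 GiB default simply mirrors the figure mentioned above, and the value is rounded to the nearest GiB because Docker often reports slightly less than the configured amount.

```shell
# Convert a byte count to GiB, rounded to the nearest whole number.
bytes_to_gib() {
  echo $(( ($1 + 536870912) / 1073741824 ))
}

# Succeed if BYTES amounts to at least MIN_GIB gibibytes (after rounding).
has_enough_memory() {
  [ "$(bytes_to_gib "$1")" -ge "$2" ]
}

# Ask the Docker daemon for its total memory and compare it with a minimum.
# Usage: check_docker_memory [MIN_GIB]   (defaults to 4)
check_docker_memory() {
  local min_gib="${1:-4}"
  local total
  total=$(docker info --format '{{.MemTotal}}') || return 2
  if has_enough_memory "$total" "$min_gib"; then
    echo "Docker has about $(bytes_to_gib "$total") GiB; enough for a ${min_gib} GiB minimum"
  else
    echo "Docker has about $(bytes_to_gib "$total") GiB; raise it to at least ${min_gib} GiB" >&2
    return 1
  fi
}
```

Run `check_docker_memory` for the default setup, or `check_docker_memory 10` before trying the vector search option.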

Chorus Data Details

The Chorus project includes some public datasets. These datasets let the community learn, experiment, and collaborate in a safe manner and are a key part of demonstrating how to build measurable and tunable ecommerce search with open source components.

The product data is gratefully sourced from Icecat and is licensed under their Open Content License.

The version of the Icecat product data that Chorus provides has the following changes:

  • Data converted to JSON format.
  • Products that don't have a 500x500 pixel image listed are removed.
  • Prices extracted for ~19,000 products from the https://www.upcitemdb.com/ service using EAN codes to match.

The ratings data (a.k.a. explicit judgements) allows you to measure the impact of your changes on relevance. We are profoundly grateful to the team at Supahands for voluntarily generating multiple ratings for the set of 125 representative ecommerce queries and sharing that data with the Chorus community.

To learn how you can work with Supahands to generate your own human judgements, see Kata 006: How to Use Explicit Judgements.
