Towards an open source tool stack for e-commerce search
Chorus makes deploying powerful ecommerce search easier by shifting the buy vs build decision in favour of build, so you can own your search! It deals with three issues:
- Starting from Scratch is Time Consuming: Downloading an open source search engine isn't enough; it's like getting the parts of a Lego model, only without the directions ;-). We need a better baseline to quickly get started.
- Integration of Tooling is Hard: Search isn't just the index, it's also the analytics tooling, the relevance tooling, and the operational monitoring that goes into it. Every team incurs the penalty of starting from scratch integrating the ecosystem of options.
- Sharing Knowledge is a Must! It isn't enough to just have conference talks; we need sample code and sample data in order to share knowledge about improving ecommerce search. Chorus is that public environment that you can use to share your next great idea!
Explore a demo, watch a video about the project, or try it out by running the quickstart.sh script. Go deeper by watching the nine-part series Meet Pete.
Want to stay up-to-date with the community? Visit https://querqy.org/ to learn more, and join the E-Commerce Search Slack group for tips, tricks and news on what's new in the Chorus ecosystem.
- 29 March 2023: Building Vector Search in Chorus: A Technical Deep Dive
- 23 Feb 2023: Haystack On Tour, Kraków Feb 2023 - Atita Arora: Vectorize your E-commerce Search (video)
- 23 March 2022: Chorus, now also for Elasticsearch!
- 17th June 2021: Encores? - Going beyond matching and ranking of search results - Chorus is used at BerlinBuzzwords.
- 15th November 2020: Chorus Workshop Series Announced - Learn from the creators of the components of Chorus via six workshops.
- 17th October 2020: Chorus featured at ApacheCon @Home (video) - René and Eric give a talk at ApacheCon on Chorus.
- 10th June 2020: Chorus Announced at BerlinBuzzwords - First release of Chorus shared with the world at a workshop.
- April 2020: Paul Maria Bartusch, René Kriegler, Johannes Peter & Eric Pugh brainstorm challenges with search teams adopting technologies like Querqy and come up with the Chorus idea.
We host a complete demonstration environment online for you to play with; see the links below. Please note the demo store isn't always available.
- "Chorus Electronics" store runs at http://localhost:4000 | http://chorus.dev.o19s.com:4000
- Solr runs at http://localhost:8983 | http://chorus.dev.o19s.com:8983
- SMUI runs at http://localhost:9000 | http://chorus.dev.o19s.com:9000
- Quepid runs at http://localhost:3000 | http://chorus.dev.o19s.com:3000
- Keycloak runs at http://keycloak:9080 | http://chorus.dev.o19s.com:9080
- Prometheus runs at http://localhost:9090 | http://chorus.dev.o19s.com:9090
- Grafana runs at http://localhost:9091 | http://chorus.dev.o19s.com:9091
- Jaeger runs at http://localhost:16686 | http://chorus.dev.o19s.com:16686
- Embeddings runs at http://localhost:8000/docs | http://chorus.dev.o19s.com:8000/docs
Relevant usernames and passwords are listed in the TECHNICAL_DETAILS.md file.
Working with macOS? Pop open all the tuning related web pages with one terminal command:
open http://localhost:4000 http://localhost:8983 http://localhost:9000 http://localhost:3000
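Not on macOS, or just want to know which services are answering? Here is a minimal sketch that uses the ports from the list above. The function names `chorus_urls` and `check_chorus` are made up for this example and are not part of Chorus. Keycloak and the Embeddings service are left out because they use a different hostname or path, and which services are running depends on how you started the stack.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with Chorus.

# Print one URL per Chorus service for the given host (default: localhost).
chorus_urls() {
  local host="${1:-localhost}"
  local port
  for port in 4000 8983 9000 3000 9090 9091 16686; do
    printf 'http://%s:%s\n' "$host" "$port"
  done
}

# Report which services answer an HTTP request within 5 seconds.
# Any HTTP response (including a redirect or a login page) counts as "up".
check_chorus() {
  local url
  chorus_urls "${1:-}" | while read -r url; do
    if curl --silent --output /dev/null --max-time 5 "$url"; then
      echo "up    $url"
    else
      echo "down  $url"
    fi
  done
}
```

Run `check_chorus` for your local stack, or `check_chorus chorus.dev.o19s.com` for the hosted demo.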
We are trying to strike a balance between making the setup process as easy and foolproof as possible and the need to not hide too much of the interactions between the projects that make up Chorus.
If you are impatient, we have a quick start script, ./quickstart.sh, that sets you up; however, I recommend you go through Kata 0: Setting up Chorus.
After that, you can learn about how to use all the tools in Chorus to improve search by following these Katas:
- First Kata: Let's Optimize a Query
- Second Kata: How to Sell Search Keywords and Terms
- Kata 003: Observability in Chorus
- Fourth Kata: How to Gather Human Judgements
- Fifth Kata: Some days you just KNOW what the right products are for a query!
- Sixth Kata: Adding Custom Rules to Querqy!
- Seventh Kata: Name your Algorithms using ParamSets
- Eighth Kata: Previewing Rules Before Users See Them
- Ninth Kata: Organize your queries using Tags
- Tenth Kata: Establishing a Baseline Metric for Relevancy
- Eleventh Kata: Establishing a Baseline Metric for Relevancy
- Twelfth Kata: Vector Search
There is also a closely related video series called Meet Pete.
To start your environment, but still run each command to set up the integrations manually, run:
docker compose up --build -d
The quickstart command will launch a Solr cluster, load the configsets and product data for the ecommerce index, and launch the SMUI user interface:
./quickstart.sh
Interested in dense vectors? Make sure Docker has at least 10 GB of RAM and then run:
./quickstart.sh --with-vector-search
If you want to add in the offline lab environment based on Quepid, then tack on the --with-offline-lab parameter:
./quickstart.sh --with-offline-lab
To include the observability features, run:
./quickstart.sh --with-observability
To see what is happening in the Chorus stack you can tail the logs for all the components via:
docker compose logs -tf
If you want to narrow down to just one component of the Chorus stack do:
docker compose ps # list out the names of the components
docker compose logs -tf solr1 solr2 # tail solr1 and solr2 only
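When the log stream is too noisy to read, it can help to summarize it. The sketch below counts lines that mention ERROR or Exception per container. It assumes the default `name | message` prefix that `docker compose logs` puts on each line, and `count_errors` is a hypothetical helper name, not something Chorus provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads compose log text on stdin and prints
# "<count> <container>" for every container with matching lines,
# busiest container first.
count_errors() {
  awk -F'|' '/ERROR|Exception/ { gsub(/[ \t]+$/, "", $1); n[$1]++ }
             END { for (c in n) print n[c], c }' | sort -rn
}

# Usage (needs the Chorus stack to be running):
#   docker compose logs --no-color --since 15m | count_errors
```

The match is deliberately crude; adjust the pattern to whatever your components actually log.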
To destroy your environment (including any volumes created like the mysql db), just run:
docker compose down -v
or
./quickstart.sh --shutdown
If Docker is giving you a hard time then some options are:
docker system prune # removes orphaned images, networks, etc.
docker system prune -a --volumes # removes all images and volumes, clears out your Docker disk space if it is full.
You may also have to increase the resources given to Docker, up to 4 GB RAM and 2 GB Swap space.
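To see how much memory Docker currently has before changing anything, you can ask the daemon directly. This is a small sketch: `bytes_to_gib` is a hypothetical helper, and the value Docker reports is usually a little lower than the number set in Docker Desktop, so treat it as a rough guide rather than an exact check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a byte count to GiB with one decimal place.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# Usage (needs a running Docker daemon):
#   bytes_to_gib "$(docker info --format '{{.MemTotal}}')"
```

Compare the result with the figures mentioned in this README: 4 GB for the base stack and at least 10 GB if you run with vector search.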
The Chorus project includes some public datasets. These datasets let the community learn, experiment, and collaborate in a safe manner and are a key part of demonstrating how to build measurable and tunable ecommerce search with open source components.
The product data is gratefully sourced from Icecat and is licensed under their Open Content License.
The version of the Icecat product data that Chorus provides has the following changes:
- Data converted to JSON format.
- Products that don't have a 500x500 pixel image listed are removed.
- Prices extracted for ~19,000 products from the https://www.upcitemdb.com/ service using EAN codes to match.
The ratings data (a.k.a. explicit judgements) allows you to measure the impact of your changes to relevance. We are profoundly grateful to the team at Supahands for voluntarily generating multiple ratings for the set of 125 representative ecommerce queries and sharing that data with the Chorus community:
- Broad_Query_Set_rated.csv - Query/Doc/Rating set suitable for measuring experiments.
- Broad_Query_Set_individual_ratings.csv - Raw ratings generated by the three individual SupaAgents per query.
Learn more in Kata 006: How to Use Explicit Judgements about how you can work with Supahands to generate your human judgements.