Elasticsearch for Mia's collection data.
```sh
ddev start

# populate local redis
docker run --rm -ti \
  --network ddev-collection-elasticsearch_default \
  -v ./tmp:/app -w /app \
  riotx/riot file-import -h redis redis-riot-export.2025-07-09.json

# redis-cli example usage
ddev exec -s redis redis-cli info

# populate local OpenSearch
docker run --rm -ti \
  --network ddev-collection-elasticsearch_default \
  -v ./tmp:/app -w /app \
  elasticdump/elasticsearch-dump \
  --input=2025-07-09-object2.mapping.json \
  --output='http://admin:97UxngYAArZ12jqt!jH20K@opensearch:9200/objects2' \
  --type=mapping

docker run --rm -ti \
  --network ddev-collection-elasticsearch_default \
  -v ./tmp:/app -w /app \
  elasticdump/elasticsearch-dump \
  --input=2025-07-09-object2.data.json \
  --output='http://admin:97UxngYAArZ12jqt!jH20K@opensearch:9200/objects2' \
  --type=data

# verify the index now exists
curl 'http://admin:97UxngYAArZ12jqt!jH20K@localhost:9200/objects2' | jq .
curl 'http://admin:97UxngYAArZ12jqt!jH20K@localhost:9200/objects2/_doc/3885' | jq .

# start the server on http://localhost:3000/
ddev exec -s app node api/index.js
```
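Once the server is up, a quick way to sanity-check it (assuming the `/:query` route described below; the search term is arbitrary):

```sh
# full-text search against the local API (search term is arbitrary)
curl 'http://localhost:3000/horses' | jq .
```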
(Getting this all running requires a local redis instance that replicates our internal museum redis. You can create your own from our open data.)
- Install elasticsearch: `brew install homebrew/versions/elasticsearch17`
- Enable groovy scripting for aggregations
- Start elasticsearch
- Build the index: `make clean createIndex update` (roughly sketched below)
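For orientation, the `createIndex` and `update` targets boil down to something like the following (a hedged sketch, not the actual Makefile recipes; the index name and file names here are placeholders):

```sh
# create the index with its mapping (index name and mapping file are placeholders)
curl -XPUT 'http://localhost:9200/objects' \
  -H 'Content-Type: application/json' -d @mapping.json

# bulk-load documents exported from redis (the NDJSON bulk file is a placeholder)
curl -XPOST 'http://localhost:9200/objects/_bulk' \
  -H 'Content-Type: application/json' --data-binary @objects.bulk.json
```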
The search looks at the following "fields" for each artwork. Boost
determines how important that particular field is.
| field | boost | description |
|---|---|---|
| artist.artist | 15 | the artist |
| artist.folded | 15 | artist with special characters (é, ü, …) replaced with 'normal' 'english' letters |
| title | 11 | the title of an artwork |
| description | 3 | the "registrar" description of the artwork - how it was described when accessioned |
| text | 2 | "curatorial" text, the general label written about this work |
| accession_number | | the object's "accession number" |
| _all | | all the fields in the record combined together, so nothing gets missed |
| artist.ngram | 2 | artist's name, ngrammed |
| title.ngram | | artwork title, ngrammed |
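Put together, those boosts end up in a query along these lines (a sketch using `query_string` field boosting, not the exact query the API builds; `_all` is left out because it only exists in older Elasticsearch versions, and fields without a listed boost are left unboosted):

```sh
curl 'http://admin:97UxngYAArZ12jqt!jH20K@localhost:9200/objects2/_search' \
  -H 'Content-Type: application/json' -d '{
  "query": {
    "query_string": {
      "query": "van gogh",
      "fields": [
        "artist.artist^15", "artist.folded^15",
        "title^11", "description^3", "text^2",
        "artist.ngram^2", "title.ngram",
        "accession_number"
      ]
    }
  }
}' | jq .
```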
Ngrams break search terms down into sub-word grams, so a search for "o'keefe" returns results for "Georgia O'Keeffe" even though it's spelled differently.
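The ngram fields come from an analyzer configured when the index is created. Here's a minimal sketch of that kind of setting, against a throwaway index (the analyzer name, gram sizes, and index name are illustrative, not the repo's actual configuration):

```sh
curl -XPUT 'http://admin:97UxngYAArZ12jqt!jH20K@localhost:9200/ngram-demo' \
  -H 'Content-Type: application/json' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "name_ngram": { "type": "ngram", "min_gram": 2, "max_gram": 3 }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "name_ngram"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "artist": {
        "type": "text",
        "fields": {
          "ngram": { "type": "text", "analyzer": "ngram_analyzer" }
        }
      }
    }
  }
}' | jq .
```

Because the same analyzer runs at index and search time by default, both the stored name and the search term get broken into overlapping grams, which is what lets a misspelled query still match.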
Then there are "ranking functions" applied to the results. A few examples:
```js
{filter: {term: {highlight: 'true'}}, weight: 3},
{filter: {term: {image: 'valid'}}, weight: 2},
{filter: {prefix: {room: 'g'}}, weight: 1.1},
```

…if it's a highlight, boost it by 3; if it has a valid image, 2; if it's currently on view, 1.1.
This all happens within a function score query.
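Roughly, it combines like this (a sketch of a `function_score` query wrapping a plain text query and the ranking functions quoted above; the inner query stands in for the real boosted-field query):

```sh
curl 'http://admin:97UxngYAArZ12jqt!jH20K@localhost:9200/objects2/_search' \
  -H 'Content-Type: application/json' -d '{
  "query": {
    "function_score": {
      "query": { "query_string": { "query": "van gogh" } },
      "functions": [
        { "filter": { "term": { "highlight": "true" } }, "weight": 3 },
        { "filter": { "term": { "image": "valid" } }, "weight": 2 },
        { "filter": { "prefix": { "room": "g" } }, "weight": 1.1 }
      ]
    }
  }
}' | jq .
```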
Here are the main endpoints we use. Test them out at search.artsmia.org.
| endpoint | description | example |
|---|---|---|
| `/:query` | searches for the given text, using ES query string syntax | horses from China |
| `/id/:id` | JSON for a single object by id | Olive Trees, Vincent Van Gogh |
| `/ids/:ids` | multiple objects by id | two personal favorites |
| `/random/art` | returns one or more random artworks, matching an optional query | ten random artworks, currently displayed on the Museum's 3rd floor |
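For example (the object ids below are placeholders; any valid id works, and the same routes are served at `http://localhost:3000/` when running locally):

```sh
# full-text search
curl 'https://search.artsmia.org/horses%20from%20China' | jq .

# one object, then several objects, by id (ids are placeholders)
curl 'https://search.artsmia.org/id/3885' | jq .
curl 'https://search.artsmia.org/ids/3885,1218' | jq .

# a random artwork
curl 'https://search.artsmia.org/random/art' | jq .
```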
We index our objects regularly from our custom-built TMS API. See the Makefile for the confusing, shell-scripted details. It works by pulling data from a local redis database that's kept in sync with a system that watches for changes as they happen in TMS. We also index content related to our objects, and a few other layers of data are added into elasticsearch to complement and improve the data from our API.
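In shell terms, one pass of that sync boils down to something like this (a sketch only; the redis key name is made up for illustration, and the real logic lives in the Makefile and its scripts):

```sh
# read one object record out of the local redis replica (key name is illustrative)
ddev exec -s redis redis-cli get object:3885 > tmp/3885.json

# write it into the search index
curl -XPUT 'http://admin:97UxngYAArZ12jqt!jH20K@localhost:9200/objects2/_doc/3885' \
  -H 'Content-Type: application/json' -d @tmp/3885.json | jq .
```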