Elasticsearch for Mia's collection data.
(Getting this running requires a local redis instance that replicates our internal museum redis. You can create your own from our open data.)
- Install elasticsearch: `brew install homebrew/versions/elasticsearch17`
- Enable groovy scripting for aggregations.
- Start elasticsearch.
- Build the index: `make clean createIndex update`
The search looks at the following fields for each artwork. A field's boost determines how much a match in that field counts toward the artwork's score.
field | boost | description
---|---|---
artist.artist | 15 | the artist
artist.folded | 15 | the artist, with special characters (é, ü, …) replaced by plain English letters
title | 11 | the title of the artwork
description | 3 | the "registrar" description of the artwork: how it was described when accessioned
text | 2 | "curatorial" text, the general label written about this work
accession_number | | the object's accession number
_all | | all the fields in the record combined, so nothing gets missed
artist.ngram | 2 | the artist's name, ngrammed
title.ngram | | the artwork title, ngrammed
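To make the table concrete, here is a minimal sketch of how those boosts could be expressed in a `query_string` query, using Elasticsearch's `field^boost` syntax in the `fields` list. The `FIELD_BOOSTS` map and the `boostedFields` / `queryStringQuery` helpers are hypothetical, not the code this project actually runs; fields with no boost in the table are sent without a suffix, which leaves them at Elasticsearch's default boost of 1.

```javascript
// Hypothetical field -> boost map copied from the table above.
// null means "no boost listed", so the field is sent without a ^suffix.
const FIELD_BOOSTS = {
  "artist.artist": 15,
  "artist.folded": 15,
  title: 11,
  description: 3,
  text: 2,
  "artist.ngram": 2,
  accession_number: null,
  _all: null,
  "title.ngram": null,
};

// Turn the map into the "field^boost" strings that query_string accepts.
function boostedFields(boosts) {
  return Object.entries(boosts).map(([field, boost]) =>
    boost == null ? field : `${field}^${boost}`
  );
}

// Build a query_string query that searches all of those fields at once.
function queryStringQuery(text, boosts = FIELD_BOOSTS) {
  return {
    query_string: {
      query: text,
      fields: boostedFields(boosts),
    },
  };
}

console.log(JSON.stringify(queryStringQuery("horses from China"), null, 2));
```

With this shape, a match in `artist.artist` counts 15 times as much as a match in an unboosted field such as `_all`.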
N-grams break text down into overlapping sub-word pieces, so a search for `o'keefe` still returns results for "Georgia O'Keeffe" even though the spelling differs.
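The following sketch shows why that works. It splits both strings into character trigrams and measures how many of the query's grams also appear in the candidate. The gram size of 3 and the normalization (lowercase, letters a-z only) are assumptions for illustration. The real `*.ngram` fields are defined by the index's analyzer settings, which may use different gram sizes. Elasticsearch also scores matches with its own relevance formula, not with this overlap ratio.

```javascript
// Assumed normalization: lowercase and keep only the letters a-z,
// so "O'Keeffe" becomes "okeeffe".
function normalize(s) {
  return s.toLowerCase().replace(/[^a-z]/g, "");
}

// Collect the distinct character n-grams of a string (trigrams by default).
function ngrams(s, n = 3) {
  const t = normalize(s);
  const grams = new Set();
  for (let i = 0; i + n <= t.length; i++) grams.add(t.slice(i, i + n));
  return grams;
}

// Fraction of the query's n-grams that also occur in the candidate text.
function overlap(query, candidate, n = 3) {
  const q = ngrams(query, n);
  const c = ngrams(candidate, n);
  let shared = 0;
  for (const g of q) if (c.has(g)) shared++;
  return q.size === 0 ? 0 : shared / q.size;
}

console.log(overlap("o'keefe", "Georgia O'Keeffe")); // 0.75
console.log(overlap("o'keefe", "Vincent van Gogh")); // 0
```

"okeefe" yields the grams oke, kee, eef and efe. Three of those four also occur in "okeeffe", so the misspelled query still shares most of its grams with the correct name.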
Then "ranking functions" are applied to the results. A few examples:

```js
{filter: {term: {highlight: 'true'}}, weight: 3},
{filter: {term: {image: 'valid'}}, weight: 2},
{filter: {prefix: {room: 'g'}}, weight: 1.1},
```

That is: if the artwork is a highlight, boost it by 3; if it has a valid image, by 2; and if it's currently on view (its `room` starts with "g"), by 1.1.
All of this happens inside a `function_score` query.
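Here is a minimal sketch of wrapping a text query in a `function_score` query with those three functions. It also includes a small local model of how the weights combine. It assumes Elasticsearch's default `score_mode` of `multiply`; the project's real query may set `score_mode` or `boost_mode` differently. The helper names (`functionScoreQuery`, `matches`, `rankingFactor`) are hypothetical. The model also assumes document values are already lowercased as indexed (for example `room: "g350"`), because `term` and `prefix` filters are not analyzed.

```javascript
// The ranking functions from the README, as function_score entries.
const RANKING_FUNCTIONS = [
  { filter: { term: { highlight: "true" } }, weight: 3 },
  { filter: { term: { image: "valid" } }, weight: 2 },
  { filter: { prefix: { room: "g" } }, weight: 1.1 },
];

// Wrap any inner query (e.g. a query_string query) in a function_score query.
function functionScoreQuery(innerQuery, functions = RANKING_FUNCTIONS) {
  return { function_score: { query: innerQuery, functions } };
}

// Local stand-in for the two filter types used above (term and prefix).
function matches(filter, doc) {
  if (filter.term) {
    const [field, value] = Object.entries(filter.term)[0];
    return String(doc[field]) === value;
  }
  if (filter.prefix) {
    const [field, value] = Object.entries(filter.prefix)[0];
    return typeof doc[field] === "string" && doc[field].startsWith(value);
  }
  return false;
}

// Under score_mode "multiply", the weights of all matching functions are
// multiplied together; a document matching none keeps a factor of 1.
function rankingFactor(doc, functions = RANKING_FUNCTIONS) {
  return functions
    .filter((f) => matches(f.filter, doc))
    .reduce((acc, f) => acc * f.weight, 1);
}

const body = { query: functionScoreQuery({ query_string: { query: "horses" } }) };
console.log(JSON.stringify(body));
console.log(rankingFactor({ highlight: true, image: "valid", room: "g350" }));
```

Under these assumptions, a highlighted artwork that has a valid image and is on view ends up with 3 × 2 × 1.1 ≈ 6.6 times its plain text score. An artwork that matches only the image filter gets 2 times its text score.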
Here are the main endpoints we use. Test them out at search.artsmia.org.
endpoint | description | example
---|---|---
`/:query` | searches for the given text, using ES query string syntax | horses from China
`/id/:id` | JSON for a single object by id | Olive Trees, Vincent Van Gogh
`/ids/:ids` | multiple objects by id | two personal favorites
`/random/art` | returns one or more random artworks, matching an optional query | ten random artworks, currently displayed on the Museum's 3rd floor
We index our objects regularly from our custom-built TMS API; see the `Makefile` for the confusing, shell-scripted details. Indexing pulls the data from a local redis database, which is kept in sync by a system that watches for changes in TMS as they happen. We also index content related to our objects, and add a few other layers of data to elasticsearch to complement and improve the data from our API.
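One common way to move records like these into Elasticsearch is the bulk API. The bulk API takes newline-delimited JSON in which an action line precedes each document. This is only a sketch of that format, not a description of what the `Makefile` does. The index name `objects`, the type `object`, and the assumption that every record has an `id` field are hypothetical. Elasticsearch 1.x requires a `_type` on each action.

```javascript
// Convert records (objects, or JSON strings as they might come out of redis)
// into a bulk-API request body: one action line plus one document line each.
function toBulkBody(records, { index = "objects", type = "object" } = {}) {
  const lines = [];
  for (const record of records) {
    const doc = typeof record === "string" ? JSON.parse(record) : record;
    // Hypothetical: assumes every record carries its own "id" field.
    lines.push(JSON.stringify({ index: { _index: index, _type: type, _id: String(doc.id) } }));
    lines.push(JSON.stringify(doc));
  }
  // The bulk API expects newline-delimited JSON ending with a newline.
  return lines.length ? lines.join("\n") + "\n" : "";
}

console.log(toBulkBody(['{"id": 1, "title": "Example"}']));
```

The resulting string would be POSTed to the cluster's `_bulk` endpoint. Batching many records per request is usually much faster than indexing them one at a time.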