artsmia/collection-elasticsearch

Elasticsearch for Mia's collection data.

Setup (DDev)

ddev start

# populate local redis
docker run --rm -ti \
  --network ddev-collection-elasticsearch_default \
  -v ./tmp:/app  -w /app \
  riotx/riot file-import -h redis redis-riot-export.2025-07-09.json

# redis-cli example usage
ddev exec -s redis redis-cli info

# populate local OpenSearch
docker run --rm -ti \
  --network ddev-collection-elasticsearch_default \
  -v ./tmp:/app -w /app \
  elasticdump/elasticsearch-dump \
    --input=2025-07-09-object2.mapping.json \
    --output='http://admin:97UxngYAArZ12jqt!jH20K@opensearch:9200/objects2' \
    --type=mapping

docker run --rm -ti \
  --network ddev-collection-elasticsearch_default \
  -v ./tmp:/app -w /app \
  elasticdump/elasticsearch-dump \
    --input=2025-07-09-object2.data.json \
    --output='http://admin:97UxngYAArZ12jqt!jH20K@opensearch:9200/objects2' \
    --type=data

# verify index now exists
curl 'http://admin:97UxngYAArZ12jqt!jH20K@localhost:9200/objects2' | jq .

curl 'http://admin:97UxngYAArZ12jqt!jH20K@localhost:9200/objects2/_doc/3885' | jq .

# Start the server on http://localhost:3000/
ddev exec -s app node api/index.js

Setup (legacy)

Getting this running requires a local redis instance that replicates our internal museum redis. You can create your own from our open data.

  1. Install elasticsearch: brew install homebrew/versions/elasticsearch17
  2. Enable groovy scripting for aggregations
  3. Start elasticsearch.
  4. Build the index: make clean createIndex update

Search

The search looks at the following "fields" for each artwork. Boost determines how important that particular field is.

| field | boost | description |
| --- | --- | --- |
| artist.artist | 15 | the artist |
| artist.folded | 15 | artist with special characters (é, ü, …) replaced with 'normal' 'english' letters |
| title | 11 | the title of an artwork |
| description | 3 | the "registrar" description of the artwork, as it was described when accessioned |
| text | 2 | "curatorial" text, the general label written about this work |
| accession_number | | object "accession number" |
| _all | | all the fields in the record combined together, so nothing gets missed |
| artist.ngram | 2 | artist's name, ngrammed |
| title.ngram | | artwork title, ngrammed |

Ngrams break search terms down into sub-word fragments ("grams"), so a search for the misspelled o'keefe still returns results for "Georgia O'Keeffe".
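To make the ngram idea concrete, here is a minimal sketch of character n-gramming. It is not the index's actual analyzer: the gram size of 3 and the normalization (lowercasing, stripping punctuation) are assumptions, and the function names are hypothetical.

```javascript
// Split a term into overlapping character n-grams of a fixed size.
// The size of 3 is an assumption; the real analyzer settings are not shown here.
function ngrams(term, size = 3) {
  // Lowercase and drop anything that is not a letter or digit (e.g. the apostrophe).
  const normalized = term.toLowerCase().replace(/[^a-z0-9]/g, "");
  const grams = [];
  for (let i = 0; i + size <= normalized.length; i++) {
    grams.push(normalized.slice(i, i + size));
  }
  return grams;
}

// Return the grams that two terms have in common, in order of first appearance.
function sharedGrams(a, b, size = 3) {
  const inB = new Set(ngrams(b, size));
  return [...new Set(ngrams(a, size))].filter((gram) => inB.has(gram));
}

console.log(ngrams("o'keefe"));
console.log(sharedGrams("o'keefe", "O'Keeffe"));
```

The misspelled query and the correct name share most of their grams, which is why the match still succeeds.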

Then there are "ranking functions" applied to the results. A few examples:

{filter: {term: {highlight: 'true'}}, weight: 3},
{filter: {term: {image: 'valid'}}, weight: 2},
{filter: {prefix: {room: 'g'}}, weight: 1.1},

…if it's a highlight, boost it by 3; if it has a valid image, 2; if it's currently on view, 1.1.

This all happens within a function score query.
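As a rough illustration of how the pieces fit together, the sketch below assembles a function score query body from the field boosts and the three example ranking functions. The use of `query_string` for the inner query and the helper name `buildSearchBody` are assumptions; the project's real query may be structured differently and includes more functions than the three shown.

```javascript
// Field boosts copied from the fields table; fields listed without a boost are left unboosted.
const FIELDS = [
  "artist.artist^15",
  "artist.folded^15",
  "title^11",
  "description^3",
  "text^2",
  "accession_number",
  "_all",
  "artist.ngram^2",
  "title.ngram",
];

// The three example ranking functions quoted above.
const RANKING_FUNCTIONS = [
  { filter: { term: { highlight: "true" } }, weight: 3 },
  { filter: { term: { image: "valid" } }, weight: 2 },
  { filter: { prefix: { room: "g" } }, weight: 1.1 },
];

// Hypothetical helper: wrap a text query in a function_score query.
// The inner query type (query_string) is an assumption for illustration.
function buildSearchBody(userQuery) {
  return {
    query: {
      function_score: {
        query: { query_string: { query: userQuery, fields: FIELDS } },
        functions: RANKING_FUNCTIONS,
      },
    },
  };
}

console.log(JSON.stringify(buildSearchBody("horses from China"), null, 2));
```

Each function's `filter` selects the documents it applies to, and its `weight` scales their score, so a highlighted work with a valid image that is on view accumulates all three boosts.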

API

Here are the main endpoints we use. Test them out at search.artsmia.org.

| endpoint | description | example |
| --- | --- | --- |
| /:query | searches for the given text, using ES query string syntax | horses from China |
| /id/:id | JSON for a single object by id | Olive Trees, Vincent Van Gogh |
| /ids/:ids | multiple objects by id | two personal favorites |
| /random/art | return one or more random artworks, matching an optional query | ten random artworks, currently displayed on the Museum's 3rd floor |
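For calling these endpoints from code, a small sketch of URL builders may help. The `https` scheme, the comma separator for `/ids/`, and the function names are assumptions rather than documented behavior; the ids used are placeholders.

```javascript
// Host taken from the text above; the https scheme is an assumption.
const BASE_URL = "https://search.artsmia.org";

// /:query - free-text search using ES query string syntax.
function searchUrl(query) {
  return `${BASE_URL}/${encodeURIComponent(query)}`;
}

// /id/:id - a single object by id.
function objectUrl(id) {
  return `${BASE_URL}/id/${encodeURIComponent(id)}`;
}

// /ids/:ids - several objects by id.
// Joining ids with commas is an assumption about the expected format.
function objectsUrl(ids) {
  return `${BASE_URL}/ids/${ids.map((id) => encodeURIComponent(id)).join(",")}`;
}

console.log(searchUrl("horses from China"));
console.log(objectUrl(3885));
console.log(objectsUrl([3885, 1])); // placeholder ids
```

In a runtime that provides `fetch`, something like `fetch(objectUrl(3885)).then((res) => res.json())` would then retrieve the JSON for one object.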

Indexing

We index our objects regularly from our custom-built TMS API. See Makefile for the confusing, shell-scripted details. Indexing pulls data from a local redis database, which is kept in sync by a system that watches for changes as they happen in TMS. We also index content related to our objects, and add a few other layers of data into elasticsearch to complement and improve the data from our API.
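The real indexing lives in the Makefile's shell scripts, so the following is only a sketch of the general shape of the work: turning a batch of object records into a newline-delimited body for Elasticsearch's `_bulk` endpoint. The record shape (`id`, `title`) and the helper name `toBulkBody` are hypothetical; the index name `objects2` is borrowed from the DDev setup above.

```javascript
// Hypothetical helper: build an NDJSON body for the _bulk API.
// Each record becomes two lines: an "index" action line, then the document itself.
function toBulkBody(records, index = "objects2") {
  const lines = [];
  for (const record of records) {
    // Action line naming the target index and document id.
    lines.push(JSON.stringify({ index: { _index: index, _id: String(record.id) } }));
    // Source line carrying the document to index.
    lines.push(JSON.stringify(record));
  }
  // The bulk API requires the body to end with a newline.
  return lines.join("\n") + "\n";
}

// Illustrative record only; real records come from redis and have many more fields.
const sample = [{ id: 3885, title: "Example artwork" }];
process.stdout.write(toBulkBody(sample));
```

Batching records this way keeps the number of HTTP requests low compared with indexing one document at a time.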

About

search infrastructure for Minneapolis Institute of Art collection
