We conducted a usability evaluation of a semantic dataset search with 20 biodiversity scholars in Germany in June and July 2022. The study had two objectives:
- we compared two query input modes (A/B testing), and
- we studied two different explanation strategies in the search summary to examine whether users are confused or attracted by the presented semantic information, such as URIs and ontologies.
We developed a semantic search over biological datasets with two user interfaces (UIs) that differ in their characteristics. The search expands query terms with semantically related terms and supports searching over hierarchy relations. UI 1 (Biodiv 1) provides a category-based form as search input and gives no information on the ontologies used. UI 2 (Biodiv 2) offers a classic single input field and provides links to matched URIs and ontologies in the search summary.
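To illustrate what expanding a query term over hierarchy relations can look like, here is a minimal sketch. It is not the implementation used in this project: the tiny ontology, the synonym table and the function name `expand_term` are hypothetical and exist only for this example.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology: each term maps to its direct narrower terms.
NARROWER: Dict[str, List[str]] = {
    "Poales": ["Poaceae", "Cyperaceae"],
    "Poaceae": ["Triticum"],
}

# Hypothetical synonym table for a few concepts.
SYNONYMS: Dict[str, List[str]] = {
    "Poaceae": ["grass family"],
    "Lepidoptera": ["butterflies", "moths"],
}


def expand_term(term: str) -> Set[str]:
    """Return the term, all transitively narrower terms, and their synonyms."""
    concepts: Set[str] = set()
    stack = [term]
    # Walk down the hierarchy; the membership check guards against cycles.
    while stack:
        current = stack.pop()
        if current in concepts:
            continue
        concepts.add(current)
        stack.extend(NARROWER.get(current, []))
    # Add the synonyms of every concept collected from the hierarchy.
    expanded = set(concepts)
    for concept in concepts:
        expanded.update(SYNONYMS.get(concept, []))
    return expanded


if __name__ == "__main__":
    print(sorted(expand_term("Poales")))
```

In this sketch a query for "Poales" would also match datasets annotated with "Poaceae" or "grass family", while a term that is not in the toy ontology is returned unchanged.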
Following the TREC guidelines (https://www-nlpir.nist.gov/projects/t9i/spec.html), we set up eight user tasks and surveys with questionnaires to guide users through the evaluation.
- the analysis folder contains a Jupyter notebook to analyse a compiled CSV file
- analysis/results16 contains the results for 16 users
- analysis/results20 contains the results for 20 users
- scripts to generate the compiled CSV file and further instructions are available under analysis/preprocessing
- data_corpus_preparation provides various small applications for the preparation and setup of the search index
Install Python (we developed and tested with version 3.9) and Jupyter Notebook (https://jupyter.org/). On a command line, navigate to the root folder and run
jupyter notebook
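Outside the notebook, a compiled CSV file can also be inspected with plain Python. The following is only a sketch: the column names (`user`, `ui`, `task`) and the sample rows are hypothetical and do not reflect the actual layout of the files in this repository.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List


def read_rows(handle: Iterable[str]) -> List[Dict[str, str]]:
    """Read CSV lines with a header row into a list of dictionaries."""
    return list(csv.DictReader(handle))


def count_by(rows: List[Dict[str, str]], column: str) -> Counter:
    """Count how often each value occurs in the given column."""
    return Counter(row[column] for row in rows)


# Hypothetical sample data standing in for a compiled CSV file.
SAMPLE = "user,ui,task\n1,Biodiv 1,1\n1,Biodiv 2,2\n2,Biodiv 1,1\n"

rows = read_rows(io.StringIO(SAMPLE))
counts = count_by(rows, "ui")

if __name__ == "__main__":
    for ui, n in sorted(counts.items()):
        print(f"{ui}: {n} rows")
```

To use a real file instead of the sample, open it with `open(path, newline="")` and pass the file handle to `read_rows`; `path` is a placeholder for the location of the compiled CSV.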
The survey templates, questionnaires and the original survey results are available at Zenodo.
The eight user tasks were:
- What data are in the repository for Foraminifera (forams, single-cell organisms) in the benthic zone (water layer in the ocean floor)?
- How variable is the oxygen concentration of sea water of the global ocean?
- What data exist for Poales (invasive grasses), e.g., Poaceae (grass family)?
- How high are sulfate reduction rates at cold seeps (cold vents, areas in the ocean floor where hydrocarbon-rich fluids are leaking)?
- What data are in the repository on ocean acidification or coral bleaching?
- What data exist in the repository for bacteria in the groundwater?
- What data exist for Lepidoptera (butterflies, moths) on oaks (Quercus)?
- What data in the repository contain samples from surface water?
The code in this project is distributed under the terms of the GNU LGPL v3.0.
Further information on this study can be obtained from our publication: Löffler, F., Shafiei, F., Witte, R., König-Ries, B. & Klan, F. (2023). Semantic Search for Biological Datasets: A Usability Study on Modes of Querying and Explaining Search Results. In: König-Ries, B., Scherzinger, S., Lehner, W. & Vossen, G. (Eds.), BTW 2023. Gesellschaft für Informatik e.V. DOI: 10.18420/BTW2023-56