method to link image (segments) to digital content #8

jhpoelen · 2023-04-10T20:51:01Z

⚠️ crazy idea ⚠️

At https://beehind.org, we have an illustration in which an image is connected to text boxes. The illustration points to parts of an image and associates each part with something else. To generate this illustration automatically, a method is needed to:

point to an area in an image
relate this pointer to some (other) digital content
record the provenance of this relation
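
To make the three requirements above concrete, here is a minimal sketch of a record that holds all of them together. Everything in it is an assumption for illustration: the class names, the field names, the rectangular pixel-box region, and the example.org URIs are hypothetical and not taken from any existing format.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class ImageRegion:
    """A rectangular area in an image, in pixel coordinates (hypothetical model)."""
    image_uri: str
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class RegionLink:
    """Relates an image region to other digital content, with provenance."""
    region: ImageRegion
    content_uri: str
    asserted_by: str   # who made the claim
    asserted_at: str   # when the claim was made, as an ISO 8601 timestamp


def link_region(region, content_uri, asserted_by, asserted_at=None):
    """Create a link; default the timestamp to the current UTC time."""
    if asserted_at is None:
        asserted_at = datetime.now(timezone.utc).isoformat()
    return RegionLink(region, content_uri, asserted_by, asserted_at)


# Placeholder URIs, not real resources.
link = link_region(
    ImageRegion("https://example.org/illustration.png", 120, 80, 200, 150),
    content_uri="https://example.org/some-record",
    asserted_by="https://example.org/people/annotator",
    asserted_at="2023-04-10T20:51:01Z",
)
print(json.dumps(asdict(link), indent=2))
```

A flat record like this is only a starting point; the formats discussed below each have their own way of expressing the region and the link.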

At first glance, relating an image area to some (textual) content is similar to, or perhaps a general case of, OCR (optical character recognition). OCR relates an image area to a character and is used to extract text from images.

The Internet Archive folks, prolific users of OCR, wrote an article (https://archive.org/developers/ocr.html) suggesting that three main OCR formats exist: hOCR (an HTML derivative) and two XML formats (ALTO and PAGE XML). hOCR and ALTO are supported by Tesseract, a commonly used OCR library.

Suggest building a prototype that takes a single part of the https://beehind.org illustration and encodes it in hOCR, ALTO, or PAGE XML.
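
As a rough idea of what the hOCR variant of such a prototype might look like, the sketch below writes one region as an hOCR content area. The `ocr_page` and `ocr_carea` classes and the `bbox x0 y0 x1 y1` title property are hOCR conventions; the function names, the image name, the coordinates, the label, and the use of an `<a href>` inside the area to point at related content are assumptions for illustration. The `ocr-system` and `ocr-capabilities` meta tags that hOCR producers normally emit are left out for brevity.

```python
from html import escape
import xml.etree.ElementTree as ET


def hocr_area(area_id, bbox, label, href):
    """One content area: a bounding box plus a link to the related content."""
    x0, y0, x1, y1 = bbox  # upper-left and lower-right corners, in pixels
    return (
        f"<div class='ocr_carea' id='{escape(area_id)}' "
        f"title='bbox {x0} {y0} {x1} {y1}'>"
        f"<a href='{escape(href)}'>{escape(label)}</a></div>"
    )


def hocr_page(image_name, width, height, areas):
    """Wrap content areas in a minimal XHTML page with one ocr_page element."""
    body = "\n".join(hocr_area(*area) for area in areas)
    return (
        "<html xmlns='http://www.w3.org/1999/xhtml'>\n"
        "<head><title>hOCR sketch</title></head>\n"
        "<body>\n"
        f"<div class='ocr_page' title='image \"{escape(image_name)}\"; "
        f"bbox 0 0 {width} {height}'>\n"
        f"{body}\n"
        "</div>\n</body>\n</html>"
    )


# Hypothetical region: x=120, y=80, 200 px wide, 150 px high.
doc = hocr_page(
    "illustration.png", 1000, 800,
    [("area_1", (120, 80, 320, 230), "example label",
      "https://example.org/some-record")],
)
ET.fromstring(doc)  # raises ParseError if the markup is not well-formed XML
print(doc)
```

Because the output is XHTML, it can be opened directly in a browser, which is one of the advantages of hOCR noted later in this thread.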

jhpoelen · 2023-04-10T21:18:45Z

Alternatively, see IIIF example https://courses.edx.org/courses/course-v1:HarvardX+MCB64.1x+2T2016/d16e07a5cec442eeb7cd9dfcb695dce0/ via https://iiif.io/demos/

jhpoelen · 2023-04-10T21:58:45Z

Note that the Internet Archive appears to have chosen a Tesseract -> hOCR-based workflow.

The Internet Archive settled on using hOCR. At the time of writing, Tesseract does support outputting ALTO XML, but PAGE XML was not yet supported. hOCR was deemed sufficiently simple and flexible, with the added advantage that it is XHTML, which allows for viewing the documents in a browser. Various hOCR tools and libraries exist, as do hOCR viewers, such as hocrviewer-miradoc and hocrjs.

jhpoelen · 2023-04-13T19:55:21Z

@Daniel-Mietchen suggested looking into https://en.wikipedia.org/wiki/Hierarchical_Data_Format as well as layer/annotation features in map technologies (e.g., OpenStreetMap).

jhpoelen · 2023-04-21T21:57:17Z

ImageJ has a way to do measurements by drawing two lines (or some other shape): one to capture the scale bar shown in the picture, the other to capture the measurement taken. Manual work is needed to read the scale bar and translate the pixel distances to actual distances.

The same goes for Notes from Nature (Zooniverse).

Suggest finding out how ImageJ and Notes from Nature capture this information digitally and which file format is used.
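
The two-line measurement described above boils down to simple arithmetic, sketched here. This is not how ImageJ or Notes from Nature store or compute anything; the function names and the example coordinates are hypothetical, and the scale-bar length is assumed to have been read off the picture by a person.

```python
import math


def pixels_per_unit(bar_start, bar_end, bar_length):
    """Scale factor from a line drawn over the scale bar.

    bar_start and bar_end are (x, y) pixel coordinates; bar_length is the
    real-world length printed next to the scale bar (e.g. 1.0 for "1 mm").
    """
    return math.dist(bar_start, bar_end) / bar_length


def measure(start, end, px_per_unit):
    """Real-world length of a line drawn over the feature of interest."""
    return math.dist(start, end) / px_per_unit


# Line 1: a scale bar that is 200 px long and labelled as 1.0 mm.
scale = pixels_per_unit((100, 900), (300, 900), 1.0)

# Line 2: the measurement itself, 500 px long (a 300-400-500 triangle).
length_mm = measure((150, 200), (450, 600), scale)
print(f"{length_mm:.2f} mm")
```

Recording both lines, plus the scale-bar value that was read manually, would keep the measurement reproducible from the image alone.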

Daniel-Mietchen · 2023-04-21T23:12:20Z

Alternatively, see IIIF example https://courses.edx.org/courses/course-v1:HarvardX+MCB64.1x+2T2016/d16e07a5cec442eeb7cd9dfcb695dce0/ via https://iiif.io/demos/

I dug around a bit for IIIF-related documentation on Wikimedia projects, and what I found was a mostly stale collection of outdated pointers to defunct demos and bouts of enthusiasm modulated by lack of support, with https://commons.wikimedia.org/wiki/Commons:International_Image_Interoperability_Framework being the most useful resource.

One of the things it points to is https://github.com/IIIF/awesome-iiif, which has a section https://github.com/IIIF/awesome-iiif#image-servers with multiple IIIF server tools.
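
For context on what such image servers offer for this issue: the IIIF Image API addresses a rectangular part of an image directly in the URL, following the pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. A minimal sketch of building such a URL is below; the helper name, the server base URL, and the identifier are hypothetical, and `max` as the size value assumes Image API 3.0 (2.x servers use `full`).

```python
from urllib.parse import quote


def iiif_region_url(base_url, identifier, x, y, w, h,
                    size="max", rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API URL for a pixel region x,y,w,h of an image."""
    region = f"{x},{y},{w},{h}"
    # The identifier is a single path segment, so special characters
    # (including "/") must be percent-encoded.
    ident = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{ident}/{region}/"
            f"{size}/{rotation}/{quality}.{fmt}")


# Placeholder server and identifier, not a real endpoint.
url = iiif_region_url("https://example.org/iiif", "illustration.png",
                      120, 80, 200, 150)
print(url)
```

A URL like this covers the "point to an area in an image" requirement; relating it to other content and recording provenance would still need a separate annotation.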

jhpoelen · 2023-04-26T14:31:51Z

@Daniel-Mietchen thanks for having a look at IIIF - it does appear that the framework is getting some traction in the natural history collections community... hmm, I wonder what is going on.

jhpoelen mentioned this issue Apr 13, 2023

add wikidata entries for the illustration related to CASTYPE1652 #9

Open
