method to link image (segments) to digital content #8
Alternatively, see the IIIF example at https://courses.edx.org/courses/course-v1:HarvardX+MCB64.1x+2T2016/d16e07a5cec442eeb7cd9dfcb695dce0/ (found via https://iiif.io/demos/).
Note that the Internet Archive appears to have opted for a Tesseract-to-hOCR workflow.
@Daniel-Mietchen suggested looking into https://en.wikipedia.org/wiki/Hierarchical_Data_Format as well as the layer/annotation features of mapping technologies (e.g., OpenStreetMap).
ImageJ has a way to take measurements by drawing two lines (or other shapes): one to capture the scale bar shown in the picture, the other to capture the measurement itself. Manual work is needed to read the scale bar and translate pixel distances into actual distances. The same applies to Notes from Nature (Zooniverse). Suggest finding out how ImageJ and Notes from Nature capture this information digitally and which file formats they use.
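The manual step boils down to a proportion: real length = measured pixels × (scale-bar length in real units ÷ scale-bar length in pixels). Below is a minimal Python sketch of that conversion. The function names, coordinates, and the "1 mm" scale bar are hypothetical and are not taken from ImageJ or Notes from Nature, whose actual file formats are still the open question here.

```python
import math

def pixel_distance(p1, p2):
    """Length in pixels of a straight line drawn from p1 to p2, both (x, y)."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])

def to_real_length(measure_px, scale_bar_px, scale_bar_real):
    """Convert a pixel length to real units using a scale bar of known length."""
    if scale_bar_px <= 0:
        raise ValueError("scale bar must have a positive pixel length")
    return measure_px * scale_bar_real / scale_bar_px

# Hypothetical picture: a scale bar labelled "1 mm" spans 200 px,
# and the structure of interest is measured with a second line.
bar_px = pixel_distance((50, 900), (250, 900))      # 200.0
measure_px = pixel_distance((10, 10), (310, 410))   # 500.0
print(f"{to_real_length(measure_px, bar_px, 1.0)} mm")  # 2.5 mm
```

Whatever format ImageJ or Zooniverse actually uses, it presumably has to keep both lines (or at least the scale factor) alongside the image for this calculation to be repeatable.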
I dug around a bit for IIIF-related documentation on Wikimedia projects, and what I found was a mostly stale collection of outdated pointers to defunct demos and bouts of enthusiasm tempered by lack of support, with https://commons.wikimedia.org/wiki/Commons:International_Image_Interoperability_Framework being the most useful resource. One of the things it points to is https://github.com/IIIF/awesome-iiif, which has a section (https://github.com/IIIF/awesome-iiif#image-servers) listing multiple IIIF server tools.
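Part of IIIF's appeal for this issue is that its Image API lets a URL address a rectangular region of an image directly, so an image segment gets a stable address that other content can link to. A small Python sketch, assuming the IIIF Image API 3.0 URL pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}` with a pixel region `x,y,w,h`. The server base URL and image identifier are hypothetical, and which sizes and formats a given server from the awesome-iiif list supports should be checked against its own documentation (older 2.x servers may expect `full` rather than `max` for the size).

```python
from urllib.parse import quote

def iiif_region_url(base, identifier, x, y, w, h,
                    size="max", rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API 3.0-style URL for one rectangular image region."""
    if w <= 0 or h <= 0:
        raise ValueError("region width and height must be positive")
    region = f"{x},{y},{w},{h}"          # pixel region: left, top, width, height
    ident = quote(identifier, safe="")   # identifiers must be URI-encoded
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Hypothetical server and identifier, for illustration only.
print(iiif_region_url("https://iiif.example.org/iiif/3", "bee-illustration",
                      120, 80, 300, 200))
# https://iiif.example.org/iiif/3/bee-illustration/120,80,300,200/max/0/default.jpg
```

Such a URL only names the pixels; attaching a label or other content to the region would still need a separate annotation layer.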
@Daniel-Mietchen thanks for having a look at IIIF. It does appear that the framework is getting some traction in the natural history collections community... hmm, I wonder what is going on.
On https://beehind.org, we have an illustration in which an image is connected to text boxes: each box points to a part of the image and associates it with something else. To generate this illustration automatically, a method is needed to:
At first glance, relating an image area to some (textual) content is similar to, or perhaps a generalization of, OCR (optical character recognition). OCR relates an image area to a character and is used to extract text from images.
The Internet Archive folks, prolific users of OCR, wrote an article (https://archive.org/developers/ocr.html) suggesting that three main OCR formats exist: hOCR (an HTML derivative) and two XML formats (ALTO and PAGE XML). hOCR and ALTO are supported by Tesseract, a commonly used OCR library.
Suggest building a prototype that takes a single part of the https://beehind.org illustration and encodes it in hOCR, ALTO, or PAGE XML.
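As a possible starting point for that prototype, here is a minimal Python sketch that writes one labelled image region as hOCR: an `ocr_page` containing an `ocr_carea` and an `ocr_line` whose `title` carries a `bbox x0 y0 x1 y1` in pixels, origin at the top-left. The function name `hocr_region`, the label, the coordinates, and the file name are all hypothetical, and the output has not been checked against an hOCR validator.

```python
from html import escape

def hocr_region(label, bbox, page_size, image_name):
    """Encode one labelled image region as a minimal hOCR (XHTML) document.

    bbox is (x0, y0, x1, y1) in pixels with (0, 0) at the top-left corner;
    page_size is (width, height) of the whole image.
    """
    x0, y0, x1, y1 = bbox
    width, height = page_size
    if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
        raise ValueError("bbox must be a non-empty box inside the page")
    name = escape(image_name)  # escapes &, <, > and quotes for use in attributes
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>{name}</title>
  <meta name="ocr-system" content="hand-made prototype"/>
  <meta name="ocr-capabilities" content="ocr_page ocr_carea ocr_line"/>
 </head>
 <body>
  <div class="ocr_page" title='image "{name}"; bbox 0 0 {width} {height}'>
   <div class="ocr_carea" title="bbox {x0} {y0} {x1} {y1}">
    <span class="ocr_line" title="bbox {x0} {y0} {x1} {y1}">{escape(label)}</span>
   </div>
  </div>
 </body>
</html>
"""

# Hypothetical region of a hypothetical 1000 x 800 px illustration.
print(hocr_region("antenna", (120, 80, 420, 280), (1000, 800), "bee-illustration.png"))
```

Note that hOCR by itself only states that a box contains some text; how to point from that box to other digital content is not covered by this sketch and is what the prototype should help decide. The same box could be expressed in ALTO as a `String` element with `HPOS`, `VPOS`, `WIDTH`, `HEIGHT`, and `CONTENT` attributes.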