Release/3.20.1 #585

Merged
merged 21 commits into from
Feb 21, 2024

stchris
Contributor

stchris commented Feb 20, 2024

A regression has affected the ability to OCR certain image types.

tesserocr, which ingest-file uses for OCR, is showing surprising behaviour that Aleph users have also noticed: they can no longer OCR JPEG images. The cause is that the pre-compiled binary wheels are no longer built with JPEG support, nor with support for a few other file formats.

This PR forces ingest-file to build tesserocr instead of using the binary wheel, and adds a JPEG test that can catch the regression.
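
For illustration, one way to make pip skip the binary wheel for a single package is its `--no-binary` option. The sketch below only builds and prints such a command. The helper name `source_build_command` is hypothetical, and this is not necessarily how the PR itself wires up the build.

```python
import shlex
import sys
from typing import List


def source_build_command(package: str) -> List[str]:
    """Return a pip command that installs `package` from source.

    `--no-binary <package>` tells pip not to use a pre-built wheel for
    that package, so it is compiled against the locally installed libraries.
    """
    return [sys.executable, "-m", "pip", "install", "--no-binary", package, package]


# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(source_build_command("tesserocr")))
```

Building tesserocr from source generally requires the Tesseract and Leptonica development headers to be installed first. Which image formats work then depends on how the local Leptonica was built.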

This PR also introduces the ingestors clear-cache command, which takes a prefix and can delete all ingest cache entries.
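
As a rough sketch of what prefix-based cache clearing can look like, the example below uses a plain in-memory dict. The key names and the `clear_cache` function are hypothetical and do not reflect the actual ingest-file cache backend or the command's implementation.

```python
from typing import Dict


def clear_cache(cache: Dict[str, str], prefix: str) -> int:
    """Delete every entry whose key starts with `prefix`; return the count."""
    # Collect the keys first so the dict is not mutated while iterating over it.
    doomed = [key for key in cache if key.startswith(prefix)]
    for key in doomed:
        del cache[key]
    return len(doomed)


# Hypothetical cache contents, for illustration only.
cache = {
    "ingest:ocr:abc": "text",
    "ingest:ocr:def": "more text",
    "ingest:pdf:123": "page",
}
removed = clear_cache(cache, "ingest:ocr:")
print(removed, sorted(cache))
```

In this sketch an empty prefix matches every key, which is one way a prefix argument could cover deleting all entries. Whether the real command behaves that way is an assumption.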

@stchris stchris marked this pull request as ready for review February 21, 2024 11:50
@stchris stchris added this pull request to the merge queue Feb 21, 2024
Merged via the queue into main with commit 454eb72 Feb 21, 2024
3 participants