Dockerfile refactorings (tesserocr 2.6.2, openaleph-servicelayer etc.) #16

catileptic · 2025-08-26T12:52:44Z

No description provided.

Add openaleph-procrastinate. Bump versions to satisfy dependencies (poetry lock).

* 🧑‍💻 Add pre-commit, use requirements.txt, upgrade to python3.13
* 🧑‍💻 Add dev requirements only for test build
* 🔥 (github) Drop daily cache job
* ✅ (tests/test_pdf) Fix whitespace errors from test results
* 🔨 (make) Build before test
* 👷 Inline base build
* 🚧 Tweak builds and tags
* 👷 (github) Skip intermediate arm46 build for tests
* 👷 (github) Skip cache-from [tmp]
* Revert "👷 (github) Skip cache-from [tmp]" This reverts commit 03f86fd.
* 👷 (github/docker) Try this
* 🚨 Apply black
* 👷 (github/docker) Don't use registry cache
* 🧪 (test_image) Skip gif test
* 👷 (github/docker) maybe this
* ✨ Boilerplate ingest task for procrastinate
* 🔧 Use pydantic_settings
* 📌 Use openaleph-procrastinate from git
* ⚰️ Drop TranscriptionSupport
* 🔥 Remove analysis part
* 🔥 Remove servicelayer worker
* ♻️ Refactor manager and supports to work with procrastinate
* 🚧 Make procrastinate task to work with manager
* ➖ languagecodes, pantomime -> rigour
* 🔊 Tweak global logging
* ♻️ Refactor cli with typer
* 🧪 Make tests work with procrastinate refactor
* 🩹 (ingestors/email) Use relative path
* ✨ (support/timestamp) Fall back to dateparser for unknown formats
* 🙈 Ignore more
* 🔥 Remove unused lid model
* 👷 (github) Tag base image properly
* 📦 (docker) Use entrypoint and run procrastinate worker
* 🧑‍💻 (contrib) Add non-docker debian install dependencies
* 🧪 Add end-to-end testing setup
* 🧪 (e2e) Working example
* 🔧 (settings) Move deferring settings up to openaleph-procrastinate
* 📌 requirements
* Pin Tesserocr to 2.6.2
* Add ENV LD_PRELOAD for Apple Silicone as comment
* Solve minor errors
* Bump openaleph_procrastinate version
* 🐛 (cli) Use defer settings correctly in debug mode
* ⬆️ openaleph-procrastinate v0.0.7
* 👽️ Adapt explicit defers from openaleph-procrastinate v0.0.7
* 👷 Tweak compose settings
* Bump openaleph-procrastinate version
* Add namespace to entities. Remove app user
* Add namespace info to test setup
* Explicitly set the testing DB to sqlite
* Pin procrastinate to 3.2.2 for tests
* Add transcription procrastinate task
* 📌 Pin procrastinate==3.2.2 for test docker build
* 🚧 (docker) Cleanup duplicated RUN
* 🚧 (cli) Adjust settings display
* 🔧 (tests) Properly set FTM_STORE_URI
* ⬆️ Dependencies
* ✨ Documentation
* 🔥 Drop google cloud vision support
* Always index entities after ingesting
* Replace get_dataset with get_fragments (ftmq.store)
* ⬆️ ftm(q) 4.1.x, openaleph-procrastinate 0.0.13
* 🐛 (support/email) Catch empty name
* 🎨 (support/transcription) Cleanup
* 🔥 Drop unused settings
* ⬆️ openaleph-procrastinate 0.0.14
* 🚧 (cli) Make foreign_id optional
* ✅ Add e2e testing with minio
* ⬆️ openaleph-procrastinate 0.0.16
* ⬆️ openaleph-procrastinate 0.0.16
* 💚 (github) Skip e2e
* ⬆️ ftmq 4.1.1, openaleph-procrastinate 0.0.18
* 🚧 (tests/e2e) Adjustments
* 🔖 Bump version: 3.24.0 → 5.0.0rc1
* 💚 (github) Enable e2e again
* ⬆️ openaleph-procrastinate 0.0.20
* 🚧 (ingestors/image) Explicitly close PIL obj after processing
* 🚧 (support/shell) Write to subprocess special DEVNULL
* 🚧 (ingestors/access) Wrap subprocess call in context manager
* 🚧 (ingestors/csv) Properly use context manager for file open
* 🚧 (support/ocr) Clean up OCR engine after use
* ⚗️ memray
* 🔧 (settings) Properly configure servicelayer tags
* 🚧 (tasks) Collect garbage, just in case
* 📌 Pin olefile<0.47 as this leaks crazy memory
* 📌 Fix RC version string
* ⬆️ All the things
* 🔖 Bump version: 5.0.0-rc1 → 5.0.0-rc2
* ⬆️ openaleph-procrastinate 0.0.25 and others
* 🔖 Bump version: 5.0.0-rc2 → 5.0.0-rc3
* 🩹 (tasks) Pass through batch (formerly job_id)
* 📌 Pin back tesserocr=2.6.2
* Dockerfile refactorings (tesserocr 2.6.2, openaleph-servicelayer etc.) (#16)
* Compile tesserocr with c++ 14; use openaleph-servicelayer
* Build tesserocr in Dockerfile.base; don't build Apple base docker image
* Separate test docker image
* Move tesserocr to ocr dependencies
* Only generate main requirements from pre-commit hook
* Move tesserocr to optional dependencies
* Add build-test to Makefile test, before running tests
* 🔖 Bump version: 5.0.0-rc3 → 5.0.0-rc4
* ⬆️ followthemoney 4.2.0
* ⬆️ ftmq 4.2.2 (psycopg3)
* ⬆️ openaleph-procrastinate 0.0.29
* 🔧 Ensure psycopg3 for sl tags db
* Temporarily disable daily ingest-file-base build
* Update poetry.lock
* 🔖 Bump version: 5.0.0-rc4 → 5.0.0-rc5

---------

Co-authored-by: Alex Ștefănescu <alex.stefanescu@pm.me>
Co-authored-by: Alex Ștefănescu <catileptic@users.noreply.github.com>

catileptic added 8 commits August 26, 2025 12:38

Compile tesserocr with c++ 14; use openaleph-servicelayer

10b2c3c

Build tesserocr in Dockerfile.base; don't build Apple base docker image

99076f3

Separate test docker image

ec3b37a

Move tesserocr to ocr dependencies

19b6987

Only generate main requirements from pre-commit hook

9a2d3c7

Move tesserocr to optional dependencies

ee88641

Add build-test to Makefile test, before running tests

89de5fa

🔖 Bump version: 5.0.0-rc3 → 5.0.0-rc4

3a6f97a

catileptic merged commit b4fa0d5 into feat/procrastinate Aug 27, 2025
2 of 4 checks passed

catileptic deleted the chore/docker-image-update branch August 27, 2025 15:08
