forked from alephdata/ingest-file
Dockerfile refactorings (tesserocr 2.6.2, openaleph-servicelayer etc.) #16
Merged
catileptic added a commit that referenced this pull request on Aug 31, 2025
* Add openaleph-procrastinate. Bump versions to satisfy dependencies (poetry lock).
* 🧑‍💻 Add pre-commit, use requirements.txt, upgrade to python3.13
* 🧑‍💻 Add dev requirements only for test build
* 🔥 (github) Drop daily cache job
* ✅ (tests/test_pdf) Fix whitespace errors from test results
* 🔨 (make) Build before test
* 👷 Inline base build
* 🚧 Tweak builds and tags
* 👷 (github) Skip intermediate arm46 build for tests
* 👷 (github) Skip cache-from [tmp]
* Revert "👷 (github) Skip cache-from [tmp]" This reverts commit 03f86fd.
* 🚨 Apply black
* 👷 (github/docker) Don't use registry cache
* 🧪 (test_image) Skip gif test
* 👷 (github/docker) maybe this
* ✨ Boilerplate ingest task for procrastinate
* 🔧 Use pydantic_settings
* 📌 Use openaleph-procrastinate from git
* ⚰️ Drop TranscriptionSupport
* 🔥 Remove analysis part
* 🔥 Remove servicelayer worker
* ♻️ Refactor manager and supports to work with procrastinate
* 🚧 Make procrastinate task to work with manager
* ➖ languagecodes, pantomime -> rigour
* 🔊 Tweak global logging
* ♻️ Refactor cli with typer
* 🧪 Make tests work with procrastinate refactor
* 🩹 (ingestors/email) Use relative path
* ✨ (support/timestamp) Fall back to dateparser for unknown formats
* 🙈 Ignore more
* 🔥 Remove unused lid model
* 👷 (github) Tag base image properly
* 📦 (docker) Use entrypoint and run procrastinate worker
* 🧑‍💻 (contrib) Add non-docker debian install dependencies
* 🧪 Add end-to-end testing setup
* 🧪 (e2e) Working example
* 🔧 (settings) Move deferring settings up to openaleph-procrastinate
* 📌 requirements
* Pin Tesserocr to 2.6.2
* Add ENV LD_PRELOAD for Apple Silicone as comment
* Solve minor errors
* Bump openaleph_procrastinate version
* 🐛 (cli) Use defer settings correctly in debug mode
* ⬆️ openaleph-procrastinate v0.0.7
* 👽️ Adapt explicit defers from openaleph-procrastinate v0.0.7
* 👷 Tweak compose settings
* Bump openaleph-procrastinate version
* Add namespace to entities. Remove app user
* Add namespace info to test setup
* Explicitly set the testing DB to sqlite
* Pin procrastinate to 3.2.2 for tests
* Add transcription procrastinate task
* 📌 Pin procrastinate==3.2.2 for test docker build
* 🚧 (docker) Cleanup duplicated RUN
* 🚧 (cli) Adjust settings display
* 🔧 (tests) Properly set FTM_STORE_URI
* ⬆️ Dependencies
* ✨ Documentation
* 🔥 Drop google cloud vision support
* Always index entities after ingesting
* Replace get_dataset with get_fragments (ftmq.store)
* ⬆️ ftm(q) 4.1.x, openaleph-procrastinate 0.0.13
* 🐛 (support/email) Catch empty name
* 🎨 (support/transcription) Cleanup
* 🔥 Drop unused settings
* ⬆️ openaleph-procrastinate 0.0.14
* 🚧 (cli) Make foreign_id optional
* ✅ Add e2e testing with minio
* ⬆️ openaleph-procrastinate 0.0.16
* ⬆️ openaleph-procrastinate 0.0.16
* 💚 (github) Skip e2e
* ⬆️ ftmq 4.1.1, openaleph-procrastinate 0.0.18
* 🚧 (tests/e2e) Adjustments
* 🔖 Bump version: 3.24.0 → 5.0.0rc1
* 💚 (github) Enable e2e again
* ⬆️ openaleph-procrastinate 0.0.20
* 🚧 (ingestors/image) Explicitly close PIL obj after processing
* 🚧 (support/shell) Write to subprocess special DEVNULL
* 🚧 (ingestors/access) Wrap subprocess call in context manager
* 🚧 (ingestors/csv) Properly use context manager for file open
* 🚧 (support/ocr) Clean up OCR engine after use
* ⚗️ memray
* 🔧 (settings) Properly configure servicelayer tags
* 🚧 (tasks) Collect garbage, just in case
* 📌 Pin olefile<0.47 as this leaks crazy memory
* 📌 Fix RC version string
* ⬆️ All the things
* 🔖 Bump version: 5.0.0-rc1 → 5.0.0-rc2
* ⬆️ openaleph-procrastinate 0.0.25 and others
* 🔖 Bump version: 5.0.0-rc2 → 5.0.0-rc3
* 🩹 (tasks) Pass through batch (formerly job_id)
* 📌 Pin back tesserocr=2.6.2
* Dockerfile refactorings (tesserocr 2.6.2, openaleph-servicelayer etc.) (#16)
* Compile tesserocr with c++ 14; use openaleph-servicelayer
* Build tesserocr in Dockerfile.base; don't build Apple base docker image
* Separate test docker image
* Move tesserocr to ocr dependencies
* Only generate main requirements from pre-commit hook
* Move tesserocr to optional dependencies
* Add build-test to Makefile test, before running tests
* 🔖 Bump version: 5.0.0-rc3 → 5.0.0-rc4
* ⬆️ followthemoney 4.2.0
* ⬆️ ftmq 4.2.2 (psycopg3)
* ⬆️ openaleph-procrastinate 0.0.29
* 🔧 Ensure psycopg3 for sl tags db
* Temporarily disable daily ingest-file-base build
* Update poetry.lock
* 🔖 Bump version: 5.0.0-rc4 → 5.0.0-rc5

---------

Co-authored-by: Alex Ștefănescu <alex.stefanescu@pm.me>
Co-authored-by: Alex Ștefănescu <catileptic@users.noreply.github.com>
No description provided.