Skip to content

Backport upstream #1290: Run ingest service python workers as abc, not root#37

Merged
new-usemame merged 3 commits into
mainfrom
merge/upstream-pr-1290
May 4, 2026
Merged

Backport upstream #1290: Run ingest service python workers as abc, not root#37
new-usemame merged 3 commits into
mainfrom
merge/upstream-pr-1290

Conversation

@new-usemame
Copy link
Copy Markdown
Owner

Cherry-pick of upstream crocodilestick/Calibre-Web-Automated#1290 by @haraldpdl.

Original title: Run ingest service python workers as abc, not root
Original PR: crocodilestick/Calibre-Web-Automated#1290

Verification

Walk through notes/merge/upstream-pr-1290-verify.md before merging.

Diff snapshot

See notes/cherry-pick-1290.diff (captured at prep time; rebase if the upstream PR has moved).


(Prepared by an automation pipeline. Do not merge until the verification checklist is signed off.)

With PUID/PGID set (the documented configuration), cps.py already
drops to abc via s6-setuidgid in svc-calibre-web-automated/run,
and the inotifywait call in cwa-ingest-service/run is also
prefixed with s6-setuidgid abc. The three other python invocations
in the same run script (watch_fallback.py on the polling path and
the two ingest_processor.py calls) run as root, so every
auto-imported book lands in /calibre-library as root:root. The web
UI, running as abc, then fails to delete these books with
Permission denied on metadata.opf.

Prefix each of those python3 invocations with s6-setuidgid abc so
the polling watcher and the ingest processor run with the same
uid and gid as the rest of the stack, producing abc-owned files
that the web UI can manage.
@new-usemame new-usemame added safe-tier-2 Auto-merge once CI green: <50 LOC isolated code change autopilot Opened by autopilot tick labels May 4, 2026
@new-usemame new-usemame merged commit f2616d6 into main May 4, 2026
7 of 8 checks passed
@new-usemame new-usemame deleted the merge/upstream-pr-1290 branch May 4, 2026 21:14
new-usemame added a commit that referenced this pull request May 4, 2026
…orough validation (#54)

* Revert "fix: use subquery loading for Books relationships to prevent DetachedInstanceError (#40)"

This reverts commit bc9386c.

* Revert "Run ingest service python workers as abc, not root (#37)"

This reverts commit f2616d6.

---------

Co-authored-by: new-usemame <248195428+new-usemame@users.noreply.github.com>
new-usemame added a commit that referenced this pull request May 4, 2026
* Run ingest service python workers as abc, not root (#37)

With PUID/PGID set (the documented configuration), cps.py already
drops to abc via s6-setuidgid in svc-calibre-web-automated/run,
and the inotifywait call in cwa-ingest-service/run is also
prefixed with s6-setuidgid abc. The three other python invocations
in the same run script (watch_fallback.py on the polling path and
the two ingest_processor.py calls) run as root, so every
auto-imported book lands in /calibre-library as root:root. The web
UI, running as abc, then fails to delete these books with
Permission denied on metadata.opf.

Prefix each of those python3 invocations with s6-setuidgid abc so
the polling watcher and the ingest processor run with the same
uid and gid as the rest of the stack, producing abc-owned files
that the web UI can manage.

Co-authored-by: new-usemame <248195428+new-usemame@users.noreply.github.com>

* feat(s6): one-time chown-to-abc migration alongside #37 re-land

#37 (held by #54 from v4.0.16) prefixes the ingest-service python workers
with s6-setuidgid abc. Architecturally correct — the web UI runs as abc
and was failing to delete books that the ingest worker had created as
root. But for users whose libraries already contain root:root-owned
author/series subdirectories from before the patch landed, post-#37
ingest of a new book targeting an existing root-owned author dir fails
with EACCES.

Empirically demonstrated by tests/integration/test_ingest_setuidgid_permission.py:

| Scenario                                                | Result        |
|---------------------------------------------------------|---------------|
| Pre-#37 root worker writes into root-owned subdir       | PASS          |
| Post-#37 abc worker writes into root-owned subdir       | EACCES        |
| Post-#37 abc worker after chown -R abc:abc /cal-lib     | PASS          |
| Post-#37 abc worker into clean abc-owned library        | PASS          |

Fix: a new oneshot s6 service cwa-chown-library-migration that runs once
per installation:

- Gated by /config/.cwa-chown-library-done sentinel (idempotent across
  reboots; second invocation no-ops)
- Skipped when NETWORK_SHARE_MODE is set — chown over NFS is slow and may
  be refused by the export's policy. Logs guidance for manual run; still
  writes the sentinel so the skip is recorded.
- Best-effort: chown failures are logged but never block boot.

cwa-ingest-service is wired to depend on the migration via
dependencies.d/cwa-chown-library-migration so the chown completes before
any ingest worker starts.

8 integration tests in test_ingest_setuidgid_permission.py exercise the
real artifacts inside the v4.0.15 production image (skipped when Docker
is unavailable):

- baseline pre-#37 behavior
- post-#37 regression visibility
- migration script chowns + writes sentinel
- migration script is idempotent
- migration script honors NETWORK_SHARE_MODE
- end-to-end: post-migration ingest into formerly root-owned dir works

Co-author: @haraldpdl (upstream PR #1290).

---------

Co-authored-by: new-usemame <248195428+new-usemame@users.noreply.github.com>
new-usemame added a commit that referenced this pull request May 18, 2026
…follow-up (#231)

Backport of upstream crocodilestick/Calibre-Web-Automated#1349 by @navels into our community-maintained CWA build. Original PR has been open since 2026-05-13 with no upstream review activity; users get the ingest-reliability improvements today rather than waiting for upstream pace.

The PR adds five orthogonal improvements: (1) stale ingest events for already-moved/deleted files fast-exit before the heavy startup path, eliminating the NFS/polling-fallback re-emit loop; (2) the s6 ingest-service script tracks per-file processing markers so the watcher can't reprocess the same path while it's in-flight; (3) module-import-time work in scripts/ingest_processor.py defers behind initialize_runtime() so the script can fast-exit without paying cps/CWA_DB/EPUBFixer/audiobook/requests import cost; (4) per-book refresh_cwa_session + invalidate_duplicate_cache + schedule_debounced_duplicate_scan calls consolidate into a single run_post_batch_follow_up() that fires once after the dirty-marker batch goes quiet; (5) bounded retries on the cwa-internal POST endpoints replace single-shot best-effort calls.

Reconciliation against fork-original work was non-trivial — full per-file rationale in the fixup commit body. The /cwa-internal/reconnect-db endpoint keeps fork PR #199's synchronous CalibreDB.refresh_for_new_data() (going back to TaskReconnectDatabase via WorkerThread.add would re-introduce the engine-disposal race that caused fork #192); ingest_processor.py keeps PR #199's _run_calibredb_add_with_retry + metadata_db_write_lock() and PR #208's deferred lock-acquire; the s6 service script preserves PR #37/#56's s6-setuidgid abc privilege drop in the new run_processor_with_timeout() helper; PR #210's debounce-default tests updated to no-op the per-script pins for ingest_processor (the value lives server-side now at /cwa-internal/queue-duplicate-scan).

Live-verified end-to-end on cwn-local: missing-target fast-exit doesn't initialize runtime or acquire lock; post-batch follow-up fires exactly once after the dirty-marker batch quiets and refreshes the duplicate cache cleanly (groups=2 max_book_id=12 logged); real ingest of alice-in-wonderland.epub triggered the s6 service Post-batch follow-up exactly once (not per book), no errors. 6 new unit tests + 35 smoke tests + 165 shell tests added by @navels all pass; 65 adjacent regression tests stay green.

Credit: @navels — both upstream commits authored by Lee Nave <navels@gmail.com> are preserved on the squash; the reconciliation commit landed under new-usemame identity. Dual-audience credit comment will be posted on CWA#1349 after the release tag publishes.
new-usemame added a commit that referenced this pull request May 18, 2026
)

Replaces the O(N) full-library duplicate scan with an O(1) maintained index.
Each book import / metadata edit / delete updates the index incrementally
instead of triggering a full re-scan; existing /duplicates page becomes
the user-facing entry point for the one-time post-upgrade baseline scan.

Backport of CWA upstream #1353 by @navels, reconciled with fork-divergent
work in scripts/ingest_processor.py and cps/editbooks.py: preserves PR #199
metadata.db flock + _run_calibredb_add_with_retry, PR #210 debounce default
30 + clamp floor 10, PR #212 SQLite UDF connect-event listener, PR #37/#56
s6-setuidgid abc privilege drop on the ingest worker, v4.0.75 fast-exit
pattern with _load_fork_cps_imports, and PR #100 batched cache-invalidation
semantics (mechanism migrated from per-batch invalidate_duplicate_cache()
to per-batch _queue_duplicate_scan_after_change()).

Verified on cwn-local: schema migration applied idempotently, 15-book
baseline scan populates cwa_duplicate_book_keys table with all rows, the
existing duplicate groups (Alice's Adventures in Wonderland ×5, The Republic
×2) render from the index, Resolution Preview Bootstrap modal replaces the
old alert() dialog, per-book incremental scan endpoint returns
{"success":true,"result_count":N,"message":"..."} in <50ms, container
restart clean with no traceback. 173 new+updated unit tests pass.

Inspired-by @navels in crocodilestick/Calibre-Web-Automated#1353.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

autopilot Opened by autopilot tick safe-tier-2 Auto-merge once CI green: <50 LOC isolated code change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant