docs(roadmap): soundness fixes + ICU locale fix for postgres collation #50

Merged: 19-84 merged 2 commits into main from chore/roadmap-soundness-and-compose-locale on Jun 9, 2026

19-84 (Owner) commented Jun 9, 2026

What

Two coupled changes from auditing the roadmap (/code-review of roadmap/) against the actual codebase:

1. docker-compose.yml — ICU locale provider (real bug fix)

The postgres service runs postgres:18-alpine (musl libc), which ships no glibc locales. --locale=en_US.UTF-8 logged WARNING: no usable system locales were found and silently degraded the cluster's default collation to C/byte-order — so ORDER BY LOWER(title) (user pages, title indexes) sorted non-ASCII text by code point, not linguistically. CI didn't catch it because the CI postgres service doesn't pass this initdb arg.

Switched to the ICU locale provider. Verified on postgres:18-alpine: ORDER BY of A/a/B/b/Z now yields a,A,b,B,Z (ICU linguistic order; was byte-order A,B,Z,a,b), and Cyrillic ILIKE still matches.
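To make the difference between the two orderings concrete, here is a minimal Python sketch using the same five sample values. It does not call ICU; `approx_linguistic_key` is a hypothetical helper that only approximates the en-US ordering for these ASCII samples (case-insensitive first, lowercase before uppercase on ties).

```python
# Sample values from the verification above.
titles = ["A", "a", "B", "b", "Z"]

# Code-point order: what a C/byte-order collation produces for ASCII input.
byte_order = sorted(titles)


def approx_linguistic_key(s: str) -> tuple[str, bool]:
    """Rough stand-in for ICU en-US ordering on these samples only.

    Compares case-insensitively first, then places lowercase before
    uppercase. This is an assumption for illustration, not real ICU.
    """
    return (s.casefold(), s.isupper())


linguistic_order = sorted(titles, key=approx_linguistic_key)

print(byte_order)        # ['A', 'B', 'Z', 'a', 'b']
print(linguistic_order)  # ['a', 'A', 'b', 'B', 'Z']
```

The real fix lives in the database collation, so application code should not need a key function like this; the sketch only illustrates why the old output looked wrong.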

⚠️ initdb runs once — existing data directories keep their current (byte-order) collation until re-initialized. Fresh deployments get ICU automatically.
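For existing deployments, one way to tell whether a data directory still carries the old provider is to read `pg_database.datlocprovider`. The sketch below is an assumption about how that check might be wrapped, not code from this repo: `PROVIDER_QUERY`, `describe_provider`, and `needs_reinit` are hypothetical names, and the query would be run separately (for example through psql).

```python
# Query one could run against an existing cluster (e.g. via psql).
PROVIDER_QUERY = (
    "SELECT datname, datlocprovider, datcollate "
    "FROM pg_database WHERE datname = current_database();"
)

# Single-character codes stored in pg_database.datlocprovider.
PROVIDER_NAMES = {"c": "libc", "i": "icu", "b": "builtin"}


def describe_provider(code: str) -> str:
    """Translate a datlocprovider code into a readable provider name."""
    return PROVIDER_NAMES.get(code, f"unknown ({code!r})")


def needs_reinit(code: str) -> bool:
    """True when the database was not created with the ICU provider."""
    return code != "i"


# Example: a row reporting 'c' means the database still uses libc collation.
print(describe_provider("c"), needs_reinit("c"))
print(describe_provider("i"), needs_reinit("i"))
```

A database reporting `c` together with a `C` collation would match the degraded state described above.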

2. roadmap/ — soundness fixes

Validated every concrete code claim in the 13 roadmap docs (3 parallel audit agents + direct verification). Corrected the inaccurate ones. Highlights:

| Severity | Fix |
| --- | --- |
| High | F5 `'english'` regconfig: ~20 occurrences across 3 files (incl. api/routes.py), not "6 locations" |
| High | README: schema_version is v4; there is no startup auto-migration (`get_schema_version()` has no callers) |
| High | README/F2 Phase 1 contradiction fixed (the jinja_env.py "missing import" already exists) |
| High | F5 locale analysis assumed Debian/glibc; documented the Alpine/musl byte-order degradation and its ICU fix |
| Med | F1: `generate_page_seo_content()` is invented → real html_seo.py helpers |
| Med | F4: CSS minifier is default-on; a prefers-color-scheme block exists |
| Med | F7: subverse.type must not map to access-level subreddit_type; `stream_rows()` needs a COLUMN_MAPS entry |
| Med | F11: acceptance criterion no longer forces alpine (mcp_server is intentionally slim-bookworm); "3.14 not yet stable" corrected |
| Low | Stale line counts, postgres:16→18, ruff 0.15.0→0.15.10, "four services"→+mcp-server, present/future tense |

Full per-finding detail in the commit messages. frontend-decisions.html and spec 03 were audited and needed no changes.

Verification

  • ICU collation + Cyrillic case-fold tested directly on postgres:18-alpine
  • Every corrected code claim re-verified against the source (function line numbers, table versions, regconfig counts, ValueError guard, etc.)
  • Docs-only except the one-line compose initdb change
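Checks such as "has no callers" and the regconfig occurrence count can be reproduced mechanically. The following is a minimal sketch of that kind of check, not the audit tooling actually used; `count_calls`, `count_literal`, and the `sample` source are hypothetical and exist only for illustration.

```python
import ast


def count_calls(source: str, func_name: str) -> int:
    """Count call expressions whose callee is func_name or <obj>.func_name."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            callee = node.func
            if isinstance(callee, ast.Name) and callee.id == func_name:
                count += 1
            elif isinstance(callee, ast.Attribute) and callee.attr == func_name:
                count += 1
    return count


def count_literal(source: str, literal: str) -> int:
    """Count plain-text occurrences of a literal such as 'english'."""
    return source.count(literal)


# Hypothetical module: the function is defined but never called.
sample = '''
def get_schema_version():
    return 4

def search(q):
    return run("to_tsvector('english', body) @@ plainto_tsquery('english', %s)", q)
'''

print(count_calls(sample, "get_schema_version"))  # 0
print(count_literal(sample, "'english'"))         # 2
```

Running the same two functions over each file in the repository would give per-file totals; `.sql` files need only the literal count, since they are not Python.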

19-84 added 2 commits June 9, 2026 01:41
The postgres service runs `postgres:18-alpine` (musl libc), which ships
no glibc locales. The previous `--locale=en_US.UTF-8` therefore logged
`WARNING: no usable system locales were found` at initdb and silently
degraded the cluster's default collation to C/byte-order — so
`ORDER BY LOWER(title)` (user pages, title indexes) sorted non-ASCII text
by code point instead of linguistically. (CI didn't catch it: the CI
postgres service doesn't pass this initdb arg.)

Switch to the ICU locale provider. Verified on postgres:18-alpine:
`ORDER BY` of A/a/B/b/Z yields `a,A,b,B,Z` (ICU en-US linguistic order,
was byte-order `A,B,Z,a,b`); Cyrillic `ILIKE` still matches.

initdb runs once, so existing data directories must be re-initialized to
adopt the new collation; fresh deployments get it automatically.

Validated every concrete code claim in the roadmap against the current
codebase and corrected the inaccurate ones:

- F5: `'english'` regconfig is ~20 occurrences across postgres_search.py,
  api/routes.py (was omitted), and indexes.sql — not "6 locations".
- F5: text truncation is code-point-safe (not byte-slicing); the CJK
  issue is a missing word-break, cosmetic not corruption.
- F5: locale analysis assumed Debian/glibc; deploy is postgres:18-alpine
  (musl). Documented the byte-order collation degradation and its ICU fix
  (now applied in docker-compose.yml). Bumped stale postgres:16 -> 18.
- README: schema_version is at v4 (migration 004), and there is no
  startup auto-migration (get_schema_version has no callers).
- README: F2 Phase 1 corrected (the jinja_env.py "missing import" already
  exists; real task is importing cached filters into search_server.py).
  "four services" -> includes mcp-server; url_for_page() marked as future.
- F1: `generate_page_seo_content()` is invented; point to the real
  html_seo.py helpers.
- F2: refresh stale "confirmed via code analysis" line counts; 3 routes.
- F4: CSS minifier IS default-on; one prefers-color-scheme block exists.
- F6/F7: subreddit_metadata table + about template are proposed (not
  existing); subverse.type must not map to access-level subreddit_type;
  stream_rows() needs a COLUMN_MAPS entry (doesn't accept "any" table);
  fixed stale write_subreddit_pages_jinja2 line citation.
- F10: ruff 0.15.0 -> 0.15.10; note pre-commit rev now drifts from the
  auto-merging uv-ecosystem ruff bumps.
- F11: acceptance criterion no longer forces alpine (mcp_server is
  intentionally slim-bookworm); 3.14 is stable, framed as consistency.
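
The F5 point about truncation can be illustrated with a short sketch contrasting code-point slicing with byte slicing. Both function names are hypothetical and are not the project's helpers; the sample string is arbitrary.

```python
def truncate_code_points(text: str, limit: int) -> str:
    """Slice on code points: never splits a character."""
    return text[:limit]


def truncate_bytes(text: str, limit: int) -> str:
    """Slice on UTF-8 bytes: can cut a multi-byte character in half."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="replace")


sample = "日本語のテキスト"

safe = truncate_code_points(sample, 4)  # four whole characters
lossy = truncate_bytes(sample, 4)       # one character plus U+FFFD

print(safe)
print(lossy)
```

Code-point slicing keeps every character intact but may still cut mid-word in text without spaces, which is the cosmetic word-break issue noted above.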
19-84 merged commit e249482 into main Jun 9, 2026
12 checks passed
19-84 deleted the chore/roadmap-soundness-and-compose-locale branch June 9, 2026 05:45