docs(roadmap): soundness fixes + ICU locale fix for postgres collation#50
Merged
Merged
Conversation
The postgres service runs `postgres:18-alpine` (musl libc), which ships no glibc locales. The previous `--locale=en_US.UTF-8` therefore logged `WARNING: no usable system locales were found` at initdb and silently degraded the cluster's default collation to C/byte-order — so `ORDER BY LOWER(title)` (user pages, title indexes) sorted non-ASCII text by code point instead of linguistically. (CI didn't catch it: the CI postgres service doesn't pass this initdb arg.) Switch to the ICU locale provider. Verified on postgres:18-alpine: `ORDER BY` of A/a/B/b/Z yields `a,A,b,B,Z` (ICU en-US linguistic order, was byte-order `A,B,Z,a,b`); Cyrillic `ILIKE` still matches. initdb runs once, so existing data directories must be re-initialized to adopt the new collation; fresh deployments get it automatically.
Validated every concrete code claim in the roadmap against the current codebase and corrected the inaccurate ones: - F5: `'english'` regconfig is ~20 occurrences across postgres_search.py, api/routes.py (was omitted), and indexes.sql — not "6 locations". - F5: text truncation is code-point-safe (not byte-slicing); the CJK issue is a missing word-break, cosmetic not corruption. - F5: locale analysis assumed Debian/glibc; deploy is postgres:18-alpine (musl). Documented the byte-order collation degradation and its ICU fix (now applied in docker-compose.yml). Bumped stale postgres:16 -> 18. - README: schema_version is at v4 (migration 004), and there is no startup auto-migration (get_schema_version has no callers). - README: F2 Phase 1 corrected (the jinja_env.py "missing import" already exists; real task is importing cached filters into search_server.py). "four services" -> includes mcp-server; url_for_page() marked as future. - F1: `generate_page_seo_content()` is invented; point to the real html_seo.py helpers. - F2: refresh stale "confirmed via code analysis" line counts; 3 routes. - F4: CSS minifier IS default-on; one prefers-color-scheme block exists. - F6/F7: subreddit_metadata table + about template are proposed (not existing); subverse.type must not map to access-level subreddit_type; stream_rows() needs a COLUMN_MAPS entry (doesn't accept "any" table); fixed stale write_subreddit_pages_jinja2 line citation. - F10: ruff 0.15.0 -> 0.15.10; note pre-commit rev now drifts from the auto-merging uv-ecosystem ruff bumps. - F11: acceptance criterion no longer forces alpine (mcp_server is intentionally slim-bookworm); 3.14 is stable, framed as consistency.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Two coupled changes from auditing the roadmap (
/code-reviewofroadmap/) against the actual codebase:1.
docker-compose.yml— ICU locale provider (real bug fix)The postgres service runs
postgres:18-alpine(musl libc), which ships no glibc locales.--locale=en_US.UTF-8loggedWARNING: no usable system locales were foundand silently degraded the cluster's default collation to C/byte-order — soORDER BY LOWER(title)(user pages, title indexes) sorted non-ASCII text by code point, not linguistically. CI didn't catch it because the CI postgres service doesn't pass this initdb arg.Switched to the ICU locale provider. Verified on
postgres:18-alpine:ORDER BYof A/a/B/b/Z now yieldsa,A,b,B,Z(ICU linguistic order; was byte-orderA,B,Z,a,b), and CyrillicILIKEstill matches.2.
roadmap/— soundness fixesValidated every concrete code claim in the 13 roadmap docs (3 parallel audit agents + direct verification). Corrected the inaccurate ones. Highlights:
'english'regconfig: ~20 occurrences across 3 files (incl.api/routes.py), not "6 locations"schema_versionis v4; there is no startup auto-migration (get_schema_version()has no callers)jinja_env.py"missing import" already exists)generate_page_seo_content()is invented → realhtml_seo.pyhelpersprefers-color-schemeblock existssubverse.typemust not map to access-levelsubreddit_type;stream_rows()needs aCOLUMN_MAPSentrymcp_serveris intentionallyslim-bookworm); "3.14 not yet stable" correctedpostgres:16→18, ruff0.15.0→0.15.10, "four services"→+mcp-server, present/future tenseFull per-finding detail in the commit messages.
frontend-decisions.htmland spec 03 were audited and needed no changes.Verification
postgres:18-alpineValueErrorguard, etc.)