Fix case-insensitive language code matching #287
Fixes issue where documents with language codes in different cases (e.g., lang: pt-br) were not processed when config had different casing (e.g., languages: ['pt-BR']), causing:
- Documents silently skipped during coordination
- Files generated in _site/pt-BR/ but URLs using /pt-br/
- 404 errors and broken hreflang links

Changes:
- Add normalization infrastructure to Site class
- Normalize language codes early in coordinate_documents
- Use case-insensitive matching for all language comparisons
- Preserve original config case for file paths and URLs
- Support case-insensitive data directory lookups

All language code comparisons now use lowercase for matching while preserving the original case from config for display and file paths.

Tests:
- Add 6 new unit tests for normalization helpers
- Add integration test validating case mismatch scenario
- Update existing test to reflect case-insensitive behavior
- All 44 tests pass with no regressions

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Merges PR untra#287 fix into combined-features branch. This adds case-insensitive language code matching, allowing users to use any case variant (pt-br, pt-BR, PT-BR) and have it work correctly.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

# Conflicts:
#   lib/jekyll/polyglot/liquid/tags/i18n_headers.rb
#   lib/jekyll/polyglot/patches/jekyll/site.rb
#   spec/jekyll/polyglot/patches/jekyll/site_spec.rb
    # Create normalized lookup hash: lowercase -> original case
    @lang_norm_map = {}
    @languages.each { |lang| @lang_norm_map[lang.downcase] = lang }

    # Store normalized versions for fast lookup
    @languages_normalized = @languages.map(&:downcase)
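For context, here is a minimal sketch of how these two structures could back the `normalize_lang` and `lang_exists?` helpers named in the PR description. The `LangLookup` wrapper class and both method bodies are assumptions made for illustration; they are not the code from the diff.

```ruby
# Hypothetical stand-in for the relevant part of the patched Site class.
class LangLookup
  def initialize(languages)
    @languages = languages
    # lowercase -> original case, mirroring the diff excerpt
    @lang_norm_map = {}
    @languages.each { |lang| @lang_norm_map[lang.downcase] = lang }
    @languages_normalized = @languages.map(&:downcase)
  end

  # Returns the spelling used in the config for any case variant,
  # or nil when the code is not configured at all.
  def normalize_lang(lang)
    return nil if lang.nil?

    @lang_norm_map[lang.to_s.downcase]
  end

  # True when the code matches a configured language, ignoring case.
  def lang_exists?(lang)
    !lang.nil? && @languages_normalized.include?(lang.to_s.downcase)
  end
end

lookup = LangLookup.new(%w[en pt-BR])
lookup.normalize_lang('pt-br') # => "pt-BR"
lookup.lang_exists?('fr')      # => false
```

Note that the hash alone would be enough for both helpers in this sketch, since a key lookup already answers the existence question.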
This is overkill. There's being forgiving of miscased letters, and then there's making internal maps of different casings of the same languages. I don't like this approach with :lang_norm_map and :languages_normalized.
You can see Polyglot works by matching on language codes. There's being forgiving of case differences, but when a system matches on specific (small) strings, there's such a thing as Too Much Magic: users expect to match the required language codes as they defined them. Normalizing combinations of small strings into their cased shapes is wasteful and adds to the complexity of Polyglot's pattern finding. Matching on every combination of casing for language codes isn't inherently more helpful, and it is a source of non-obvious software bugs.
For sure there is more room for mistakes with hyphenated language codes, but pt-BR or pt-br can be specified in the languages configuration in the YAML, and then that is what gets matched on. Polyglot relies heavily on regex and on case sensitivity. Programmers need to be consistent with these keys.
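To make the reviewer's point concrete, here is a rough sketch of case-sensitive matching built from the configured codes. The `strict_lang_pattern` helper and the URL shapes are hypothetical and are not Polyglot's actual regexes; the sketch only shows that whatever spelling appears in `languages` is the one that matches.

```ruby
# Hypothetical helper: build a case-sensitive pattern that matches a leading
# language segment using exactly the spellings given in the config.
def strict_lang_pattern(languages)
  alternation = languages.map { |lang| Regexp.escape(lang) }.join('|')
  %r{\A/(#{alternation})/}
end

pattern = strict_lang_pattern(%w[en pt-BR])
pattern.match?('/pt-BR/about/') # => true
pattern.match?('/pt-br/about/') # => false, casing differs from the config
```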
Summary
Fixes case-sensitivity bug where documents with language codes in different cases (e.g., lang: pt-br) were not processed when config had different casing (e.g., languages: ['pt-BR']).

Problem
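Users experienced:
- Documents silently skipped during coordination
- Files generated in _site/pt-BR/ but canonical URLs using /pt-br/
- 404 errors and broken hreflang links

Solution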
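Implements a hybrid normalization approach:
- Language codes are normalized early, in coordinate_documents
- All language code comparisons use lowercase for matching
- The original case from config is preserved for display and file paths

Changes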
Core Implementation
lib/jekyll/polyglot/patches/jekyll/site.rb
- @lang_norm_map and @languages_normalized for normalization
- normalize_lang() and lang_exists?() helper methods
- fetch_languages() to create normalization mappings
- derive_lang_from_path() for case-insensitive matching
- coordinate_documents() to normalize early
- assignPageLanguagePermalinks() to use normalized keys
lib/jekyll/polyglot/hooks/coordinate.rb
lib/jekyll/polyglot/liquid/tags/i18n_headers.rb
Tests
Testing
Examples
Now works with any case variant:
Files with language in path also work:
- pt-br/page.md → Matches pt-BR in config ✅
- PT-BR/page.md → Matches pt-BR in config ✅
- _data/pt-br/ → Matches pt-BR in config ✅

Backward Compatibility
✅ No breaking changes - existing sites with consistent casing work exactly as before
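The path examples and the compatibility claim can be illustrated with a short sketch. The PR names `derive_lang_from_path()`, but the signature and body below are assumptions; the real method may differ. The idea is to compare the first path segment against the configured codes while ignoring case, and to return the configured spelling.

```ruby
# Hypothetical version of derive_lang_from_path; signature and body are
# assumed for illustration only.
def derive_lang_from_path(relative_path, languages)
  first_segment = relative_path.to_s.sub(%r{\A/+}, '').split('/').first
  return nil if first_segment.nil?

  # String#casecmp? compares while ignoring case (Ruby 2.4+).
  languages.find { |lang| lang.casecmp?(first_segment) }
end

languages = %w[en pt-BR]
derive_lang_from_path('pt-br/page.md', languages) # => "pt-BR"
derive_lang_from_path('pt-BR/page.md', languages) # => "pt-BR"
derive_lang_from_path('about.md', languages)      # => nil
```

A path that already uses the configured casing resolves to the same code as it would under exact matching, which is the sense in which consistently cased sites are unaffected.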
Related Issues
Resolves the case sensitivity issue for multi-component language codes like pt-BR, zh-CN, en-US, etc.

🤖 Generated with Claude Code