Fix case-insensitive language code matching #287

Open
rathboma wants to merge 1 commit into untra:main from rathboma:fix/case-insensitive-language-codes

@rathboma
Contributor

Summary

Fixes a case-sensitivity bug: documents whose language code was cased differently from the config (e.g., lang: pt-br in the document vs. languages: ['pt-BR'] in the config) were not processed.

Problem

Users experienced:

  • Documents silently skipped during coordination
  • Files generated in _site/pt-BR/ but canonical URLs using /pt-br/
  • 404 errors when navigating
  • Broken hreflang alternate links

Solution

Implements a hybrid normalization approach:

  • Case-insensitive matching: All language code comparisons use lowercase
  • Preserved display: File paths and URLs use the original case from config
  • Early normalization: Document languages normalized in coordinate_documents
  • Single source of truth: Config languages array is canonical for display case
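
The hybrid approach above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `LangIndex` class is a hypothetical stand-in for the patched Site class, and the bodies of `normalize_lang` and `lang_exists?` are assumptions (only their names and the `@lang_norm_map` hash appear in this PR).

```ruby
# Hypothetical stand-in for the patched Site class; the method bodies are
# assumed, only the names come from the PR description.
class LangIndex
  def initialize(languages)
    @languages = languages
    # lowercase code -> casing as written in the config (canonical display form)
    @lang_norm_map = {}
    @languages.each { |lang| @lang_norm_map[lang.downcase] = lang }
  end

  # Returns the config-cased form of lang, or nil when it is not configured.
  def normalize_lang(lang)
    return nil if lang.nil?

    @lang_norm_map[lang.to_s.downcase]
  end

  # True when lang matches a configured language, ignoring case.
  def lang_exists?(lang)
    !normalize_lang(lang).nil?
  end
end

index = LangIndex.new(%w[en pt-BR zh-CN])
index.normalize_lang('pt-br') # => "pt-BR"
index.lang_exists?('fr')      # => false
```

Matching happens on the lowercase key, while the value returned is always the casing from the config, so paths and URLs built from it stay consistent.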

Changes

Core Implementation

  • lib/jekyll/polyglot/patches/jekyll/site.rb

    • Added @lang_norm_map and @languages_normalized for normalization
    • Added normalize_lang() and lang_exists?() helper methods
    • Updated fetch_languages() to create normalization mappings
    • Updated derive_lang_from_path() for case-insensitive matching
    • Updated coordinate_documents() to normalize early
    • Updated assignPageLanguagePermalinks() to use normalized keys
  • lib/jekyll/polyglot/hooks/coordinate.rb

    • Updated data directory merging for case-insensitive lookups
  • lib/jekyll/polyglot/liquid/tags/i18n_headers.rb

    • Added defensive normalization in hash building

Tests

  • Added 6 new unit tests for normalization helpers
  • Added integration test validating case mismatch scenario
  • Updated existing test to reflect case-insensitive behavior
  • All 44 tests pass with no regressions (77.92% coverage)

Testing

bundle exec rspec spec/jekyll/polyglot/patches/jekyll/site_spec.rb
# 44 examples, 0 failures

Examples

Now works with any case variant:

# _config.yml
languages: ['en', 'pt-BR', 'zh-CN']

# Document front matter (any one of these)
---
lang: pt-br    # ✅ Works! (was broken before)
lang: PT-BR    # ✅ Works!
lang: Pt-Br    # ✅ Works!
---

Files with language in path also work:

  • pt-br/page.md → Matches pt-BR in config ✅
  • PT-BR/page.md → Matches pt-BR in config ✅
  • _data/pt-br/ → Matches pt-BR in config ✅
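
As a rough sketch of the path-based matching listed above, the first path segment can be compared against the configured languages without regard to case. The helper name `lang_from_path` is hypothetical and is not the PR's `derive_lang_from_path()` implementation; it only illustrates the comparison using Ruby's `String#casecmp?`.

```ruby
# Hypothetical helper: returns the config-cased language for a relative path,
# or nil when the first path segment is not a configured language.
def lang_from_path(relative_path, languages)
  first_segment = relative_path.split('/').reject(&:empty?).first
  return nil if first_segment.nil?

  # casecmp? compares case-insensitively; find returns the config's own casing.
  languages.find { |lang| lang.casecmp?(first_segment) }
end

languages = %w[en pt-BR zh-CN]
lang_from_path('pt-br/page.md', languages) # => "pt-BR"
lang_from_path('PT-BR/page.md', languages) # => "pt-BR"
lang_from_path('about/page.md', languages) # => nil
```

A data directory such as _data/pt-br/ would presumably apply the same comparison to the segment under _data rather than to the first segment.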

Backward Compatibility

✅ No breaking changes - existing sites with consistent casing work exactly as before

Related Issues

Resolves the case sensitivity issue for multi-component language codes like pt-BR, zh-CN, en-US, etc.

🤖 Generated with Claude Code

Fixes issue where documents with language codes in different cases
(e.g., lang: pt-br) were not processed when config had different
casing (e.g., languages: ['pt-BR']), causing:
- Documents silently skipped during coordination
- Files generated in _site/pt-BR/ but URLs using /pt-br/
- 404 errors and broken hreflang links

Changes:
- Add normalization infrastructure to Site class
- Normalize language codes early in coordinate_documents
- Use case-insensitive matching for all language comparisons
- Preserve original config case for file paths and URLs
- Support case-insensitive data directory lookups

All language code comparisons now use lowercase for matching while
preserving the original case from config for display and file paths.

Tests:
- Add 6 new unit tests for normalization helpers
- Add integration test validating case mismatch scenario
- Update existing test to reflect case-insensitive behavior
- All 44 tests pass with no regressions

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
rathboma added a commit to rathboma/polyglot that referenced this pull request Jan 21, 2026
Merges PR untra#287 fix into combined-features branch.

This adds case-insensitive language code matching, allowing users to use
any case variant (pt-br, pt-BR, PT-BR) and have it work correctly.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

# Conflicts:
#	lib/jekyll/polyglot/liquid/tags/i18n_headers.rb
#	lib/jekyll/polyglot/patches/jekyll/site.rb
#	spec/jekyll/polyglot/patches/jekyll/site_spec.rb
Comment on lines +28 to +33
# Create normalized lookup hash: lowercase -> original case
@lang_norm_map = {}
@languages.each { |lang| @lang_norm_map[lang.downcase] = lang }

# Store normalized versions for fast lookup
@languages_normalized = @languages.map(&:downcase)
Owner

This is overkill; there's being forgiving of miscased letters, and then there's making internal maps of different casings of the same languages. I don't like this approach with :lang_norm_map, :languages_normalized.

Owner

You can see polyglot works by matching on language codes. There's being forgiving of casing, but when a system matches on specific (small) strings, there's such a thing as Too Much Magic; the expectation of users is that they can match the required language codes as they defined them. Normalizing combinations of small strings into their cased shapes is wasteful and adds to the complexity of polyglot's pattern finding. Matching on every combination of casing for language codes isn't implicitly more helpful, but a source of non-explicit software bugs.

For sure there is more room for making mistakes with the hyphenated language codes, but pt-BR or pt-br can be specified in the languages configuration in the YAML, and then that is what gets matched on. Polyglot relies heavily on regex and on case sensitivity. Programmers need to be consistent with these keys.
