CJK Support Added for Markdown Slug Generation #6
Summary
This PR adds support for CJK (Chinese, Japanese, Korean) characters in markdown slug generation, so that non-Latin characters are preserved in generated filenames and anchors when using the explode command.
Related Issues
Closes #5
Files Changed
bin/md-tree.js
Modified the sanitizeText method to preserve CJK characters instead of stripping them out. The regex now includes Unicode ranges for Chinese, Japanese (Hiragana and Katakana), and Korean characters.
package.json
Bumped the version from 1.5.1 to 1.6.0 to reflect the new feature addition.
test/test-cjk.md
Added a new test markdown file containing CJK characters in headings to verify that the functionality works correctly.
test/test-cli.js
Added a comprehensive test case to verify that the explode command correctly handles CJK characters in headings, generates appropriate filenames, and creates proper links.
Code Changes
bin/md-tree.js
The regex now includes Unicode ranges:
\u4e00-\u9fff: Chinese characters
\u3040-\u309f: Japanese Hiragana
\u30a0-\u30ff: Japanese Katakana
\uac00-\ud7af: Korean Hangul
test/test-cli.js
Reason for Changes
Previously, the markdown tree parser would strip out all non-Latin characters when generating slugs for filenames and anchors. This made the tool unsuitable for documentation written in CJK languages, as headings like "章节一" would be converted to empty strings or hyphens only, resulting in non-descriptive or conflicting filenames.
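The difference can be illustrated with a minimal sketch. This is not the actual sanitizeText implementation from bin/md-tree.js; the function name slugify, the keepCjk option, and the hyphen-collapsing rules are assumptions made for illustration. Only the four Unicode ranges are taken from this PR.

```javascript
// Hypothetical sketch: NOT the real sanitizeText from bin/md-tree.js.
// Only the Unicode ranges below come from this PR; everything else is assumed.
const CJK_RANGES =
  '\\u4e00-\\u9fff' + // Chinese characters
  '\\u3040-\\u309f' + // Japanese Hiragana
  '\\u30a0-\\u30ff' + // Japanese Katakana
  '\\uac00-\\ud7af';  // Korean Hangul

function slugify(text, { keepCjk = true } = {}) {
  // Characters allowed to survive in the slug.
  const allowed = keepCjk ? `a-z0-9${CJK_RANGES}` : 'a-z0-9';
  const disallowed = new RegExp(`[^${allowed}]+`, 'g');
  return text
    .toLowerCase()
    .replace(disallowed, '-')  // each run of disallowed characters becomes one hyphen
    .replace(/^-+|-+$/g, '');  // trim leading and trailing hyphens
}

console.log(slugify('章节一', { keepCjk: false })); // prints an empty string
console.log(slugify('章节一'));                     // prints "章节一"
console.log(slugify('Chapter 1: はじめに'));        // prints "chapter-1-はじめに"
```

With the CJK ranges excluded, a heading such as "章节一" collapses to an empty slug, which is the failure mode described above. With the ranges included, the heading survives intact and mixed-language headings keep both parts.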
Impact of Changes
Test Plan
A comprehensive test case has been added that:
… explode command on this file
The test covers Chinese characters in main headings and Japanese characters in subsections, alongside regular English headings, to ensure that mixed-language documents work correctly.
Additional Notes