Bug: Incorrect encoding when reading UTF-8 markdown files on Windows
Description
When running md2cf on Windows, UTF-8 encoded markdown files are read with the system default encoding (typically GBK/CP936 on Chinese-locale Windows) instead of UTF-8. As a result, non-ASCII characters (e.g., Chinese or Japanese text) are decoded incorrectly and appear as mojibake on the resulting Confluence pages.
Environment
- OS: Windows (Chinese locale, default encoding: GBK/CP936)
- Python version: 3.x
- md2cf version: 2.3.0
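To confirm which encoding Python falls back to on a given machine, one option is to open a file in text mode without an `encoding` argument and inspect the file object's `encoding` attribute. This is a minimal sketch assuming Python 3.7 or later (for `sys.flags.utf8_mode`); `default_open_encoding` is a hypothetical helper name, not part of md2cf.

```python
import locale
import sys
import tempfile


def default_open_encoding() -> str:
    """Return the codec that open() picks when no encoding= argument is given."""
    # Open a scratch file in text mode without an encoding and ask the
    # resulting TextIOWrapper which codec it resolved to.
    with tempfile.TemporaryFile(mode="w") as f:
        return f.encoding


if __name__ == "__main__":
    print("Python UTF-8 mode:", bool(sys.flags.utf8_mode))
    print("Locale preferred encoding:", locale.getpreferredencoding(False))
    print("open() default encoding:", default_open_encoding())
```

On the environment described above, the last line would be expected to show `cp936`; with Python's UTF-8 mode enabled it reports UTF-8 instead.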
Steps to Reproduce
- Create a UTF-8 encoded markdown file (here, `file.md`) containing non-ASCII characters (e.g., Chinese):
  ```markdown
  # 需求
  ```
- Run md2cf to upload this file to Confluence:
  ```
  python -m md2cf --host <confluence-url> --username <user> --token <token> --space <space> file.md
  ```
- Check the page title on Confluence.
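The decoding mismatch can also be reproduced without a Chinese-locale Windows machine. The sketch below writes the test file from the first step as UTF-8 and reads it back with an explicit GBK codec, which is assumed here to approximate what a bare `open()` does on such a system. The helper names `write_test_file` and `read_like_windows_default` are hypothetical and not part of md2cf.

```python
import tempfile
from pathlib import Path

TITLE = "需求"  # the non-ASCII heading used in the reproduction steps


def write_test_file(directory: Path) -> Path:
    """Create file.md containing '# 需求', explicitly encoded as UTF-8."""
    md_file = directory / "file.md"
    md_file.write_text(f"# {TITLE}\n", encoding="utf-8")
    return md_file


def read_like_windows_default(path: Path, encoding: str = "gbk") -> str:
    """Decode the file with a non-UTF-8 codec, mimicking open() without
    encoding= on a Chinese-locale Windows machine. errors="replace" turns
    undecodable bytes into U+FFFD instead of raising UnicodeDecodeError."""
    return path.read_text(encoding=encoding, errors="replace")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        md_file = write_test_file(Path(tmp))
        print(md_file.read_bytes())                 # raw UTF-8 bytes on disk
        print(md_file.read_text(encoding="utf-8"))  # decoded correctly
        print(read_like_windows_default(md_file))   # mojibake
```

With the default `errors="strict"`, the same GBK read may instead raise `UnicodeDecodeError`, depending on which bytes the file contains; `errors="replace"` is used here only so the garbled result can be printed.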
Expected Behavior
The page title should display correctly as "需求" ("Requirements" in Chinese).
Actual Behavior
The page title is displayed as mojibake (garbled characters) instead of "需求".
This happens because, when no `encoding` argument is given, Python's `open()` decodes text files with the locale's preferred encoding, which on Chinese-locale Windows is GBK/CP936. The file's UTF-8 bytes are therefore decoded with the wrong codec.
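The issue does not show where md2cf opens the markdown file, so the following is only a sketch of the general kind of fix: pass an explicit encoding to `open()` instead of relying on the locale default. `read_markdown_file` is a hypothetical name, and choosing `"utf-8-sig"` (which decodes plain UTF-8 and also strips a leading byte-order mark) over plain `"utf-8"` is an assumption about what is desirable.

```python
from pathlib import Path
from typing import Union


def read_markdown_file(path: Union[str, Path]) -> str:
    """Read a markdown file as UTF-8 regardless of the OS locale.

    "utf-8-sig" behaves like "utf-8" but also removes a leading BOM,
    which some Windows editors write at the start of UTF-8 files.
    """
    # Hypothetical helper: the explicit encoding= is the actual fix.
    with open(path, encoding="utf-8-sig") as f:
        return f.read()
```

Until a fix like this is released, running Python in UTF-8 mode (`python -X utf8 -m md2cf ...`, or setting the environment variable `PYTHONUTF8=1`) makes `open()` default to UTF-8 and should work around the problem without code changes.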