Incorrect encoding when reading UTF-8 markdown files #138

@sheyifan

Bug: Incorrect encoding when reading UTF-8 markdown files on Windows

Description

When running md2cf on Windows, UTF-8 encoded markdown files are read with the system default encoding (typically GBK/CP936 on Chinese Windows) instead of UTF-8. As a result, non-ASCII characters (Chinese, Japanese, etc.) are decoded incorrectly and appear as mojibake on Confluence pages.
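
To check which encoding Python falls back to on a given machine, something like the following may help. This is a minimal sketch using only the standard library; it is not md2cf code, and the variable name is just for illustration.

```python
import locale

# Encoding that open() uses in text mode when no `encoding` argument
# is passed (unless Python's UTF-8 mode is enabled).
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

On a Chinese-locale Windows install this is typically cp936, which matches the environment described below.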

Environment

  • OS: Windows (Chinese locale, default encoding: GBK/CP936)
  • Python version: 3.x
  • md2cf version: 2.3.0

Steps to Reproduce

  1. Create a UTF-8 encoded markdown file with non-ASCII characters (e.g., Chinese):

    # 需求
  2. Run md2cf to upload this file to Confluence:

    python -m md2cf --host <confluence-url> --username <user> --token <token> --space <space> file.md
  3. Check the page title on Confluence.

Expected Behavior

The page title should display correctly as "需求" ("requirements" in Chinese).

Actual Behavior

The page title displays as mojibake: "需求"

This occurs because Python's open() function, when called in text mode without an encoding argument, uses the locale's preferred encoding (GBK/CP936 on Chinese Windows) unless UTF-8 mode is enabled. The UTF-8 bytes of the file are therefore decoded as GBK, which produces the wrong characters.
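
The effect can be reproduced on any platform by decoding the file as GBK explicitly. The sketch below is an illustration only: it assumes md2cf reads the file with a bare open() call, as described above, and the file name and variable names are hypothetical rather than taken from md2cf's source.

```python
import os
import tempfile

text = "# 需求\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.md")

    # Step 1: write the markdown file as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Simulate the Chinese Windows default by decoding the UTF-8 bytes as GBK.
    # errors="replace" keeps the read from raising on invalid byte sequences.
    with open(path, encoding="gbk", errors="replace") as f:
        garbled = f.read()

    # Passing the encoding explicitly gives the same result on every platform.
    with open(path, encoding="utf-8") as f:
        correct = f.read()

print(garbled != text)   # the GBK read does not round-trip
print(correct == text)   # the UTF-8 read does
```

If the cause is as described, passing encoding="utf-8" wherever md2cf opens markdown files would likely resolve the problem regardless of the system locale.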
