
Converting from RTF (using CJK characters) into Markdown causes CJK characters messed up #9683

Closed
@kenjiuno


Explain the problem.

Converting an RTF file that contains Chinese, Japanese, and Korean (CJK) characters into Markdown garbles the CJK characters.

Hello! English and CJK.rtf (input)

{\rtf1\ansi\ansicpg932\deff0\nouicompat\deflang1033\deflangfe1041{\fonttbl{\f0\fnil\fcharset128 Arial Unicode MS;}{\f1\fnil\fcharset129 Arial Unicode MS;}}
{\*\generator Riched20 10.0.19041}\viewkind4\uc1 
\pard\sa200\sl276\slmult1\f0\fs22\lang17 Hello! English and CJK\par
\u20320?\'8d\'44\'81\'49\par
\lang1041\'82\'b1\'82\'f1\'82\'c9\'82\'bf\'82\'cd\'81\'49\par
\f1\'be\'c8\'b3\'e7\'c7\'cf\'bc\'bc\'bf\'e4\f0\lang1033 !\lang17\par
}
 

This input can be opened in Windows WordPad with write.exe "Hello! English and CJK.rtf":

[Screenshot 2024-04-22_16h55_10: the input opened in WordPad]

Command:

pandoc -o "Hello! English and CJK.md" "Hello! English and CJK.rtf"

Hello! English and CJK.md (output)

Actual

Hello! English and CJK

你D�I

‚±‚ñ‚É‚¿‚Í�I

¾È³çÇϼ¼¿ä!
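The garbled Japanese and Korean lines are consistent with each \'hh byte being decoded on its own as a Windows-1252 character, instead of consecutive bytes being combined into double-byte characters. The minimal Python sketch below reproduces those two lines, using Python's built-in cp1252 codec as a stand-in; it is an observation about the output, not a reading of pandoc's RTF reader. The Chinese line is left out because 0x8D is also unassigned in Windows-1252 and the report does not show clearly how pandoc rendered it.

```python
# Byte values taken from the \'hh escapes on the Japanese and Korean lines.
JAPANESE = bytes.fromhex("82b182f182c982bf82cd8149")
KOREAN = bytes.fromhex("bec8b3e7c7cfbcbcbfe4")


def decode_as_cp1252(data: bytes) -> str:
    """Decode every byte as a standalone Windows-1252 character.

    Bytes that cp1252 leaves unassigned (such as 0x81) become U+FFFD.
    """
    return data.decode("cp1252", errors="replace")


print(decode_as_cp1252(JAPANESE))  # ‚±‚ñ‚É‚¿‚Í�I
print(decode_as_cp1252(KOREAN))    # ¾È³çÇϼ¼¿ä
```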

Expected

Hello! English and CJK

你好!

こんにちは!

안녕하세요!
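The expected text follows from decoding each run of consecutive \'hh bytes as one unit, using the code page implied by the active font's \fcharset: 128 (Shift_JIS, cp932) for \f0, which also matches \ansicpg932, and 129 (Hangul, cp949) for \f1. The sketch below uses a hypothetical helper, decode_line, and a hypothetical FONT_CODEPAGES table; it is not pandoc code. It handles only the control words in this sample, treats the \u fallback as a single optional "?" (matching \uc1), and ignores groups and destinations.

```python
import re

# Hypothetical mapping for this sample only: \fcharset128 -> Shift_JIS (cp932),
# \fcharset129 -> Hangul (cp949), keyed by the RTF font control word.
FONT_CODEPAGES = {"f0": "cp932", "f1": "cp949"}

# One token per match: a \'hh byte, a \uN escape (with its single '?'
# fallback, per \uc1), a font switch, any other control word, or plain text.
TOKEN = re.compile(
    r"\\'([0-9a-fA-F]{2})"
    r"|\\u(-?\d+)\??"
    r"|\\(f\d+) ?"
    r"|\\[a-z]+-?\d* ?"
    r"|([^\\]+)"
)


def decode_line(rtf: str, font: str = "f0") -> str:
    """Decode one line of RTF body text.

    Consecutive hex escapes are buffered and decoded together with the
    current font's code page, so double-byte characters stay intact.
    """
    out: list[str] = []
    pending = bytearray()

    def flush() -> None:
        # Decode the buffered bytes with the font that was active for them.
        if pending:
            out.append(pending.decode(FONT_CODEPAGES[font]))
            pending.clear()

    for m in TOKEN.finditer(rtf):
        hexbyte, uni, fontword, text = m.groups()
        if hexbyte is not None:
            pending.append(int(hexbyte, 16))
            continue
        flush()
        if uni is not None:
            # \uN carries a signed 16-bit value; negative values wrap around.
            out.append(chr(int(uni) % 65536))
        elif fontword is not None:
            font = fontword
        elif text is not None:
            out.append(text)
        # Any other control word (\lang, \par, ...) is skipped.
    flush()
    return "".join(out)


for line in (
    r"\u20320?\'8d\'44\'81\'49\par",
    r"\lang1041\'82\'b1\'82\'f1\'82\'c9\'82\'bf\'82\'cd\'81\'49\par",
    r"\f1\'be\'c8\'b3\'e7\'c7\'cf\'bc\'bc\'bf\'e4\f0\lang1033 !\lang17\par",
):
    print(decode_line(line))
```

Under this sketch, \'81\'49 decodes to the fullwidth exclamation mark ！ (U+FF01), so the Chinese and Japanese lines end in ！ rather than an ASCII !; the Korean line ends in the ASCII ! written after \f0.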

Pandoc version?

Pandoc 3.1.13, installed with pandoc-3.1.13-windows-x86_64.msi:

pandoc 3.1.13
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: C:\Users\KU\AppData\Roaming\pandoc
Copyright (C) 2006-2023 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.

Using Windows 10 Pro, Japanese edition.

Microsoft Windows [Version 10.0.19045.4291]
