error parse #1518

Closed
ChinaCCF opened this issue Mar 15, 2019 · 14 comments
Labels: kind: question; platform: visual studio (related to MSVC); solution: wontfix (the issue will not be fixed, either it is impossible or deemed out of scope)

Comments

@ChinaCCF

{"Code":1,"Message":"用户登录成功","Data":{"Token":"146A307B82184AB5A00918CF9B612160","User":{"UserId":"3cd1cd06-3d77-4efb-a78d-ad9a9cea3d80","RealName":"admin","Department":"测试1"},"Settings":null},"PageIndex":null,"PageSize":null,"TotalCount":null,"TotalPageCount":null}

This JSON string fails to parse.

@ChinaCCF
Author

The string is encoded in GBK (Chinese).

@ChinaCCF
Author

The library throws at: JSON_THROW(*static_cast<const detail::out_of_range*>(&ex));

@nlohmann
Owner

The JSON above is indeed valid and the following program works:

#include "json.hpp"
#include <iostream>

using json = nlohmann::json;

int main() {
    json j = R"({"Code":1,"Message":"用户登录成功","Data":{"Token":"146A307B82184AB5A00918CF9B612160","User":{"UserId":"3cd1cd06-3d77-4efb-a78d-ad9a9cea3d80","RealName":"admin","Department":"测试1"},"Settings":null},"PageIndex":null,"PageSize":null,"TotalCount":null,"TotalPageCount":null})"_json;
    
    std::cout << j.dump(2) << std::endl;
}

The library only supports UTF-8, though. Other encodings are not supported. The error may come from the GBK bytes being misinterpreted as UTF-8.

@ChinaCCF
Author

Thanks, but my point is that the ASCII part is the same in GBK, UTF-8, or any other code page, so the parser should not misinterpret it. tinyxml2, for example, works well with both GBK and UTF-8.

@nlohmann
Owner

But your value does not only contain ASCII values. For this library, it makes a difference whether 用户登录成功 is encoded as UTF-8 or GBK.

@ChinaCCF
Author

ChinaCCF commented Mar 15, 2019

0000-007F 0xxx xxxx // ASCII; identical in GBK and UTF-8
0080-07FF 110x xxxx 10xx xxxx
...

What I want to say is this: if the first bit of a char is 1, you should treat it as text and not try to interpret it. What we care about and need to interpret are the char values below 128 (unsigned).

@ChinaCCF
Author

If string a is abc, its bytes are 0xxx xxxx, 0xxx xxxx, 0xxx xxxx.
If string b is a中c, its bytes are 0xxx xxxx, 1xxx xxxx, 1xxx xxxx, 0xxx xxxx, whether in GBK or UTF-8.
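
For readers following the byte-level argument, here is a small sketch that prints the actual bytes of a中c in both encodings. The helper `to_hex` and the two constants are hypothetical names introduced for illustration; the byte values are the standard UTF-8 and GBK encodings of 中 (U+4E2D), written as escapes so the source file's own encoding does not matter.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Render a byte string as space-separated uppercase hex, e.g. "61 D6 D0 63".
std::string to_hex(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%02X",
                      static_cast<unsigned char>(bytes[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// The literal is split before "c" so that 'c' is not read as a hex digit
// of the preceding \x escape.
const std::string a_zhong_c_utf8 = "a\xE4\xB8\xAD" "c";  // 中 in UTF-8: E4 B8 AD
const std::string a_zhong_c_gbk  = "a\xD6\xD0" "c";      // 中 in GBK:   D6 D0
```

Calling `to_hex` on the two constants gives `61 E4 B8 AD 63` and `61 D6 D0 63`. Every non-ASCII byte has its high bit set in both cases, but the sequences differ in length and value. In the GBK form, D6 has the shape of a UTF-8 two-byte lead (110x xxxx) while D0 (1101 0000) does not have the continuation shape 10xx xxxx, so a parser that validates UTF-8 cannot accept it.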

@nlohmann
Owner

Could you please attach the above JSON as file, so I can check myself?

@ChinaCCF
Author

OK, give me half an hour.

@ChinaCCF
Author

git@github.com:ChinaCCF/TestJSON.git

@ChinaCCF
Author

The file is committed there. I rechecked and got the same error.

@nlohmann
Owner

Thanks! Here is the error message I get:

libc++abi.dylib: terminating with uncaught exception of type nlohmann::detail::parse_error: [json.exception.parse_error.101] parse error at line 1, column 23: syntax error while parsing value - invalid string: ill-formed UTF-8 byte; last read: '"\323\303'

And indeed, the string 用户登录成功 begins at column 22. The parser complains about the byte sequence 0323 0303 (octal), which is 0xD3 0xC3 (hex). In UTF-8, D3 indicates the start of a 2-byte sequence. The next byte must be in the range 0x80..0xBF. Therefore, 0xC3 is unexpected here.
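
The rule described above (a lead byte fixes how many continuation bytes in 0x80..0xBF must follow) can be sketched as a small standalone check. This is not the library's actual parser; `first_bad_utf8_byte` is a hypothetical helper, and it is deliberately simplified: it does not reject overlong forms or surrogates inside 3- and 4-byte sequences.

```cpp
#include <cstddef>
#include <string>

// Returns the zero-based index of the first ill-formed byte, s.size() if the
// input ends in the middle of a sequence, or std::string::npos if every lead
// byte is followed by the right number of continuation bytes (0x80..0xBF).
std::size_t first_bad_utf8_byte(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead <= 0x7F) {
            extra = 0;          // ASCII
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;          // 2-byte sequence (0xD3 falls here)
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2;          // 3-byte sequence
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3;          // 4-byte sequence
        } else {
            return i;           // stray continuation byte or invalid lead
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            if (i + k >= s.size()) return s.size();  // truncated sequence
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if (c < 0x80 || c > 0xBF) return i + k;  // not a continuation byte
        }
        i += extra + 1;
    }
    return std::string::npos;
}
```

For the input `{"Code":1,"Message":"` followed by the GBK bytes D3 C3, the function returns the zero-based index 22, which is the 23rd byte and so lines up with "column 23" in the error message. The UTF-8 encoding of 用 (E7 94 A8) passes.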

This library does not support any other encoding but UTF-8. The encoding of 用户登录成功 in GBK is invalid UTF-8, so it is rejected by the library.
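
One possible workaround, not proposed anywhere in this thread, is to transcode the GBK input to UTF-8 before handing it to the parser. The sketch below assumes a POSIX system whose `iconv` knows the "GBK" charset (glibc does when its gconv modules are installed; on Windows the usual route would be `MultiByteToWideChar` with code page 936 followed by `WideCharToMultiByte` with `CP_UTF8`). `gbk_to_utf8` is a hypothetical helper name.

```cpp
#include <iconv.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Convert GBK-encoded bytes to UTF-8 with POSIX iconv.
// Throws std::runtime_error if the converter is unavailable or the input
// is not valid GBK.
std::string gbk_to_utf8(const std::string& gbk) {
    iconv_t cd = iconv_open("UTF-8", "GBK");  // arguments: to-code, from-code
    if (cd == reinterpret_cast<iconv_t>(-1)) {
        throw std::runtime_error("iconv_open(\"UTF-8\", \"GBK\") failed");
    }

    std::string in = gbk;  // iconv takes a non-const char*
    // A 2-byte GBK character becomes at most 3 UTF-8 bytes, so 2x is enough.
    std::string out(in.size() * 2 + 4, '\0');

    char* in_ptr = &in[0];
    std::size_t in_left = in.size();
    char* out_ptr = &out[0];
    std::size_t out_left = out.size();

    const std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("input is not valid GBK");
    }

    out.resize(out.size() - out_left);  // drop the unused tail of the buffer
    return out;
}
```

With a helper like this, the file contents would be passed through `gbk_to_utf8` first and the result given to `json::parse`. Any strings read back out of the JSON are then UTF-8 and would need the reverse conversion before being used with GBK-based Win32 APIs.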

@ChinaCCF
Author

I like your library very much; it's intuitive and simple. But if it does not support GBK, I'm sad to say I will use rapidjson (from Tencent, China) instead of nlohmann. I think there are many other programs in the world (on Win32) that run in a code page other than UTF-8.

@nlohmann
Owner

Sorry to hear that.

@nlohmann nlohmann added solution: wontfix the issue will not be fixed (either it is impossible or deemed out of scope) platform: visual studio related to MSVC labels Mar 15, 2019