Considerations about strings
J. Zebedee edited this page Oct 5, 2015
This document describes how MsgPack-CLI handles strings, covering both the design and its implementation.
The de facto standard interpretation of the MessagePack specification is that a Unicode string should be encoded as UTF-8 without a BOM and stored as the Raw type.
MsgPack-CLI is implemented as follows:
- `Packer` packs a `String` (or a `Char` sequence) as UTF-8 bytes in the Raw type. Note that `Packer` provides overloaded methods that accept a `System.Text.Encoding` to specify a custom character encoding.
- `Unpacker` and `MessagePackObject` handle a Raw type value as `byte[]`, and they provide `ReadString` or `AsString` methods that decode characters from the unpacked Raw type value.
- `MessagePackSerializer<T>` uses the primitive APIs above according to the following rules:
  - If the target field or property is of type `String`, UTF-8 encoding is used. If the stream being deserialized contains a byte sequence that is invalid as UTF-8, an exception is thrown.
  - If the target field or property is of type `Byte[]`, the raw bytes are stored as is.
  - If you want to handle another encoding, such as Latin-1 or Shift-JIS strings, you must build a custom serializer by hand.
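
The rules above can be illustrated at the wire level. The following is a minimal sketch in Python rather than the MsgPack-CLI .NET API, and it assumes only the short Raw form, where a single header byte carries a payload length of up to 31 bytes. The helper names `pack_short_raw` and `unpack_short_raw` are hypothetical and exist only for this illustration.

```python
def pack_short_raw(payload: bytes) -> bytes:
    """Prefix up to 31 payload bytes with a one-byte short Raw header."""
    if len(payload) > 31:
        raise ValueError("this sketch only covers the one-byte header form")
    return bytes([0xA0 | len(payload)]) + payload


def unpack_short_raw(data: bytes) -> bytes:
    """Return the payload bytes of a short Raw value, without decoding them."""
    header = data[0]
    if header & 0xE0 != 0xA0:
        raise ValueError("not a short Raw value")
    length = header & 0x1F
    return data[1:1 + length]


# String member: encode as UTF-8 without a BOM, then decode strictly.
packed = pack_short_raw("héllo".encode("utf-8"))
text = unpack_short_raw(packed).decode("utf-8")  # strict error handling is the default

# Byte-array member: the payload is kept as is, with no decoding step.
blob = unpack_short_raw(pack_short_raw(b"\xff\xfe"))

# Invalid UTF-8 in a string member raises instead of being silently replaced.
try:
    blob.decode("utf-8")
    invalid_rejected = False
except UnicodeDecodeError:
    invalid_rejected = True

# Another encoding (Latin-1 here) needs an explicit codec on both sides,
# which is the role a hand-built custom serializer would play.
latin1_packed = pack_short_raw("café".encode("latin-1"))
latin1_text = unpack_short_raw(latin1_packed).decode("latin-1")
```

The Raw payload itself carries no encoding information, so a reader that assumes UTF-8 cannot tell a Latin-1 payload from a corrupted one. That is why a non-UTF-8 string has to be handled explicitly by both the writer and the reader.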