Skip to content

JsonEncodingStreamWrapper cannot handle UTF16 BOMs #80160

Open
@dougbu

Description

@dougbu

Description

If DataContractJsonSerializer is given a non-UTF8 Stream containing a byte order mark and not given a specific encoding, it will attempt auto-detection of the encoding. This eventually calls the code at https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.DataContractSerialization/src/System/Runtime/Serialization/Json/JsonEncodingStreamWrapper.cs#L474-L503 and incorrectly "detects" UTF8, causing later decoding issues.

Reproduction Steps

  1. dotnet new console --output DCSTest
  2. Add <PackageReference Include="System.Runtime.Serialization.Json" Version="4.3.0" /> to an item group in the project file
  3. Place the code below in Program.cs
  4. dotnet run
using System.Runtime.Serialization.Json;
using System.Text;

using var stream = new MemoryStream();
using var writer = new StreamWriter(stream, new UnicodeEncoding(bigEndian: false, byteOrderMark: true));
writer.WriteLine("{ \"AnInt\": 42 }");
writer.Flush();
stream.Position = 0;

var serializer = new DataContractJsonSerializer(typeof(Simple));
serializer.ReadObject(stream);

public class Simple
{
    int AnInt { get; set; }
}

Expected behavior

Successful execution of the project.

Actual behavior

Program throws:

Unhandled exception. System.Runtime.Serialization.SerializationException: There was an error deserializing the object of type Simple. Encountered unexpected character 'ÿ'.
 ---> System.Xml.XmlException: Encountered unexpected character 'ÿ'.
   at System.Xml.XmlExceptionHelper.ThrowXmlException(XmlDictionaryReader reader, XmlException exception)
   at System.Runtime.Serialization.Json.XmlJsonReader.ReadAttributes()
   at System.Runtime.Serialization.Json.XmlJsonReader.Read()
   at System.Xml.XmlBaseReader.IsStartElement()
   at System.Xml.XmlBaseReader.IsStartElement(XmlDictionaryString localName, XmlDictionaryString namespaceUri)
   at System.Runtime.Serialization.Json.DataContractJsonSerializer.InternalIsStartObject(XmlReaderDelegator reader)
   at System.Runtime.Serialization.Json.DataContractJsonSerializer.InternalReadObject(XmlReaderDelegator xmlReader, Boolean verifyObjectName)
   at System.Runtime.Serialization.XmlObjectSerializer.ReadObjectHandleExceptions(XmlReaderDelegator reader, Boolean verifyObjectName, DataContractResolver dataContractResolver)
   --- End of inner exception stack trace ---
   at System.Runtime.Serialization.XmlObjectSerializer.ReadObjectHandleExceptions(XmlReaderDelegator reader, Boolean verifyObjectName, DataContractResolver dataContractResolver)
   at Program.<Main>$(String[] args) in C:\dd\Projects\DCSTest\Program.cs:line 11

Regression?

Maybe. I haven't tested w/ older versions of the package.
[Edit - @StephenMolloy] Looks like this has been an issue since at least 4.8, so not a regression.

Known Workarounds

  1. Remove the byte order mark before passing anything to the serializer
  2. Use Encoding.GetTranscodedStream if targeting a recent-enough TFM (5.0 or later) instead of relying on the serializer to auto-detect
  3. Use JsonReaderWriterFactory if targeting pretty much anything other than pre-netstandard2.0 TFMs and specify the encoding explicitly

My core recommendation here is actually to remove JsonEncodingStreamWrapper and the XML EncodingStreamWrapper. Use TranscodingStream under the covers. And detect the encoding (when necessary) some other way, perhaps using something bulletproof like DetectEncoding() in StreamReader.

That recommendation relates to my need to use DCS for both JSON and XML in netstandard1.3 projects. In addition, this would remove unnecessary encoding restrictions, support non-UTF8 XML deserialization without the (silly?) XML declaration requirement, and simplify your code.

Configuration

No response

Other information

No response

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions