Description
If DataContractJsonSerializer is given a non-UTF8 Stream containing a byte order mark and not given a specific encoding, it will attempt auto-detection of the encoding. This eventually calls the code at https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.DataContractSerialization/src/System/Runtime/Serialization/Json/JsonEncodingStreamWrapper.cs#L474-L503 and incorrectly "detects" UTF8, causing later decoding issues.
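For reference, a small sketch of what the first bytes of such a stream look like. `BomInspector` and its members are hypothetical names used only for illustration; the point is that the UTF-16LE byte order mark starts with 0xFF, which is the code point of 'ÿ' when read as a single-byte character. That would be consistent with the 'ÿ' in the exception reported under Actual behavior, though I have not traced the reader to confirm it.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the serializer.
public static class BomInspector
{
    // Formats an encoding's byte order mark as hex, e.g. "FF-FE".
    public static string PreambleHex(Encoding encoding) =>
        BitConverter.ToString(encoding.GetPreamble());

    // Reinterprets a single byte as the character with the same code point.
    public static char AsSingleByteChar(byte value) => (char)value;
}
```

`BomInspector.PreambleHex(new UnicodeEncoding(bigEndian: false, byteOrderMark: true))` gives `FF-FE`, and `BomInspector.AsSingleByteChar(0xFF)` is 'ÿ'.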
Reproduction Steps
- dotnet new console --output DCSTest
- Add <PackageReference Include="System.Runtime.Serialization.Json" Version="4.3.0" /> to an item group in the project file
- Place the code below in Program.cs
- dotnet run
using System.Runtime.Serialization.Json;
using System.Text;
using var stream = new MemoryStream();
using var writer = new StreamWriter(stream, new UnicodeEncoding(bigEndian: false, byteOrderMark: true));
writer.WriteLine("{ \"AnInt\": 42 }");
writer.Flush();
stream.Position = 0;
var serializer = new DataContractJsonSerializer(typeof(Simple));
serializer.ReadObject(stream);
public class Simple
{
int AnInt { get; set; }
}
Expected behavior
Successful execution of the project.
Actual behavior
Program throws:
Unhandled exception. System.Runtime.Serialization.SerializationException: There was an error deserializing the object of type Simple. Encountered unexpected character 'ÿ'.
---> System.Xml.XmlException: Encountered unexpected character 'ÿ'.
at System.Xml.XmlExceptionHelper.ThrowXmlException(XmlDictionaryReader reader, XmlException exception)
at System.Runtime.Serialization.Json.XmlJsonReader.ReadAttributes()
at System.Runtime.Serialization.Json.XmlJsonReader.Read()
at System.Xml.XmlBaseReader.IsStartElement()
at System.Xml.XmlBaseReader.IsStartElement(XmlDictionaryString localName, XmlDictionaryString namespaceUri)
at System.Runtime.Serialization.Json.DataContractJsonSerializer.InternalIsStartObject(XmlReaderDelegator reader)
at System.Runtime.Serialization.Json.DataContractJsonSerializer.InternalReadObject(XmlReaderDelegator xmlReader, Boolean verifyObjectName)
at System.Runtime.Serialization.XmlObjectSerializer.ReadObjectHandleExceptions(XmlReaderDelegator reader, Boolean verifyObjectName, DataContractResolver dataContractResolver)
--- End of inner exception stack trace ---
at System.Runtime.Serialization.XmlObjectSerializer.ReadObjectHandleExceptions(XmlReaderDelegator reader, Boolean verifyObjectName, DataContractResolver dataContractResolver)
at Program.<Main>$(String[] args) in C:\dd\Projects\DCSTest\Program.cs:line 11
Regression?
Maybe. I haven't tested w/ older versions of the package.
[Edit - @StephenMolloy] Looks like this has been an issue since at least 4.8, so not a regression.
Known Workarounds
- Remove the byte order mark before passing anything to the serializer
- Use Encoding.CreateTranscodingStream if targeting a recent-enough TFM (5.0 or later) instead of relying on the serializer to auto-detect
- Use JsonReaderWriterFactory if targeting pretty much anything other than pre-netstandard2.0 TFMs and specify the encoding explicitly
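A minimal sketch of one way to do the first workaround, assuming the payload is small enough to buffer in memory. It lets StreamReader consume the byte order mark, then hands the serializer a BOM-less UTF-8 stream. `BomWorkaround`, `CreateUtf16Stream`, and `ReadWithoutBom` are hypothetical names, and `Simple` is redeclared here with a public property so the deserialized value is observable; none of this is code from the serializer itself.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Json;
using System.Text;

// Same shape as the repro type, but with a public property
// (a change made for this sketch so the result can be checked).
public class Simple
{
    public int AnInt { get; set; }
}

// Hypothetical helper; not part of any library.
public static class BomWorkaround
{
    // Builds the same UTF-16LE payload, BOM included, as the repro.
    public static MemoryStream CreateUtf16Stream()
    {
        var stream = new MemoryStream();
        var writer = new StreamWriter(stream, new UnicodeEncoding(bigEndian: false, byteOrderMark: true));
        writer.WriteLine("{ \"AnInt\": 42 }");
        writer.Flush();
        stream.Position = 0;
        return stream;
    }

    // Decodes the input with BOM detection, re-encodes it as UTF-8
    // without a BOM, and deserializes from that clean copy.
    // Note: disposing the StreamReader also disposes the input stream.
    public static T ReadWithoutBom<T>(Stream input)
    {
        string json;
        using (var reader = new StreamReader(input, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
        {
            json = reader.ReadToEnd();
        }

        byte[] utf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false).GetBytes(json);
        using var clean = new MemoryStream(utf8);
        var serializer = new DataContractJsonSerializer(typeof(T));
        return (T)serializer.ReadObject(clean);
    }
}
```

Usage would be `BomWorkaround.ReadWithoutBom<Simple>(BomWorkaround.CreateUtf16Stream())`. The trade-off is an extra full copy of the payload, which the transcoding-stream approach avoids.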
My core recommendation here is actually to remove JsonEncodingStreamWrapper and the XML EncodingStreamWrapper. Use TranscodingStream under the covers. And detect the encoding (when necessary) some other way, perhaps using something bulletproof like DetectEncoding() in StreamReader.
That recommendation relates to my need to use DCS for both JSON and XML in netstandard1.3 projects. In addition, this would remove unnecessary encoding restrictions, support non-UTF8 XML deserialization without the (silly?) XML declaration requirement, and simplify your code.
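To illustrate the kind of detection I have in mind, here is a rough sketch that uses only the public StreamReader surface (not the private DetectEncoding() method itself). `EncodingSniffer` and `DetectFromBom` are hypothetical names, and the sketch assumes a seekable stream so the position can be restored afterwards.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper; a sketch, not a proposed implementation.
public static class EncodingSniffer
{
    // Returns the encoding indicated by the stream's byte order mark,
    // or the fallback when no recognized mark is present.
    // Assumes the stream is seekable; the original position is restored.
    public static Encoding DetectFromBom(Stream stream, Encoding fallback)
    {
        long start = stream.Position;
        Encoding detected;

        using (var reader = new StreamReader(stream, fallback, detectEncodingFromByteOrderMarks: true, bufferSize: 1024, leaveOpen: true))
        {
            // CurrentEncoding only reflects detection after the first read.
            reader.Peek();
            detected = reader.CurrentEncoding;
        }

        stream.Position = start;
        return detected;
    }
}
```

For the repro stream, `EncodingSniffer.DetectFromBom(stream, Encoding.UTF8)` should report UTF-16LE (code page 1200), which could then be passed along as the explicit source encoding.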
Configuration
No response
Other information
No response