-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Description
I dug into the root cause of #32194 and I think I identified it; tl;dr, we ignore the return value of TrySkip for unknown (junk) properties; if the buffering happens to cut an unknown property in the middle, we end up in a bad state.
To reproduce this with SQL Server, go to JsonQueryAdHocSqlServerTest, and replace SeekJunkInJson with the following:
protected override void SeedJunkInJson(MyContextJunkInJson ctx)
=> ctx.Database.ExecuteSqlRaw(
"""
INSERT INTO [Entities] ([Collection], [CollectionWithCtor], [Reference], [ReferenceWithCtor], [Id])
VALUES(
N'[{{"JunkReference":{{"Something":"SomeValue" }},"Name":"c11","JunkProperty1":50,"Number":11.5,"JunkCollection1":[],"JunkCollection2":[{{"Foo":"junk value"}}],"NestedCollection":[{{"DoB":"2002-04-01T00:00:00","DummyProp":"Dummy value"}},{{"DoB":"2002-04-02T00:00:00","DummyReference":{{"Foo":5}}}}],"NestedReference":{{"DoB":"2002-03-01T00:00:00"}}}},{{"Name":"c12","Number":12.5,"NestedCollection":[{{"DoB":"2002-06-01T00:00:00"}},{{"DoB":"2002-06-02T00:00:00"}}],"NestedDummy":59,"NestedReference":{{"DoB":"2002-05-01T00:00:00"}}}}]',
N'[{{"MyBool":true,"Name":"c11 ctor","JunkReference":{{"Something":"SomeValue","JunkCollection":[{{"Foo":"junk value"}}]}},"NestedCollection":[{{"DoB":"2002-08-01T00:00:00"}},{{"DoB":"2002-08-02T00:00:00"}}],"NestedReference":{{"DoB":"2002-07-01T00:00:00"}}}},{{"MyBool":false,"Name":"c12 ctor","NestedCollection":[{{"DoB":"2002-10-01T00:00:00"}},{{"DoB":"2002-10-02T00:00:00"}}],"JunkCollection":[{{"Foo":"junk value"}}],"NestedReference":{{"DoB":"2002-09-01T00:00:00"}}}}]',
N'{{"Name": "r1", "Number": 1.5, "JunkReference":{{"Something": "SomeValue" }}, "JunkCollection": [{{"Foo": "junk value"}}], "NestedReference": {{"DoB": "2000-01-01T00:00:00"}}, "NestedCollection": [{{"DoB": "2000-02-01T00:00:00", "JunkReference": {{"Something": "SomeValue"}}}}, {{"DoB": "2000-02-02T00:00:00"}}]}}',
N'{{"MyBool":true,"JunkCollection":[{{"Foo":"junk value"}}],"Name":"r1 ctor","JunkReference":{{"Something":"SomeValue" }},"NestedCollection":[{{"DoB":"2001-02-01T00:00:00"}},{{"DoB":"2001-02-02T00:00:00"}}],"NestedReference":{{"JunkCollection":[{{"Foo":"junk value"}}],"DoB":"2001-01-01T00:00:00"}}}}',
1)
""");
The difference is in the 3rd row being inserted: the ordering is different and spaces have been added; this matches exactly the output that comes back on PostgreSQL, where the JSON data is saved in jsonb (not as text), and therefore a different representation gets returned. On this change is made, running the test Junk_in_json_basic_no_tracking triggers #32194, causing "malformed JSON" to be processed.
When the test reaches MaterializeJsonEntityCollection, the JsonReaderData contains the following: {"DoB": "2000-02-01T00:00:00", "JunkReference": {"Something": "Som
. Note that the JunkReference data is cut off because of buffering; this is crucial to reproducing the bug (in the original SQL Server data, the entire thing is buffered and the bug does not occur).
Looking at the actual shaper, we have the following code for parsing out the MyJsonEntityJunkInJsonNested in the collection:
tokenType = jsonReaderManager.CurrentReader.TokenType;
Loop(Break: done Continue: )
{
{
tokenType = jsonReaderManager.MoveNext();
switch (tokenType)
{
case PropertyName:
jsonReaderManager.CurrentReader.ValueTextEquals(DoB.EncodedUtf8Bytes) ?
{
jsonReaderManager.MoveNext();
namelessParameter{2} = ExpressionExtensions.ValueBufferTryReadValue<DateTime>(
valueBuffer: materializationContext3.get_ValueBuffer(),
index: 3,
property: Property: MyEntityJunkInJson.Collection#MyJsonEntityJunkInJson.NestedCollection#MyJsonEntityJunkInJsonNested.DoB (DateTime) Required);
} : default(void)
case EndObject:
Goto(break done)
default:
{
jsonReaderManager.CurrentReader.TrySkip();
}
}
}}
When get a property, if it's the expected DoB
property it gets assigned; otherwise, if it's an unknown property, we simply do nothing (this is the case with JunkReference). At this point we continue to the next token, which is a StartObject, and go into the default case. We call TrySkip() but we ignore its return value; if the value isn't fully buffered (as in this case), the skip silently fails.
We continue looping - through the JunkReference's internal tokens, until we get to the EndObject of the inner JunkReference and return - but we haven't consumed the EndObject of the outer object itself (MyJsonEntityJunkInJsonNested). So back in MaterializeJsonEntityCollection, we do MoveNext and get the containing object's EndObject, which for our logic is malformed JSON, and we fail.
/cc @maumar @ajcvickers