-
Notifications
You must be signed in to change notification settings - Fork 7.7k
Improve performance on json to psobject conversion #7482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Using pattern matching to simplify code. Removing some redundant initializations. nameof(input) instead of "input" in exception. Simplification of assignment Removing code redundancies
$json = @'
{
"id": "string",
"data": [
{
"accountId": 0,
"productId": 0,
"resourceLocationId": 0,
"consumedServiceId": 0,
"departmentId": 0,
"accountOwnerEmail": "string",
"accountName": "string",
"serviceAdministratorId": "string",
"subscriptionId": 0,
"subscriptionGuid": "string",
"subscriptionName": "string",
"date": "2017-04-27T23:01:43.799Z",
"product": "string",
"meterId": "string",
"meterCategory": "string",
"meterSubCategory": "string",
"meterRegion": "string",
"meterName": "string",
"consumedQuantity": 0,
"resourceRate": 0,
"Cost": 0,
"resourceLocation": "string",
"consumedService": "string",
"instanceId": "string",
"serviceInfo1": "string",
"serviceInfo2": "string",
"additionalInfo": "string",
"tags": "string",
"storeServiceIdentifier": "string",
"departmentName": "string",
"costCenter": "string",
"unitOfMeasure": "string",
"resourceGroup": "string"
}
],
"nextLink": "string"
}
'@
$json = 1..1000 | % { $json }
$json = '[' + ($json -join ',') + ']'
Measure-Command {$Json | ConvertFrom-Json } | % TotalMilliseconds Runs in 497.8867 miliseconds on the most recent build in this PR vs 1104.9008 ms in preview.4 and 318.1642ms in 5.1. Much better!! Thanks @powercode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Did you test that with a crossgen build? I know numbers! I have a lot of numbers. A lot of people don't know that, but I have the best numbers! Same code but with 20000 iterations - I get 27091 ms on 6.0.3 and 3635 ms on the PR. Curious about the difference. Start-PSBuild -Crossgen -Configuration release |
{ | ||
string errorMsg = string.Format(CultureInfo.InvariantCulture, | ||
WebCmdletStrings.DuplicateKeysInJsonString, entry.Key); | ||
var errorMsg = string.Format(CultureInfo.InvariantCulture, WebCmdletStrings.DuplicateKeysInJsonString, entry.Key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: CurrentCulture
should be used when constructing error message. A flaw in the original code.
} | ||
obj = returnHashTable ? | ||
PopulateHashTableFromJDictionary(dictionary, out error) : | ||
PopulateFromJDictionary(dictionary, memberTracker, out error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When reaching here, we already know the size of the JObject
, so maybe we can replace memberTracker
with new DuplicateMemberSet(dictionary.Count)
? (just like what you did in other places)
Then DefaultMemberSetCapacity
won't be needed.
foreach (var entry in entries) | ||
{ | ||
if (string.IsNullOrEmpty(entry.Key)) | ||
{ | ||
string errorMsg = string.Format(CultureInfo.InvariantCulture, | ||
WebCmdletStrings.EmptyKeyInJsonString); | ||
var errorMsg = string.Format(CultureInfo.InvariantCulture, WebCmdletStrings.EmptyKeyInJsonString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: CurrentCulture
should be used when constructing error strings. It's a flaw in the original code.
{ | ||
string errorMsg = string.Format(CultureInfo.InvariantCulture, | ||
WebCmdletStrings.KeysWithDifferentCasingInJsonString, property.Name, entry.Key); | ||
var errorMsg = string.Format(CultureInfo.InvariantCulture, WebCmdletStrings.KeysWithDifferentCasingInJsonString, propertyName, entry.Key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: CurrentCulture
should be used when constructing error message.
@@ -255,8 +246,7 @@ private static Hashtable PopulateHashTableFromJDictionary(JObject entries, out E | |||
// does not throw when encountering duplicates and just uses the last entry. | |||
if (result.ContainsKey(entry.Key)) | |||
{ | |||
string errorMsg = string.Format(CultureInfo.InvariantCulture, | |||
WebCmdletStrings.DuplicateKeysInJsonString, entry.Key); | |||
string errorMsg = string.Format(CultureInfo.InvariantCulture, WebCmdletStrings.DuplicateKeysInJsonString, entry.Key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: same here
Good perf work @powercode! The changes look good to me in general. Left a few minor comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ship it! 🎉
I looked a bit at the difference in perf to Windows PowerShell.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the minor suggestion, LGTM.
} | ||
obj = returnHashTable ? | ||
PopulateHashTableFromJDictionary(dictionary, out error) : | ||
PopulateFromJDictionary(dictionary, new DuplicateMemberSet(dictionary.Count), out error); | ||
} | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion: the way the if/else block is structured, it implies a path that is not covered. I suggest using if/else if/else and use the else block to provide an explicit comment.
if (object isJObject dictionary)
{}
else if (obj is JArray list)
{}
else
{ # nothing to do; return the obj as is.
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm moving the return into the try statement so I get
if (object is JObject dictionary)
{...}
else if (obj is JArray list)
{///}
// if obj is a anything but JArray or JObject, we just return as is
return obj;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it's clear anyway that we return obj as is for the other cases.
Or how about
switch (obj)
{
case JObject dictionary:
return returnHashTable
? PopulateHashTableFromJDictionary(dictionary, out error)
: PopulateFromJDictionary(dictionary, new DuplicateMemberSet(dictionary.Count), out error);
case JArray list:
return returnHashTable
? PopulateHashTableFromJArray(list, out error)
: PopulateFromJArray(list, out error);
default: return obj;
}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes the alternative really clear.
} | ||
|
||
/// <summary> | ||
/// Convert a Json string back to an object of type PSObject or Hashtable depending on parameter <paramref name="returnHashTable"/>. | ||
/// Convert a Json string back to an object of type PSObject or HashTable depending on parameter <paramref name="returnHashTable"/>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class name is Hashtable
not HashTable
.
foreach (var entry in entries) | ||
{ | ||
if (string.IsNullOrEmpty(entry.Key)) | ||
{ | ||
string errorMsg = string.Format(CultureInfo.InvariantCulture, | ||
WebCmdletStrings.EmptyKeyInJsonString); | ||
var errorMsg = string.Format(CultureInfo.CurrentCulture, WebCmdletStrings.EmptyKeyInJsonString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use our help method StringUtil.Format
. Below too.
{ | ||
error = null; | ||
PSObject result = new PSObject(); | ||
var result = new PSObject(entries.Count); | ||
foreach (var entry in entries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to replace the var
with explicit type. Please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type is clearly stated by the ctor. Why repeat it?
Our coding guidelines has
var example = obj as Example;
as an example. I don't see how this is any different. There can be no confusion as to what type result has.
In general, I think it would be better to consider types as something the compiler uses to ensure we don't use something in an unintended way.
Personally, I find all the type annotations reducing readability, since they get in the way of the essence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, as background, I really recommend Herb Sutters https://herbsutter.com/2013/08/12/gotw-94-solution-aaa-style-almost-always-auto/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, when looking at high quality, modern, C# code bases, use of var
is very common, and in my eyes non-controversial.
https://github.com/dotnet/roslyn/blob/master/src/Compilers/Shared/BuildServerConnection.cs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we both say about foreach (var entry in entries)
? Seems you comment var result = new PSObject(entries.Count);
.
The foreach (var entry in entries)
is not your code so it is my minor comment with "Please". Feel free to ignore.
Prevously we discussed this - not all code readers use power IDEs with IntelliSense and it is better for readability to use explicit type until this type is not obvious from the name of a variable or local context. Compare: foreach (var entry in intEntries)
and foreach (var entry in entries)
.
Above the entries
variable is defined as JObject entries
- at first glance it is not at all clear what type of the entry
is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that the type of entry
is of interest here. It is some entity that has a Key
and a Value
. Should that type change, nothing in the code would need to change, as long as it has a Key
and a Value
.
That is captured nicely with the use of var
.
I think we should only specify the type explicitly when it is important for some reason that the value is of a certain type. That way, it calls attention to it - something interesting is happening here since we explicitly specified the type. It may be that a variable is declared as a base type instead of the actual type returned by a function.
The C# coding conventions on msdn has actually captured this:
Use implicit typing for local variables when the type of the variable is obvious from the right side of the assignment, or when the precise type is not important.
Use implicit typing to determine the type of the loop variable in for and foreach loops.
The use of var
and an explicit type is not just a matter of taste. They express different things. The explicit type should be used then the type matters, not as some security blanket.
var
should be used when we want to express that the type is not important here, as long as it walks like a duck and quacks like a duck.
It is a semantics change. Specifying a type may for example trigger implicit type conversion operators, or throw exceptions (in the foreach case).
{ | ||
JValue theValue = entry.Value as JValue; | ||
// Value | ||
JValue theValue = (JValue)entry.Value; | ||
result.Properties.Add(new PSNoteProperty(entry.Key, theValue.Value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I remember, we discussed this earlier and concluded that it is safer to use as
in the place.
/cc @markekraus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with using casting when we are for sure expecting the cast-to type.
If the value does happen to be a different type, then it's a bug that needs to be fixed, and casting operator will make the bug to surface easily (while as
in this case might make the bug harder to be noticed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a bug that needs to be fixed
@daxian-dbw If I remember right in previous discussion we concluded that we don't want an exception here. So I ping @markekraus to confirm or reject this.
Also we could consider the change as out of the PR and revert and open a tracking issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iSazonov I don't recall any reason why we need as
over casting here.
@@ -142,10 +124,10 @@ private static PSObject PopulateFromJDictionary(JObject entries, out ErrorRecord | |||
|
|||
// Case sensitive duplicates should normally not occur since JsonConvert.DeserializeObject | |||
// does not throw when encountering duplicates and just uses the last entry. | |||
if (result.Properties.Any(psPropertyInfo => psPropertyInfo.Name.Equals(entry.Key, StringComparison.InvariantCulture))) | |||
if (memberTracker.TryGetValue(entry.Key, out var maybePropertyName) | |||
&& string.Compare(entry.Key, maybePropertyName, StringComparison.CurrentCulture) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case if we have a duplicate key we return "null". So we should remove the string.Compare
at all.
/cc @markekraus What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We throw different errors in different cases. A duplicate key with same casing is a hard "NO", while a duplicate key with different casing is a workaround-able.
{ | ||
result.Add(((JValue)element).Value); | ||
// Value | ||
result[index] = ((JValue)element).Value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same about as
.
/cc @markekraus
{ | ||
// Value | ||
JValue theValue = entry.Value as JValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same about as
. Here we don't use cast
- the pattern we approved previously.
/cc @markekraus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a bit inconsistent in this file.
{ | ||
result.Add(((JValue)element).Value); | ||
// Value | ||
result[index] = ((JValue)element).Value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same about as
.
/cc @markekraus
@@ -24,45 +19,49 @@ namespace Microsoft.PowerShell.Commands | |||
[SuppressMessage("Microsoft.Naming", "CA1704:IdentifiersShouldBeSpelledCorrectly")] | |||
public static class JsonObject | |||
{ | |||
private const int maxDepthAllowed = 100; | |||
private class DuplicateMemberSet : HashSet<string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DuplicateMemberHashSet
looks more readable.
private const int maxDepthAllowed = 100; | ||
private class DuplicateMemberSet : HashSet<string> | ||
{ | ||
public DuplicateMemberSet(int capacity) : base(capacity, StringComparer.CurrentCultureIgnoreCase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect StringComparer.InvariantCultureIgnoreCase
. Why would we want to do culture sensetive compares (if we get Json string from Internet)?
This is consistent with the comment in the line 125.
/cc @markekraus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iSazonov Regarding HashTable: StyleCop complains that the word Hashtable
is spelled incorrectly, hence the change. I'm not religious about it, and can go either way. Also, the parameter is named returnHashTable
. Confusing. @dantraMSFT, @markekraus Any native English speaker care to weigh in here? Maybe change all references to Hashtable
casing? And maybe add hashtable as a dictionary work.
HashTable looks ugly to me :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are correct that CurrentCultureIgnoreCase is wrong. The previous code used the PSObject.Properties, and that uses OrdinalIgnoreCase, for it's member dictionary.
Not sure if we should use InvariantCultureIgnoreCase or OrdinalIgnoreCase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote for OrdinalIgnoreCase
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree about OrdinalIgnoreCase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@powercode I think if we reference Hashtable class in comments we should use quotes like 'Hashtable' to bypass StypeCop warnings. As for variable names we should follow Camel style so returnHashTable is ok.
Using OrdinalIgnoreCase for DuplicateMemberHashSet. Renaming DuplicateMemberSet -> DuplicateMemberHashSet Renaming returnHashTable -> returnHashtable Changing help to not use the HashTable casing.
Regarding the switch (entry.Value)
{
case JArray list:
{
// Array
var listResult = PopulateHashTableFromJArray(list, out error);
if (error != null)
{
return null;
}
result.Add(entry.Key, listResult);
break;
}
case JObject dic:
{
// Dictionary
var dicResult = PopulateHashTableFromJDictionary(dic, out error);
if (error != null)
{
return null;
}
result.Add(entry.Key, dicResult);
break;
}
case JValue value:
{
result.Add(entry.Key, value.Value);
break;
}
} Doesn't that clearly capture what we want to express? |
Good catch! I agree with using a type switch. |
…ent. Also specifying initial capacity on hashtable.
@iSazonov I'm getting code factor errors on switch (x)
{
case 1:
{
break;
}
case 2:
{
break;
}
} It seems to want it like switch (x)
{
case 1:
{
break;
}
case 2:
{
break;
}
} That feels strange. Do you know if it is intentional or a side effect? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@powercode I believe that spacing rule is intentional. It is easier to read code when there is some white space separating the end of one code block and the beginning of another. |
@markekraus Thanks for review! |
PR Summary
Changes to the
JsonObject
class to make conversions from json toPSObject
s faster.Three main changes:
Convert from JArray directly to object instead of creating a list and do list.ToArray().
Do not use PSObject to check for existing members, since that is a very slow and allocation heavy (sic!) code path.
PreAllocation the members in PSObject by using the newly added constructor accepting an initial member count.
This amounts to a speed-up in the order of 7x.
Cmdlets that benefit are
Convertfrom-Json
andInvoke-RestMethod
.PR Checklist
.h
,.cpp
,.cs
,.ps1
and.psm1
files have the correct copyright headerWIP:
to the beginning of the title and remove the prefix when the PR is ready.[feature]
if the change is significant or affects feature tests