Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use next sibling in SyntaxNode.GetChildPosition() if available #66876

Merged
merged 14 commits into from
Mar 8, 2023

Conversation

cston
Copy link
Member

@cston cston commented Feb 14, 2023

When iterating a SyntaxList.SeparatedWithManyChildren collection in reverse, use the next sibling to calculate the position in SyntaxNode.GetChildPosition() rather than relying on the previous sibling which may not be cached.

Fixes #66475

@cston
Copy link
Member Author

cston commented Feb 14, 2023

cc @CyrusNajmabadi for suggesting the approach.

@CyrusNajmabadi
Copy link
Member

Do you ahve perf before/after?

&& _children[(valueIndex - 2) >> 1].Value is null
&& (valueIndex >= Green.SlotCount - 2 || _children[(valueIndex + 2) >> 1].Value is { });

return GetChildPosition(index, useNextNotPrevious);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i legit do not understand any of this :D

Copy link
Member Author

@cston cston Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decide whether to look at previous siblings or following siblings in GetChildPosition() by checking whether the nearest siblings are included in the _children cache. If the nearest previous sibling is not cached, but the nearest following sibling is cached, we'll look at the following siblings; otherwise, we'll use the existing behavior of looking at the previous siblings.

To check for the nearest siblings though, we need to ignore separators, because separators are not represented in the cache.

{
int offset = 0;
var green = this.Green;
while (index > 0)

if (!useNextNotPrevious)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: flip?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used !useNextNotPrevious to minimize the differences in the PR, and also because !useNextNotPrevious is the common case.

}

return this.Position + offset;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall, i have no issue with the approach. but i think we def need tests around ensuring that reverse iterating produces nodes/tokens in the correct location. can you add a bunch of tests that show that if we reverse iterate the positions of things are all correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added verifying results from GetChildPosition() and GetChildPositionFromEnd() to tests in IOperation verification.

@cston
Copy link
Member Author

cston commented Feb 14, 2023

The perf of the C# and VB tests were similar.

EnumerateWithManyChildren_Forward: 400ms
EnumerateWithManyChildren_Reverse: before 4.1mins; after 340ms

@cston cston marked this pull request as ready for review February 15, 2023 21:47
@cston cston requested a review from a team as a code owner February 15, 2023 21:47
for (int i = n - 1; i >= 0; i--)
{
positions2[i] = separatedList.GetChildPositionFromEnd(i);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and just to check. this actually computes, and is unaffected by anything potentially cached with the first for-loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks.

Actually, neither of the two loops cache any nodes, but both loops get the positions of the nodes using the cache is there at the time the verify method is called.

Add comments and also added verification without empty caches for both directions.

int valueIndex = (index & 1) != 0 ? index - 1 : index;
if (valueIndex > 1
&& _children[(valueIndex - 2) >> 1].Value is null
&& (valueIndex >= Green.SlotCount - 2 || _children[(valueIndex + 2) >> 1].Value is { }))
Copy link
Contributor

@AlekseyTs AlekseyTs Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valueIndex >= Green.SlotCount - 2

Could you elaborate what is the meaning of this condition? #Closed

Copy link
Member Author

@cston cston Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preceding conditions have determined the previous sibling is not cached. This condition checks if the index represents the last item or the next to last item. If so, we use GetChildPositionFromEnd() to calculate the position from the end of the list rather than the start. If the index is the last item (a separator) or next to last item (followed by a separator which are never cached), there are no following siblings to cache so we can skip the || condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good capturing this in a comment and explaining why it doesn't matter whether the next item is cached in this case.

@CyrusNajmabadi
Copy link
Member

Can you add a test where the list has a trailing separator?

&& _children[(valueIndex - 2) >> 1].Value is null
&& (valueIndex >= Green.SlotCount - 2 || _children[(valueIndex + 2) >> 1].Value is { }))
{
return GetChildPositionFromEnd(index);
Copy link
Contributor

@AlekseyTs AlekseyTs Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetChildPositionFromEnd

It feels like the correlation between cache checks performed by GetChildPositionFromEnd and cache checks performed here would be more obvious if we were using GetCachedSlot helper rather than working with _children on the low level. Also, could we instead adjust implementation of GetChildPosition to perform the same cache tests for the following child? Then all the logic would be in one place, and, perhaps, we wouldn't need to provide a general GetChildPositionFromEnd helper, since it looks like we really want to handle just 3 specific situations.

It looks like the optimization targets very limited set of scenarios. For example, when we are far from the end and the next sibling isn't cached, but the one after the next is cached, we won't take advantage of that cached sibling. Would it make sense to generalize the optimization. For example, even if nothing is cached, it still might be faster to calculate from end simply because the item is closer to the end than to the front. Similarly, we could make decision based on the proximity of the first cached item going backwards vs. going forward. Perhaps we could maintain a bitmap of cached items to speed-up the process of finding closest cached items. #Closed

Copy link
Member Author

@cston cston Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the optimization targets very limited set of scenarios.

Yes, the optimization specifically targets the scenario where the first iteration through the child list is in reverse, so the previous siblings are not in the cache. That's the one scenario we have currently where there is a perf issue. The optimization here is simply to calculate the position based on the offset from the immediately following sibling when that sibling is in the cache; otherwise, we use the existing approach.

We should limit the change here to fixing this one scenario, to avoid perf regressions in other cases. If additional scenarios arise, we can consider further optimizations.

Copy link
Member Author

@cston cston Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, could we instead adjust implementation of GetChildPosition to perform the same cache tests for the following child?

GetChildPosition() doesn't know about separated lists (where the separator is not in the cache) so it would be surprising for that method to skip over some siblings when determining whether using the following sibling is a better choice.

Then all the logic would be in one place, and, perhaps, we wouldn't need to provide a general GetChildPositionFromEnd helper ...

I added GetChildPositionFromEnd() next to GetChildPostion() so the two loops were together. And I think we'd still need the cases that are handled in GetChildPositionFromEnd(), even if the methods were combined. I'll keep the two methods separate so that both implementations are clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like the correlation between cache checks performed by GetChildPositionFromEnd and cache checks performed here would be more obvious if we were using GetCachedSlot helper rather than working with _children on the low level.

Updated.

@AlekseyTs
Copy link
Contributor

Done with review pass (commit 10), haven't looked at tests yet

internal int GetChildPositionFromEnd(int index)
{
var green = this.Green;
int offset = green.GetSlot(index)?.FullWidth ?? 0;
Copy link
Contributor

@AlekseyTs AlekseyTs Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

green.GetSlot(index)

It looks like we completely ignore the fact that the item could be cached by now and, therefore, there is no need to iterate. #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches the behavior of GetChildPosition() above. It looks like GetChildPosition() for a syntax list is typically called when creating the corresponding red node, which is then cached.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches the behavior of GetChildPosition() above.

I won't object to adjusting that one too.


[Theory]
[CombinatorialData]
public void EnumerateWithManyChildren_Forward(bool trailingSeparator)
Copy link
Contributor

@AlekseyTs AlekseyTs Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnumerateWithManyChildren_Forward

What would be the failure for these tests without the fix? Timeout? If so, consider adding a comment about that. #Resolved

Copy link
Contributor

@AlekseyTs AlekseyTs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (commit 13)

@cston cston requested a review from a team February 17, 2023 19:17
@cston
Copy link
Member Author

cston commented Feb 24, 2023

@dotnet/roslyn-compiler, please review.

@cston
Copy link
Member Author

cston commented Mar 6, 2023

@dotnet/roslyn-compiler for a second review, thanks.

Copy link
Member

@333fred 333fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach generally looks good to me, but I'm a bit concerned that the new tests won't catch performance regressions. Consider making the number of iterations more extreme (so that without this change, we'd definitely time out the CI build) or throw some kind of exception after a period of time in the test.


var green = this.Green;
int offset = green.GetSlot(index)?.FullWidth ?? 0;
int slotCount = green.SlotCount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving the -1 to this expression.

@cston
Copy link
Member Author

cston commented Mar 8, 2023

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@cston
Copy link
Member Author

cston commented Mar 8, 2023

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@cston cston merged commit 5b924e8 into dotnet:main Mar 8, 2023
@cston cston deleted the 66475 branch March 8, 2023 17:44
@ghost ghost added this to the Next milestone Mar 8, 2023
@allisonchou allisonchou modified the milestones: Next, 17.6 P3 Mar 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

50x slower build with .net 6.0.403 compared to 6.0.402
5 participants