-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use next sibling in SyntaxNode.GetChildPosition() if available #66876
Conversation
cc @CyrusNajmabadi for suggesting the approach. |
Do you ahve perf before/after? |
&& _children[(valueIndex - 2) >> 1].Value is null | ||
&& (valueIndex >= Green.SlotCount - 2 || _children[(valueIndex + 2) >> 1].Value is { }); | ||
|
||
return GetChildPosition(index, useNextNotPrevious); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i legit do not understand any of this :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decide whether to look at previous siblings or following siblings in GetChildPosition()
by checking whether the nearest siblings are included in the _children
cache. If the nearest previous sibling is not cached, but the nearest following sibling is cached, we'll look at the following siblings; otherwise, we'll use the existing behavior of looking at the previous siblings.
To check for the nearest siblings though, we need to ignore separators, because separators are not represented in the cache.
{ | ||
int offset = 0; | ||
var green = this.Green; | ||
while (index > 0) | ||
|
||
if (!useNextNotPrevious) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: flip?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used !useNextNotPrevious
to minimize the differences in the PR, and also because !useNextNotPrevious
is the common case.
} | ||
|
||
return this.Position + offset; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall, i have no issue with the approach. but i think we def need tests around ensuring that reverse iterating produces nodes/tokens in the correct location. can you add a bunch of tests that show that if we reverse iterate the positions of things are all correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added verifying results from GetChildPosition()
and GetChildPositionFromEnd()
to tests in IOperation verification.
The perf of the C# and VB tests were similar.
|
for (int i = n - 1; i >= 0; i--) | ||
{ | ||
positions2[i] = separatedList.GetChildPositionFromEnd(i); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and just to check. this actually computes, and is unaffected by anything potentially cached with the first for-loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks.
Actually, neither of the two loops cache any nodes, but both loops get the positions of the nodes using the cache is there at the time the verify method is called.
Add comments and also added verification without empty caches for both directions.
int valueIndex = (index & 1) != 0 ? index - 1 : index; | ||
if (valueIndex > 1 | ||
&& _children[(valueIndex - 2) >> 1].Value is null | ||
&& (valueIndex >= Green.SlotCount - 2 || _children[(valueIndex + 2) >> 1].Value is { })) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The preceding conditions have determined the previous sibling is not cached. This condition checks if the index represents the last item or the next to last item. If so, we use GetChildPositionFromEnd()
to calculate the position from the end of the list rather than the start. If the index is the last item (a separator) or next to last item (followed by a separator which are never cached), there are no following siblings to cache so we can skip the ||
condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good capturing this in a comment and explaining why it doesn't matter whether the next item is cached in this case.
Can you add a test where the list has a trailing separator? |
&& _children[(valueIndex - 2) >> 1].Value is null | ||
&& (valueIndex >= Green.SlotCount - 2 || _children[(valueIndex + 2) >> 1].Value is { })) | ||
{ | ||
return GetChildPositionFromEnd(index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like the correlation between cache checks performed by GetChildPositionFromEnd
and cache checks performed here would be more obvious if we were using GetCachedSlot
helper rather than working with _children
on the low level. Also, could we instead adjust implementation of GetChildPosition
to perform the same cache tests for the following child? Then all the logic would be in one place, and, perhaps, we wouldn't need to provide a general GetChildPositionFromEnd
helper, since it looks like we really want to handle just 3 specific situations.
It looks like the optimization targets very limited set of scenarios. For example, when we are far from the end and the next sibling isn't cached, but the one after the next is cached, we won't take advantage of that cached sibling. Would it make sense to generalize the optimization. For example, even if nothing is cached, it still might be faster to calculate from end simply because the item is closer to the end than to the front. Similarly, we could make decision based on the proximity of the first cached item going backwards vs. going forward. Perhaps we could maintain a bitmap of cached items to speed-up the process of finding closest cached items. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the optimization targets very limited set of scenarios.
Yes, the optimization specifically targets the scenario where the first iteration through the child list is in reverse, so the previous siblings are not in the cache. That's the one scenario we have currently where there is a perf issue. The optimization here is simply to calculate the position based on the offset from the immediately following sibling when that sibling is in the cache; otherwise, we use the existing approach.
We should limit the change here to fixing this one scenario, to avoid perf regressions in other cases. If additional scenarios arise, we can consider further optimizations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, could we instead adjust implementation of
GetChildPosition
to perform the same cache tests for the following child?
GetChildPosition()
doesn't know about separated lists (where the separator is not in the cache) so it would be surprising for that method to skip over some siblings when determining whether using the following sibling is a better choice.
Then all the logic would be in one place, and, perhaps, we wouldn't need to provide a general
GetChildPositionFromEnd
helper ...
I added GetChildPositionFromEnd()
next to GetChildPostion()
so the two loops were together. And I think we'd still need the cases that are handled in GetChildPositionFromEnd()
, even if the methods were combined. I'll keep the two methods separate so that both implementations are clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like the correlation between cache checks performed by
GetChildPositionFromEnd
and cache checks performed here would be more obvious if we were usingGetCachedSlot
helper rather than working with _children on the low level.
Updated.
Done with review pass (commit 10), haven't looked at tests yet |
internal int GetChildPositionFromEnd(int index) | ||
{ | ||
var green = this.Green; | ||
int offset = green.GetSlot(index)?.FullWidth ?? 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matches the behavior of GetChildPosition()
above. It looks like GetChildPosition()
for a syntax list is typically called when creating the corresponding red node, which is then cached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matches the behavior of
GetChildPosition()
above.
I won't object to adjusting that one too.
|
||
[Theory] | ||
[CombinatorialData] | ||
public void EnumerateWithManyChildren_Forward(bool trailingSeparator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (commit 13)
@dotnet/roslyn-compiler, please review. |
@dotnet/roslyn-compiler for a second review, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach generally looks good to me, but I'm a bit concerned that the new tests won't catch performance regressions. Consider making the number of iterations more extreme (so that without this change, we'd definitely time out the CI build) or throw some kind of exception after a period of time in the test.
|
||
var green = this.Green; | ||
int offset = green.GetSlot(index)?.FullWidth ?? 0; | ||
int slotCount = green.SlotCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving the -1
to this expression.
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
When iterating a
SyntaxList.SeparatedWithManyChildren
collection in reverse, use the next sibling to calculate the position inSyntaxNode.GetChildPosition()
rather than relying on the previous sibling which may not be cached.Fixes #66475