Rework MARSConnection state machine #933

Merged Jul 23, 2021 (5 commits)

Conversation

@Wraith2 (Contributor) commented Feb 26, 2021

The MARSConnection class uses a confusing implicit state machine in HandleReceiveComplete to demux incoming packets and dispatch them to the correct MARSHandle instances. This PR reworks that state machine to be explicit, making it easier to see exactly what happens as packets travel through it. It also enables a small optimization: when a packet header has been decoded and the rest of the packet contents is exactly the payload to be forwarded, the packet is passed through rather than reconstructed, which cuts down on copies and speeds things up.
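As a rough illustration of what "making the state machine explicit" can look like, here is a minimal sketch. It is not the actual SNIMarsConnection code: the PR does introduce enums named State and DemuxState (mentioned later in this thread), but the members and transitions below are illustrative assumptions only.

```csharp
// Hypothetical sketch, not the real demuxer.
internal enum DemuxState
{
    Header,   // accumulating bytes until a complete SMUX header is available
    Payload,  // accumulating the packet body that the header describes
    Dispatch, // a complete packet is ready to hand to a session handle
}

internal sealed class MarsDemuxerSketch
{
    private DemuxState _state = DemuxState.Header;

    // With an explicit enum, every transition is visible in a single switch,
    // instead of being implied by a tangle of flags and nested conditionals.
    public void Step()
    {
        switch (_state)
        {
            case DemuxState.Header:
                _state = DemuxState.Payload;
                break;
            case DemuxState.Payload:
                _state = DemuxState.Dispatch;
                break;
            case DemuxState.Dispatch:
                _state = DemuxState.Header;
                break;
        }
    }
}
```

The benefit is purely structural: the set of legal states and transitions becomes enumerable and assertable, which is what makes the data flow easier to visualize and guard.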

Comment on lines +2523 to +2539
if (!IsPacketEmpty(readPacket))
{
ReleasePacket(readPacket);
}
@Wraith2 (Contributor, Author) commented Feb 26, 2021

This looks redundant because of the ManagedSNI check that follows this block of code before doing a packet release, but it isn't. I tried putting just one release in place, but every other combination causes a leak in either managed or native mode.

In managed mode we have to release the packet only if there was a synchronous result.
In native mode we have to release the packet if we received one.
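The two rules above can be restated as a sketch. The mode flag and variable names here are assumptions for illustration, not the real SNI API surface; only IsPacketEmpty and ReleasePacket come from the snippet under discussion.

```csharp
// Illustrative restatement of the two release rules described above.
if (isManagedSni) // hypothetical flag distinguishing managed from native SNI
{
    // Managed: release only on a synchronous result, because an async
    // completion hands packet ownership to the completion callback.
    if (completedSynchronously && !IsPacketEmpty(readPacket))
    {
        ReleasePacket(readPacket);
    }
}
else
{
    // Native: release whenever a packet was actually received, whether
    // the receive completed synchronously or not.
    if (!IsPacketEmpty(readPacket))
    {
        ReleasePacket(readPacket);
    }
}
```

Collapsing the two branches into a single release site is exactly what leaks: the guarding conditions differ between modes.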

@David-Engel (Contributor) left a comment

I love it. It's definitely an improvement over the already-complex MARS logic. In order to get people comfortable with this change, can you list the tests that already cover the impacted functionality? Or maybe collect code coverage that shows all the changed code is covered? Also, are there any before/after perf changes?

@Wraith2 (Contributor, Author) commented Feb 27, 2021

For coverage I'm relying on the standard tests. I rewrote the code and then ran all the functional and manual tests with forced managed mode; the first few times through I found and resolved some problems, so the new code path is certainly being exercised. I also had to track individual packet lifetimes, as you can tell from the code I needed to add in Packet.Debug.cs. For that I used a version of the repro from 659 (which is where @cheenamalhotra noticed that packets were being dropped to the GC) and ran it for a while; it exercises the single-packet passthrough because the packets are small.
The main test coverage is going to be the set of tests that forces various packet sizes.

I haven't tried to benchmark this because the main purpose was correctness and code clarity, but it's a good point; I should see whether anything interesting has changed. We should gain some memory back from recycling and a tiny amount of CPU on single-packet reads.

@Wraith2 (Contributor, Author) commented Feb 28, 2021

I tried to get some numbers, and it looked like there might be a slight regression. I've also found a very subtle locking issue that only seems to show up under high load. I'm going to see what I can do to track it down, but until I do, please consider this too dangerous to merge.

@Wraith2 Wraith2 changed the title Rework MARSConnection state machine [NO MERGE] Rework MARSConnection state machine Feb 28, 2021
@Wraith2 (Contributor, Author) commented Mar 2, 2021

I've got a cleanly running version now, though the state machine has become more complicated. It has to be, in order to follow the exact and important locking rules: we have to hold the lock while demuxing, but must not hold it while dispatching to the lower handle. This was implicit in the code as previously written; now that I understand it, I've put code checks in place as well as a few comments.

SMUXHeader has been changed from a class into a struct because I needed to be able to take an allocation-free clone of it: it's set up inside the lock and must be used outside it, so a copy is required. There have also been some renames and scope changes to structure the logic as two nested state switches.
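A minimal sketch of the copy-under-lock pattern this enables follows. Only SMUXHeader comes from the PR; the field, lock, and session types/names are hypothetical illustrations.

```csharp
// Hypothetical sketch: SMUXHeader, MarsSession, and Packet are stand-in types.
internal sealed class DemuxDispatchSketch
{
    private readonly object _demuxerSync = new object(); // hypothetical sync object
    private SMUXHeader _currentHeader; // set up while demuxing, under the lock

    private void DispatchCurrentPacket(MarsSession session, Packet packet)
    {
        // SMUXHeader is a struct, so plain assignment makes an
        // allocation-free copy: set it up inside the lock, use it outside.
        SMUXHeader headerCopy;
        lock (_demuxerSync)
        {
            headerCopy = _currentHeader; // struct assignment copies every field
        }
        // Dispatch strictly outside the lock: the session's completion path
        // can block or re-enter the connection, so holding the demux lock
        // here would risk deadlock.
        session.HandleReceiveComplete(packet, headerCopy);
    }
}
```

Had SMUXHeader stayed a class, getting a snapshot usable outside the lock would have required either allocating a clone per packet or holding the lock through the dispatch; the struct copy avoids both.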

I've been unable to find any noticeable performance difference between master and this PR, but as I said earlier, perf wasn't an objective. The perf improvements are long-term amortization of the packet arrays; in standard operations the copies happened often enough to be noticeable under the debugger, but not enough to cause an obvious degradation in throughput.

Some perf numbers from the DataAccessPerformance project, which implements the TechEmpower Fortunes benchmark (a raw throughput measurement):

pr: ado-sqlclient async 16 Threads, tps: 20348.00, stddev(w/o best+worst): 1267.49
main: ado-sqlclient async 16 Threads, tps: 20415.00, stddev(w/o best+worst): 1225.99

Minor, probably-noise differences. I can't pick up anything interesting in microbenchmarks because they won't GC enough.

So the question is whether the code is easier to understand and maintain than the previous version. I can't say I'm happy about how complicated it is or how much new code there is; I would have liked to keep it a lot shorter. I do think it has value for documenting the flow of data through the method, and it adds some important information about which locks need to be taken. Does the team think it's an improvement, or that it adds enough value to be merged?

@Wraith2 Wraith2 changed the title [NO MERGE] Rework MARSConnection state machine Rework MARSConnection state machine Mar 5, 2021
@Wraith2 (Contributor, Author) commented Mar 8, 2021

Incidentally, @eiriktsarpalis posted a really good example of why I chose to use inline typeof(T) instead of putting it into a variable: dotnet/runtime#46414 (comment)

Base automatically changed from master to main March 15, 2021 17:54
@DavoudEshtehari added the Area\Netcore label (issues that apply only to the .NET runtime or the 'netcore' project folder) Jun 25, 2021
@JRahnama (Contributor) commented Jun 25, 2021

Also, can you pull the latest changes from the main branch, make some changes, and push them back so the pipelines rerun the tests? I could not do it from our side.

@DavoudEshtehari (Contributor) left a comment

This change shows a good knowledge of the SMP. Please include here any additional information you have, such as a diagram; in particular, documenting the 'State' and 'DemuxState' enums would help people understand this in the future.

Here is my understanding (feel free to verify it or replace it with a better version):
[state diagram image]

@Wraith2 (Contributor, Author) commented Jul 13, 2021

> Here is my understanding (feel free to verify it or replace it with a better version):
> [state diagram image]

I spent some time thinking it through and trying to draw a diagram, and I couldn't produce anything better than yours.
I think the code has to be the final expression of what is going on. My hope was, and is, that there are enough comments and assertions to explain what is happening and to guard against any harmful changes that might be introduced. I also hope it's clear enough, and fast enough, that no one needs to change or rewrite it any time soon.
