
[Experiment] Reduce the number of syscalls #38747


Closed
wants to merge 12 commits

Conversation

adamsitnik
Member

Before I provide full benchmark results I want to get the status of tests from CI. Until then, please ignore this PR.

@Dotnet-GitSync-Bot
Collaborator

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@ghost

ghost commented Jul 3, 2020

Tagging subscribers to this area: @dotnet/ncl
Notify danmosemsft if you want to be subscribed.

adamsitnik and others added 4 commits July 3, 2020 19:20
…e did not receive any new epoll notifications in the meantime (state==processing, _sequenceNumber++)
@adamsitnik adamsitnik added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Jul 8, 2020
@adamsitnik adamsitnik changed the title [DRAFT] Reduce the number of syscalls [Experiment] Reduce the number of syscalls Jul 8, 2020
@adamsitnik
Member Author

adamsitnik commented Jul 8, 2020

@geoffkizer I took your idea, based on the following entry from the epoll(7) man page:

       9.  Do I need to continuously read/write a file descriptor until
       EAGAIN when using the EPOLLET flag (edge-triggered behavior)?

       Receiving an event from epoll_wait(2) should suggest to you that
       such file descriptor is ready for the requested I/O operation.
       You must consider it ready until the next (nonblocking)
       read/write yields EAGAIN.  When and how you will use the file
       descriptor is entirely up to you.

       For packet/token-oriented files (e.g., datagram socket, terminal
       in canonical mode), the only way to detect the end of the
       read/write I/O space is to continue to read/write until EAGAIN.

       For stream-oriented files (e.g., pipe, FIFO, stream socket), the
       condition that the read/write I/O space is exhausted can also be
       detected by checking the amount of data read from / written to
       the target file descriptor.  For example, if you call read(2) by
       asking to read a certain amount of data and read(2) returns a
       lower number of bytes, you can be sure of having exhausted the
       read I/O space for the file descriptor.  The same is true when
       writing using write(2).  (Avoid this latter technique if you
       cannot guarantee that the monitored file descriptor always
       refers to a stream-oriented file.)

and implemented a very simple version of it (see the files changed).
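
A minimal C sketch of that heuristic, assuming an edge-triggered, non-blocking stream socket (names are illustrative, not the actual System.Net.Sockets code):

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

// One read per epoll notification. On a stream socket, a short read means the
// read I/O space is exhausted (man 7 epoll, Q9), so the extra recv() that
// would only return EAGAIN can be skipped.
static bool recv_and_check_drained(int fd, char* buffer, size_t capacity)
{
    ssize_t bytesRead = recv(fd, buffer, capacity, 0);
    if (bytesRead < 0)
        return errno == EAGAIN || errno == EWOULDBLOCK; // nothing left to read
    // bytesRead == 0 is EOF; a partial read means the socket was drained
    return (size_t)bytesRead < capacity;
}
```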

After that, I ran the benchmarks with hardcoded versions of the SDK, Runtime, and ASP.NET, and a pinned aspnet/benchmarks git commit:

    sdkVersion: 5.0.100-preview.8.20351.5
    runtimeVersion: 5.0.0-preview.8.20351.9
    aspNetCoreVersion: 5.0.0-preview.8.20352.8

Against a locally built System.Net.Sockets.dll, with and without my changes (to make sure R2R does not interfere).

The results I got:

| Machine | Benchmark | Before (RPS) | After (RPS) | Ratio |
|---|---|---:|---:|---:|
| Citrine (28 cores) | Plaintext | 14,329,576 | 14,389,747 | +0.42% |
| | Json | 1,205,056 | 1,269,529 | +5.35% |
| | Fortunes | 429,395 | 431,154 | +0.41% |
| | Updates | 16,344 | 16,654 | +1.90% |
| | SingleQuery | 433,636 | 438,940 | +1.22% |
| | MultipleQueries | 36,213 | 36,473 | +0.72% |
| Perf (12 cores) | Plaintext | 7,260,411 | 7,528,454 | +3.69% |
| | Json | 604,990 | 639,720 | +5.74% |
| | Fortunes | 192,019 | 195,917 | +2.03% |
| | Updates | 3,895 | 3,939 | +1.13% |
| | SingleQuery | 214,440 | 214,503 | +0.03% |
| | MultipleQueries | 6,214 | 6,017 | -3.17% |
| AMD | Plaintext | 10,376,149 | 10,635,593 | +2.50% |
| | Json | 733,663 | 722,975 | -1.46% |
| | Fortunes | 308,241 | 302,545 | -1.85% |
| | Updates | 12,113 | 11,702 | -3.39% |
| | SingleQuery | 319,858 | 322,232 | +0.74% |
| | MultipleQueries | 20,818 | 20,708 | -0.53% |
| ARM | Plaintext | 7,982,696 | 7,483,497 | -6.25% |
| | Json | 562,544 | 562,513 | -0.01% |
| | Fortunes | 80,236 | 73,321 | -8.62% |
| | Updates | 9,683 | 10,598 | +9.45% |
| | SingleQuery | 68,404 | 68,585 | +0.26% |
| | MultipleQueries | 12,945 | 13,992 | +8.09% |

As you can see, the change improved the JSON benchmark throughput on the Citrine and Perf machines by more than 5%! It did not have the same effect on the AMD and ARM machines (my wild guess is that it's because they have more cores and we don't keep them 100% busy yet).

The problem is that I was not able to get all of the tests passing. Initially, I ran into two simple problems:

  • I did not know that read() can return 0 as an indicator of "there is no more data to read". The fix was trivial: BytesTransferred > 0 => BytesTransferred >= 0.
  • I was setting the state to QueueState.Waiting without checking whether a new epoll notification had arrived between receiving the previous one and finishing the read syscall. The fix was to capture the sequence number in AsyncOperation and use it in a comparison (see the sketch after this list). The race looks like this:

epoll notification
start read
epoll notification   <- must not be lost when transitioning to Waiting
end read
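
A C sketch of that sequence-number guard (hypothetical names; in the PR the real state machine lives in the managed AsyncOperation/SocketAsyncContext code):

```c
#include <stdatomic.h>
#include <stdbool.h>

// Bumped by the epoll thread for every notification delivered to this socket.
static atomic_uint g_sequenceNumber;

// Epoll thread: record that a new notification arrived.
void on_epoll_notification(void)
{
    atomic_fetch_add(&g_sequenceNumber, 1);
}

// I/O path: park the operation in the Waiting state only if no notification
// raced in while the read syscall was in flight; otherwise retry the read.
bool try_transition_to_waiting(unsigned int observedSequenceNumber)
{
    return atomic_load(&g_sequenceNumber) == observedSequenceNumber;
}
```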

After fixing these two simple issues, I was able to get the failing test to pass locally:

```csharp
public async Task CopyToAsync_AllDataCopied(int byteCount)
{
    await RunWithConnectedNetworkStreamsAsync(async (server, client) =>
    {
        var results = new MemoryStream();
        byte[] dataToCopy = new byte[byteCount];
        new Random().NextBytes(dataToCopy);

        // Start copying on the client while the server writes, then closes.
        Task copyTask = client.CopyToAsync(results);
        await server.WriteAsync(dataToCopy, 0, dataToCopy.Length);
        server.Dispose();
        await copyTask;

        Assert.Equal(dataToCopy, results.ToArray());
    });
}
```

but the CI remained red. I wrote a small app that basically ran this test in a while (true) loop, and I was able to reproduce the failure after a while. I added some console logging, and after analyzing the output I came to the conclusion that this simple check is not always enough.

Take the test mentioned above as an example. If we have pseudocode like this:

server.Send(1024 bytes);

Console.WriteLine(client.Receive());
Console.WriteLine(client.Receive());

Most of the time, the execution is as follows:

server.Send(1024 bytes);

// receives epoll notification that reading is possible

client.Receive(); // returns 1024, sets state to Waiting because we could have read more (buffer.Size > 1024)

// receives epoll notification that reading is possible

client.Receive(); // returns 0, sets state to Waiting

The problem is that we do not always receive two epoll notifications, and we sometimes end up stuck in an epoll_wait that never finishes (verified with a debugger):

server.Send(1024 bytes);

// receives epoll notification that reading is possible

client.Receive(); // returns 1024, sets state to Waiting because we could have read more (buffer.Size > 1024)

client.Receive(); // never finishes, we are stuck waiting for a new notification from `epoll_wait`

If I remember correctly, you suggested using an epoll syscall instead of a read when state == Waiting. I have not tried this approach because I was afraid it would regress the performance of the other benchmarks (removing the syscall produced a Plaintext gain that was within the margin of error, so adding a new syscall would most probably cost more than that).

Another idea was to switch from edge-triggered to level-triggered epoll. I have not tested that, mostly because it is described as slower in most of the docs, and also because I have to switch to some 5.0 tasks.
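
For reference, the registration-time difference between the two modes is just the EPOLLET flag; a minimal sketch:

```c
#include <string.h>
#include <sys/epoll.h>

// Register a socket with epoll; edgeTriggered selects EPOLLET semantics.
static int register_socket(int epollFd, int socketFd, int edgeTriggered)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.data.fd = socketFd;
    // Level-triggered (default): readiness is re-reported by every epoll_wait
    // while unread data remains. Edge-triggered (EPOLLET): reported only on
    // transitions, which is why a missed partial read can hang in epoll_wait.
    ev.events = EPOLLIN | EPOLLOUT | (edgeTriggered ? EPOLLET : 0);
    return epoll_ctl(epollFd, EPOLL_CTL_ADD, socketFd, &ev);
}
```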

I am open to any suggestions and ideas, but for now I have to switch to other 5.0 tasks, so I am going to close this PR.

It's hard to give up on this idea because with some other improvements that were recently merged, I am able to break the magic barrier of 1300k RPS for JSON ;(

FWIW I was testing that on Ubuntu 18.04.

/cc @tmds @stephentoub @benaadams @karelz

@geoffkizer
Contributor

@adamsitnik Thanks for the investigation here. I'll look at your code and see if I see anything.

@geoffkizer
Contributor

@adamsitnik What byteCount are you seeing this with? 1024?

@geoffkizer
Contributor

Here's a guess at what might be happening:

If I understand your scenario correctly, what's happening is that the receiver has received all data, but hasn't received EOF yet, and is never getting an epoll notification that would trigger it to do another recv() and actually get the EOF.

I suspect we are not handling EPOLLHUP notifications properly. That is, in rare cases the receiver has received both the data and the EOF from the sender before the epoll notification is triggered. epoll_wait returns something like EPOLLIN | EPOLLHUP for this socket. We call recv() and get a partially filled buffer, so we assume there's no more data to be read, but we also assume there will be another epoll_wait notification -- which apparently in this case isn't true.

If we were to remember that we saw EPOLLHUP, then we would know that once we receive the partially filled buffer, the client has sent EOF and no more data will be arriving.

Note that as things are implemented currently, we don't ever rely on EPOLLHUP, since we always try to do the recv() and we'll always get EOF from this. But with this partial buffer optimization, we won't try to recv() until we get another epoll notification, which in this case will never happen.
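
A hedged C sketch of what "remembering EPOLLHUP" could look like (hypothetical names, one socket for brevity; not the actual fix):

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>

// Hypothetical per-socket flag: set while processing epoll events.
static bool g_sawHangup;

void process_epoll_events(uint32_t events)
{
    if (events & EPOLLHUP)
        g_sawHangup = true; // peer is gone; no further notifications will come
}

// After a recv() that only partially filled the buffer, we would normally
// park and wait for the next notification. If EPOLLHUP was already seen,
// there will never be one, so issue the final recv() now to observe EOF
// (recv() returning 0).
bool should_read_again_for_eof(void)
{
    return g_sawHangup;
}
```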

Thoughts?

@adamsitnik
Member Author

What byteCount are you seeing this with? 1024?

Yes, 1024.

If I understand your scenario correctly, what's happening is that the receiver has received all data, but hasn't received EOF yet, and is never getting an epoll notification that would trigger it to do another recv() and actually get the EOF.

Yes, exactly.

Thoughts?

I am not familiar with the concept of EPOLLHUP, but I am going to read some docs and try to see if your suggestion would help.
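
For reference, epoll_ctl(2) notes that EPOLLHUP and EPOLLERR are always reported even when not requested, while EPOLLRDHUP (Linux 2.6.17+) must be requested explicitly and fires when the peer shuts down the writing half of a stream socket. A minimal registration sketch:

```c
#include <sys/epoll.h>

static int add_with_rdhup(int epollFd, int socketFd)
{
    struct epoll_event ev = { 0 };
    // EPOLLHUP/EPOLLERR need not be requested; the kernel always reports them.
    ev.events = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET;
    ev.data.fd = socketFd;
    return epoll_ctl(epollFd, EPOLL_CTL_ADD, socketFd, &ev);
}
```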

@geoffkizer
Contributor

Thanks, let me know what you find.

BTW there's some weirdness with EPOLLHUP events here that we probably need to change:

events = (events & ((uint32_t)~EPOLLHUP)) | EPOLLIN | EPOLLOUT;
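
(Reading that line: it masks out the EPOLLHUP bit and unconditionally ORs in EPOLLIN | EPOLLOUT, so a hang-up surfaces only as generic read/write readiness rather than as a distinct event.)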

@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Labels
area-System.Net.Sockets, NO-MERGE (the PR is not ready for merge yet; see discussion for detailed reasons)