Fix UcpRequests leak #11
base: development
Conversation
I programmed hadroNIO with the philosophy of not performing any memory allocations during request processing. The only memory allocations done when sending/receiving/selecting are performed by JUCX. In fact, you can see in my Netty microbenchmark that JUCX does not scale very well with a lot of connections because of this (Paper; e-mail me if you cannot access it and are interested in the results). Your new solution will definitely allocate heap memory for managing the queue. Maybe there is a better way, like a fixed-size array?
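A minimal sketch of what the fixed-size array idea could look like (the class, its capacity handling, and the single-worker-thread assumption are mine, not part of this PR or of hadroNIO):

```java
import org.openucx.jucx.ucp.UcpRequest;
import org.openucx.jucx.ucp.UcpWorker;

// Hypothetical allocation-free request tracker: all slots are preallocated,
// so the hot path never allocates on the heap (unlike ConcurrentLinkedQueue,
// which allocates a node per offer).
final class PendingRequests {
    private final UcpRequest[] slots;
    private int size; // no synchronization: assumes a single worker thread

    PendingRequests(int capacity) {
        slots = new UcpRequest[capacity];
    }

    /** Track a request; returns false when the array is full. */
    boolean add(UcpRequest request) {
        if (size == slots.length) {
            return false;
        }
        slots[size++] = request;
        return true;
    }

    /** Compact out completed requests so their slots can be reused. */
    void removeCompleted() {
        int live = 0;
        for (int i = 0; i < size; i++) {
            if (!slots[i].isCompleted()) {
                slots[live++] = slots[i];
            }
        }
        for (int i = live; i < size; i++) {
            slots[i] = null; // let the GC reclaim completed requests
        }
        size = live;
    }

    /** Cancel everything still outstanding, e.g. when the channel closes. */
    void cancelAll(UcpWorker worker) {
        for (int i = 0; i < size; i++) {
            if (!slots[i].isCompleted()) {
                worker.cancelRequest(slots[i]);
            }
            slots[i] = null;
        }
        size = 0;
    }
}
```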
Why
In this PR I fix an OOM (Out of Memory) error that occurs in the test described here.
First of all, I reverted the two previous PRs addressing the same issue, because that solution, which involved explicitly freeing direct buffers, is error prone. In #8 I freed buffers that had previously been passed to a callback, so the callback held a broken reference to the buffer. I apologize for this bug. In this PR I have found out why the direct buffers were not collected by the GC and fixed the root of the problem.
The JUCX library's JNI code creates a global reference for each `UcpRequest` and removes it when the request is processed, canceled, or (if the request was created by an endpoint) when `closeNonBlockingForce()` is called. hadroNIO creates many `UcpRequest`s in `fillReceiveBuffer()` using `recvTaggedNonBlocking` (which is not an endpoint method but a worker method), and only some of them are actually processed. Because of the remaining global references, the GC cannot collect the request objects on the heap, which in turn reference large direct buffers. As a result, if the application constantly creates new socket channels, an OOM error occurs.

(Screenshot: heap dump after OOM)
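For illustration, the leaking pattern looks roughly like this (a hedged sketch, not hadroNIO's actual code; the buffer size and tag handling are placeholders):

```java
import java.nio.ByteBuffer;
import org.openucx.jucx.UcxCallback;
import org.openucx.jucx.ucp.UcpRequest;
import org.openucx.jucx.ucp.UcpWorker;

final class LeakPattern {
    // Each call posts a receive on the worker. JUCX pins the returned
    // UcpRequest with a JNI global reference until the request completes
    // or is canceled. Closing the endpoint does NOT cancel worker-level
    // requests, so an unmatched receive keeps both the request and its
    // direct buffer reachable forever.
    static UcpRequest postReceive(UcpWorker worker, long tag) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(8 * 1024 * 1024);
        return worker.recvTaggedNonBlocking(buffer, tag, -1L, new UcxCallback() {
            @Override
            public void onSuccess(UcpRequest request) {
                // Reached only for requests that actually complete; only
                // then is the global reference dropped and the buffer
                // eligible for collection.
            }
        });
    }
}
```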
How
To solve this problem I introduce a `ConcurrentLinkedQueue` that tracks the `UcpRequest`s created by the worker. In the close method of `HadronioSocketChannel` I cancel all requests remaining in the queue. I also suggest replacing `endpoint.close()` (which calls the deprecated `ucp_ep_destroy`) with `closeNonBlockingForce()` to cancel all endpoint requests.

(Screenshot: VisualVM graph of the direct buffer count; the GC periodically collects buffers.)
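A condensed sketch of the approach described above (the tracker class is paraphrased from this description, not copied from the diff; only the JUCX calls are real API):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.openucx.jucx.ucp.UcpEndpoint;
import org.openucx.jucx.ucp.UcpRequest;
import org.openucx.jucx.ucp.UcpWorker;

final class RequestTracker {
    private final Queue<UcpRequest> pending = new ConcurrentLinkedQueue<>();

    // Called wherever a receive is posted on the worker.
    void track(UcpRequest request) {
        pending.add(request);
    }

    // Called from the channel's close(): cancel every request that is still
    // outstanding so JUCX drops its JNI global references and the GC can
    // reclaim the direct buffers they point to.
    void cancelPending(UcpWorker worker, UcpEndpoint endpoint) {
        UcpRequest request;
        while ((request = pending.poll()) != null) {
            if (!request.isCompleted()) {
                worker.cancelRequest(request);
            }
        }
        // Unlike endpoint.close() (deprecated ucp_ep_destroy underneath),
        // closeNonBlockingForce() also cancels the endpoint's own requests.
        endpoint.closeNonBlockingForce();
    }
}
```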
Possible refinements
- Is `ConcurrentLinkedQueue` an appropriate data structure for tracking requests across multiple `SocketChannel`s in a concurrent environment?
- The problem of requests (created via `recvTaggedNonBlocking`) not being canceled after closing the worker could be solved at the level of UCX. There was already an attempt to introduce a function that cancels all requests by tag, but the PR was closed "since the problem of additional incoming message", and it was suggested to use the ActiveMessage API instead. Is it possible to use the ActiveMessage API in hadroNIO instead of the TaggedMessage API, or would that be problematic?