Cache gRPC connections #1164

Closed · pierre-b opened this issue Mar 13, 2016 · 27 comments

@pierre-b

Hello guys,

I use Pub/Sub to publish thousands of messages and a memory leak crashes my server.

Here is a Passenger screenshot using gcloud:
[screenshot]

Using request:
[screenshot]

Here is a simple repo to reproduce it: https://github.com/pierre-b/test-leak
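
The publish pattern in the repo is roughly the following (a minimal sketch, assuming the gcloud@0.29 Pub/Sub API; the project ID and topic name are placeholders):

```js
// Sketch of the leaky pattern: publish many messages in parallel on each request.
var gcloud = require('gcloud');

var pubsub = gcloud.pubsub({ projectId: 'my-project' });
var topic = pubsub.topic('my-topic');

function publishBatch(done) {
  var pending = 100;
  for (var i = 0; i < 100; i++) {
    topic.publish({ data: { index: i } }, function (err) {
      if (err) console.error(err);
      if (--pending === 0) done();
    });
  }
}
```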

Environment:

  • Ubuntu 14.04
  • node 4.4.0
  • Passenger 5.0.26
  • gcloud 0.29.0

Did I miss something?
Thanks for your help!

@pierre-b
Author

Forgot to mention that the request interceptor doesn't work in my test either... can you explain how to implement it the right way?

@pierre-b
Author

The problem seems to come from the request package (see the issue about HTTPS requests).

I added Wreck to my test repo, and so far Wreck is the best HTTPS client. I would recommend switching to Wreck as long as request has a leak...

[Wreck memory screenshot]

@jgeewax
Contributor

jgeewax commented Mar 14, 2016

Thanks for the note @pierre-b. @stephenplusplus, thoughts on switching our transport?

@stephenplusplus
Contributor

Unfortunately, wreck is only written for Node v4 and greater. We still need to support v0.12.0.

We switched Pub/Sub to use gRPC and proto files with our last release. Because that library is used, the request interceptors don't have an impact. I'll make a note of that in the docs.

When I'm trying to test with your repo, I'm running into:

{"statusCode":400,"error":"Bad Request","message":"Invalid cookie value"}

Is there a quick solution to get past that?

@pierre-b
Author

Thank you guys for investigating this!

@stephenplusplus maybe you need to clear your cookies? It looks like hapi is trying to parse something on your localhost.

@stephenplusplus
Contributor

Thanks, that did it. Can you show the passenger command you're running to see that output?

@pierre-b
Author

sudo watch passenger-status

@stephenplusplus
Contributor

In your demo, after downgrading to gcloud@0.27.0, the memory hike was negligible. I believe this can be traced to grpc, which was introduced as the Pub/Sub transport in version gcloud@0.28.0.

@pierre-b
Author

I confirm 0.27.0 is much better!
115M after 22 requests (each request publishes 100 messages in parallel)

@pierre-b
Author

Also, request@2.53.0 used in gcloud@0.27.0 consumes 27% less memory than request@2.69.0.

22 requests (each request publishes 100 messages in parallel):
2.53.0: 124M
2.69.0: 171M

@stephenplusplus
Contributor

@pierre-b would you mind opening an issue on the grpc repo about the high memory usage? I tried digging into it, but no doubt they'd be more efficient.

@pierre-b
Author

There is no open issue regarding a memory leak on the grpc Node.js repo. Are you sure the problem comes from them and not from the Pub/Sub implementation?

Since I've never worked with the grpc Node package myself, I don't feel qualified to open an issue on their repo!

@stephenplusplus
Contributor

I completely understand :)

@murgatroid99 The test @pierre-b put together instantiates a service 100x, which hikes the memory usage up by about 100 MB. We don't retain a reference to the object, but it seems to persist in memory. Is this something that will be resolved once we pre-compile the proto files and use message objects (#1134 (comment))?

@murgatroid99

This usage pattern is not expected or intended to be performant. Each time you instantiate a service object, it creates a channel, which contains a number of things, including a full TCP socket/TLS session/HTTP2 session stack. This has to stay alive at least until the call you make with it is complete.

It would be better to initialize a single client object per server/service combination, and then use it for every call to that service on that server. This will allow gRPC to multiplex the calls on a single TCP connection, and will avoid having to deal with initial connection delay every single time you make a call.
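
As a rough sketch of that pattern (the proto file, package, service, and method names here are placeholders, not the real Pub/Sub definitions):

```js
var grpc = require('grpc');

// Load the proto definitions once at startup.
var proto = grpc.load('service.proto');

// Create ONE client per server/service combination and keep reusing it.
// All calls made through this object are multiplexed over the same channel.
var client = new proto.mypackage.MyService(
  'myservice.example.com:443',
  grpc.credentials.createSsl()
);

function callMany(n) {
  for (var i = 0; i < n; i++) {
    client.doSomething({ value: i }, function (err, response) {
      if (err) console.error(err);
    });
  }
}
```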

@stephenplusplus
Contributor

Thanks, that gives me a lot to think about. I'm still wondering why the memory remains after the service calls have been made, however. Is there an extra step to stop the channel?

Also, regarding caching service connections, it will be difficult to know if a user is making a single call or many. Is there a default amount of time that passes that will close a channel? Would this caching be better handled within grpc?

@murgatroid99

To be specific, the sequence of events in that test looks something like this for a single call:

  1. A channel is opened
  2. A call is made on that channel
  3. The channel goes out of scope
  4. The channel gets garbage-collected
  5. The call completes
  6. The call goes out of scope
  7. The call gets garbage collected

The channel can only be closed (and the associated memory released) after steps 4, 5, and 7 all complete, and two of those depend on when the Node garbage collector feels like running. Of course, it is possible that there is also an actual memory leak, but it may just look like a leak because the Node object wrapping the Channel gets collected before the actual channel goes away.

Regarding caching, I don't know the exact numbers, but I would guess that in general, it is better to cache every single channel than to create even two channels when one could have been used. As far as I know, there is currently no default time after which a channel is closed. In some sense, the caching is handled within gRPC: the intention is that the user creates a channel object once, uses it to make every call they need to make, and then disposes of it.

There has been some consideration of the idea of reusing connections for channels with the same target, but as far as I know, we don't actually have a plan or design for that yet.

@pierre-b
Author

@stephenplusplus the test /pierre-b/test-leak instantiates the service once (line 168) to send 100 requests.

@murgatroid99 yes, the garbage collector released some memory (I had to wait a while), but only about half of it... do you know why it's not clearing everything?

Thx

@stephenplusplus
Contributor

It's actually our own code that creates multiple service objects ("service" being a Protobuf term in this context) each time .publish() is called.

You can play around with forcing garbage collection to see if that clears up the other half.
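
For example (assuming you can restart the process with the GC exposed):

```js
// Start the process with: node --expose-gc app.js
// Then, some time after the publishes have finished:
if (global.gc) {
  global.gc();
  console.log(process.memoryUsage());
}
```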

@stephenplusplus
Contributor

In my experience, memory shoots up by roughly 100 MB when 100 calls are made. I didn't observe the same release of memory over time that you did, @pierre-b. I tried forcing garbage collection up to 2 minutes after the response was sent. After multiple tests, I only see around 7 MB being freed up.

> In some sense, the caching is handled within gRPC: the intention is that the user creates a channel object once, uses it to make every call they need to make, and then disposes of it.

In the case of this library, I think we'll always need it closed immediately, since we can't predict the next actions a user will take. That's why I was thinking of implementing something of a "timeout", so that if we don't use the channel within, e.g., 120 seconds, the channel is then closed for us. If there were such a thing as a channel's maximum lifetime, that would save a lot of micro-management of open connections.
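
Something along these lines, as a sketch of the idea (createClient() is a placeholder for however we build the service object, and grpc.closeClient() is assumed here to be the way to tear the channel down):

```js
var grpc = require('grpc');

// Cache of service clients keyed by target, each with an idle timer.
var clients = {};
var IDLE_MS = 120 * 1000;

function getClient(target) {
  var entry = clients[target];
  if (!entry) {
    entry = clients[target] = { client: createClient(target), timer: null };
  }
  // Reset the idle timer on every use; close the channel after 120s of inactivity.
  clearTimeout(entry.timer);
  entry.timer = setTimeout(function () {
    grpc.closeClient(entry.client);
    delete clients[target];
  }, IDLE_MS);
  return entry.client;
}
```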

@pierre-b
Author

Yes, that would be great! Thanks for your tests.

@stephenplusplus changed the title from "Huge memory leak" to "Cache gRPC connections" on Mar 24, 2016
@stephenplusplus
Contributor

@pierre-b I put a PR together with a quick caching implementation: #1182 -- feel free to try it out and let me know how it goes!

@pierre-b
Author

Thanks, will try it out ;)

@stephenplusplus
Contributor

Opened an issue on the gRPC library to track the memory leak: grpc/grpc#5970

@pinazo

pinazo commented Jul 20, 2016

Hello guys,

I've just started using gcloud-node, 3 days ago. I am using Pub/Sub to publish a stream of packets that my server receives, converts to JSON, and publishes.
All was good until I deployed to the server and started receiving more traffic. The process increases its memory usage until it crashes.

I see that all the related issues for this memory leak problem are closed, so let me know if you'd like me to open a new one. I've isolated the problem, and the memory leak happens only when calling topic.publish() many times.

My environment is:

  • Linux Ubuntu 14.04
  • Node.js 5.6.0
  • gcloud 0.37.0

Below is a chart showing how the memory increases, taken from process.memoryUsage(). From 17:10 to 17:50 it shows a run in the environment described above.

[memory usage chart screenshot, 2016-07-20 14:01]

Please let me know if any further information is needed.

@stephenplusplus
Contributor

Thanks for the report. I believe this was resolved upstream in gRPC, although the fix hasn't been merged yet.

It will probably take a while for the change to appear in this library. You should be able to install grpc from that PR's branch manually in the meantime:

$ npm install --save murgatroid99/grpc#node_client_creds_memory_leak

@stephenplusplus
Contributor

Sorry about the crashing :(

@pinazo

pinazo commented Jul 20, 2016

Ok, thanks Stephen!
