Problem description
Hey,
We've been investigating DEADLINE_EXCEEDED errors that our clients are facing at Momento.
Our clients periodically receive this error, and we have made a few changes that have drastically reduced the volume of deadline errors. Most of these improvements involved keepalive settings and other connection settings. We have mostly been enabling the gRPC subchannel, dns_resolver, resolving_load_balancer, and keepalive traces, which have been very helpful.
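For reference, a minimal sketch of the kind of connection tuning described above, assuming grpc-js channel options are passed as the options argument of the generated client constructor. The helper name and the numeric values are illustrative assumptions, not recommended settings; note that servers can reject clients that ping too often. The traces mentioned above are enabled through environment variables read when grpc-js loads, e.g. `GRPC_TRACE=subchannel,dns_resolver,resolving_load_balancer,keepalive` together with `GRPC_VERBOSITY=DEBUG`.

```typescript
// Channel options are plain key/value pairs in grpc-js.
type KeepaliveChannelOptions = Record<string, number>;

// Hypothetical helper: builds channel options that enable client-side keepalive pings.
// The values passed in are assumptions to be tuned per deployment.
function buildKeepaliveOptions(
  keepaliveTimeMs: number,
  keepaliveTimeoutMs: number,
): KeepaliveChannelOptions {
  return {
    // Send an HTTP/2 PING after this many ms without activity on the connection.
    "grpc.keepalive_time_ms": keepaliveTimeMs,
    // Treat the connection as dead if the PING is not acknowledged within this window.
    "grpc.keepalive_timeout_ms": keepaliveTimeoutMs,
    // Also ping while no calls are active, so idle connections are checked too.
    "grpc.keepalive_permit_without_calls": 1,
  };
}
```

Usage might look like `new ScsClient(address, credentials, buildKeepaliveOptions(30_000, 5_000))`, where the address and credentials come from the existing client setup.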
Our clients still see bursts of these errors that last up to a minute and then resolve on their own. We have been gathering event loop and CPU metrics from the clients to correlate the errors with any resource contention in the Node.js process. In the meantime, we are curious whether there are other ways to debug these errors besides the methods mentioned above. We are now at a point where we see no gRPC trace logs around the failing requests, which indicates that no reconnections took place. Since these logs are verbose, we do not want clients to enable all of the traces, as doing so increases their monitoring bills.
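One way to collect the event loop metrics mentioned above is Node's built-in `monitorEventLoopDelay` histogram from `perf_hooks`. The sketch below is an assumption about how such a monitor could be wired up; the helper name and the sampling resolution are hypothetical. The idea is to log a summary periodically and line it up with the timestamps of DEADLINE_EXCEEDED bursts.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Event loop delay summary in milliseconds (the histogram reports nanoseconds).
interface EventLoopDelaySummary {
  meanMs: number;
  p99Ms: number;
  maxMs: number;
}

// Hypothetical helper: starts sampling event loop delay. `report` returns the delay
// observed since the previous report and then resets the histogram; `stop` disables sampling.
function startEventLoopDelayMonitor(resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return {
    report(): EventLoopDelaySummary {
      const summary: EventLoopDelaySummary = {
        meanMs: histogram.mean / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
        maxMs: histogram.max / 1e6,
      };
      histogram.reset();
      return summary;
    },
    stop(): void {
      histogram.disable();
    },
  };
}
```

For example, calling `report()` from a `setInterval` every few seconds and logging the result next to the client's error counts would show whether deadline bursts coincide with spikes in `maxMs`.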
This ticket is primarily about stack traces (similar to this) that I believe could help us figure out at what point the deadline was exceeded. Based on a ticket for this Java SDK, it seems that a stack trace can answer a few of these questions. Currently the stack traces stop at the onReceiveStatus method of the underlying gRPC library. I'd expect the stack trace to include frames from the request lifecycle showing where the call actually failed.
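Until richer stack traces are available, a client-side workaround could be to annotate the error with timing information. The sketch below is a hypothetical wrapper, not part of grpc-js: it wraps a unary call's callback, records when the call started, and on DEADLINE_EXCEEDED (status code 4) appends the elapsed time and how late the error was delivered relative to the deadline. It assumes the client-side deadline is enforced by a timer on the same event loop, so a large lateness would point to event loop blocking rather than a slow server.

```typescript
// Minimal shape of a grpc-js ServiceError; only the field used here is declared.
interface GrpcLikeError extends Error {
  code?: number;
}

// gRPC status code for DEADLINE_EXCEEDED.
const DEADLINE_EXCEEDED = 4;

type UnaryCallback<T> = (err: GrpcLikeError | null, value?: T) => void;

// Hypothetical wrapper: records the start time when the call is issued and, on
// DEADLINE_EXCEEDED, appends elapsed time and lateness to the error message.
function withDeadlineDiagnostics<T>(
  deadline: Date,
  callback: UnaryCallback<T>,
  now: () => number = Date.now,
): UnaryCallback<T> {
  const startedAt = now();
  return (err, value) => {
    if (err && err.code === DEADLINE_EXCEEDED) {
      const deliveredAt = now();
      const elapsedMs = deliveredAt - startedAt;
      const lateByMs = deliveredAt - deadline.getTime();
      err.message += ` (elapsed ${elapsedMs}ms, delivered ${lateByMs}ms after deadline)`;
    }
    // Other errors and successful responses pass through unchanged.
    callback(err, value);
  };
}
```

It would be used as `client.Set(request, { deadline }, withDeadlineDiagnostics(deadline, callback))`, reusing the same `deadline` object for both the call options and the wrapper.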
We'd love to hear any other ideas you may have for us to debug client-side timeouts/deadline exceeded errors better!
Reproduction steps
- Set a very low timeout value on any gRPC-backed service; the stack trace always looks similar to the one below (I'd expect more frames from grpc-js):
Error: 4 DEADLINE_EXCEEDED: Deadline exceeded
at callErrorFromStatus (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/node_modules/@grpc/grpc-js/src/call.ts:82:17)
at Object.onReceiveStatus (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/node_modules/@grpc/grpc-js/src/client.ts:360:55)
at /Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/node_modules/@grpc/grpc-js/src/call-interface.ts:149:27
at /Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/src/internal/grpc/middlewares-interceptor.ts:105:40
at /Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/src/internal/grpc/middlewares-interceptor.ts:158:19
at processTicksAndRejections (node:internal/process/task_queues:95:5)
for call at
at ScsClient.makeUnaryRequest (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/node_modules/@grpc/grpc-js/src/client.ts:325:42)
at ScsClient.Set (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/node_modules/@grpc/grpc-js/src/make-client.ts:189:15)
at ScsClient.Set (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/node_modules/@gomomento/generated-types/dist/cacheclient.js:12394:30)
at /Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/src/internal/cache-data-client.ts:354:38
at new Promise (<anonymous>)
at CacheDataClient.sendSet (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/src/internal/cache-data-client.ts:353:18)
at CacheDataClient.set (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/src/internal/cache-data-client.ts:338:23)
at CacheClient.set (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/core/src/internal/clients/cache/AbstractCacheClient.ts:200:25)
at Object.<anonymous> (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/test/integration/cache-client-close.test.ts:18:20)
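The reproduction step above can be sketched as follows. grpc-js accepts a call deadline either as a `Date` or as milliseconds since the Unix epoch; the helper name and the reduced options interface below are hypothetical and only cover the `deadline` field.

```typescript
// grpc-js accepts a deadline as a Date or as milliseconds since the Unix epoch.
type Deadline = Date | number;

// Only the field used here; the real grpc-js CallOptions has more fields.
interface CallOptionsWithDeadline {
  deadline: Deadline;
}

// Hypothetical helper: builds call options whose deadline is `timeoutMs` after `nowMs`.
// With a timeout of a few milliseconds the call will almost always fail with
// DEADLINE_EXCEEDED, producing a stack trace like the one above.
function callOptionsWithTimeout(
  timeoutMs: number,
  nowMs: number = Date.now(),
): CallOptionsWithDeadline {
  return { deadline: new Date(nowMs + timeoutMs) };
}
```

For example, `client.Set(request, callOptionsWithTimeout(1), (err) => console.log(err?.stack))` should print a trace with the same shape as the one shown.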
Environment
- OS name, version and architecture [e.g. Linux Ubuntu 18.04 amd64]: Linux/macOS
- Node version [e.g. 8.10.0]: 20
- Package name and version [e.g. gRPC@1.12.0]: @grpc/grpc-js@1.10.0