
Do network connections and writes in KafkaClient.poll() #1729

Merged: 6 commits merged into master from async_connect_send on Mar 8, 2019

Conversation

dpkp (Owner) commented Mar 6, 2019

This PR completes #981 and attempts to address several reports of consumer lockups that appear related to KafkaClient blocking for up to the full request_timeout_ms while holding the client lock, preventing all other threads from initiating new network requests or connections until the lock is released.

There are 4 commits here. The first is a simple refactor of BrokerConnection that separates queueing a new network request (via .send / ._send) from performing the actual network IO (via .send_pending_requests).

The second updates KafkaClient to only perform network IO during .poll(). It uses the wakeup channel to signal between threads, allowing a sender to wake up a blocked poller and trigger an immediate call to .send_pending_requests().

The third commit addresses network connection management, separate from network writes. It updates KafkaClient.send() to only acquire the client lock when a new connection is needed.

And the fourth commit completes the transition by moving all connection attempts via _maybe_connect into KafkaClient.poll(), which should eliminate the thread contention between a thread that is polling and some other thread that wants to initiate network IO.
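
As a rough illustration of the queue-then-flush split described above (a minimal sketch only; the class and attribute names are illustrative, not the actual BrokerConnection internals):

    import collections


    class SketchConnection(object):
        """Toy connection illustrating the split between queueing and network IO."""

        def __init__(self, sock):
            self._sock = sock                       # assumed already-connected socket
            self._send_queue = collections.deque()  # encoded requests awaiting IO

        def send(self, request_bytes):
            # Queue only -- no socket writes happen on the calling thread.
            self._send_queue.append(request_bytes)

        def send_pending_requests(self):
            # Called from the polling thread; performs the actual writes.
            # Returns True on success, or the exception on failure (mirroring
            # the isinstance(error, Exception) check quoted in the review below).
            try:
                while self._send_queue:
                    self._sock.sendall(self._send_queue[0])
                    self._send_queue.popleft()
                return True
            except OSError as e:
                return e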



jeffwidman (Collaborator) left a comment:

Looks good.

if blocking:
    error = self.send_pending_requests()
    if isinstance(error, Exception):
        future.failure(error)
jeffwidman (Collaborator) commented:

Perhaps add a debug-level log statement of this error here?
It looks like send_pending_requests() already logs most (but not all) errors, so this may be superfluous, but my one thought is that if someone files a ticket, we have a little more visibility into the errors they're hitting...
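
Something along these lines, for example (just a sketch against the snippet quoted above; self, future, error, and log are the names from that context):

    if blocking:
        error = self.send_pending_requests()
        if isinstance(error, Exception):
            # extra visibility before the error is handed to the future
            log.debug('%s: error sending pending requests: %s', self, error)
            future.failure(error)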

        future.failure(error)
        return future

log.debug('%s Request %d: %s', self, correlation_id, request)
jeffwidman (Collaborator) commented:

Should this log line be located above the if blocking: line? The info seems useful regardless of whether we're blocking.

dpkp (Owner, Author) replied:

In the current implementation we only log this after the request is sent successfully. So I put this after the blocking section to keep it consistent.

except ConnectionError as e:
-    log.exception("Error sending %s to %s", request, self)
+    log.exception("Error sending request data to %s", self)
jeffwidman (Collaborator) commented:

Is there a reason to stop logging the request value?

dpkp (Owner, Author) replied:

With this design we only have the encoded bytes at this stage -- we no longer have the original request object, so I took it out of the log message. The error should be sent down to the future, and we can expect that the error handler for the request future will be responsible for logging the details.
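
For example, a caller that still holds the original request object could attach an errback that logs it (a minimal sketch; send_and_log_failures and the conn/request variables are hypothetical, and it assumes a kafka-python-style future with add_errback):

    import logging

    log = logging.getLogger(__name__)


    def send_and_log_failures(conn, request):
        # conn.send() now only queues the encoded request; actual IO happens in poll()
        future = conn.send(request)
        # When the send eventually fails, log the full request details here,
        # since the connection layer only has the encoded bytes by then.
        future.add_errback(
            lambda exc: log.error('Request %s failed: %s', request, exc))
        return future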

"""Send a request to a specific node.
"""Send a request to a specific node. Bytes are placed on an
internal per-connection send-queue. Actual network I/O will be
triggered in a subsequent call to .poll()
jeffwidman (Collaborator) commented Mar 8, 2019:

Does this create any new scenarios where we now need to make sure to call poll()?
In other words, will this break any behavior that used to work fine w/o ever calling poll()?

Currently I can't think of any -- it looks like metadata refresh picks this up automatically since it relies on maybe_connect, and you also updated the fetcher to always call poll(). Just wondering if there might be any others...

dpkp (Owner, Author) replied:

It is possible, yes, but they should be rare. The main culprits would be blocking loops that attempt to connect without calling poll(), or that don't call poll() unless there are in-flight requests. I think I've found and fixed those issues, but definitely keep an eye open for others.
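
A sketch of the poll-driven pattern that stays safe after this change (send_and_wait is a hypothetical helper; it assumes the KafkaClient API of ready(), send(), and poll()):

    def send_and_wait(client, node_id, request, poll_timeout_ms=100):
        # ready() only *initiates* a connection now; the actual connect is
        # performed inside poll(), so any wait loop must keep polling.
        while not client.ready(node_id):
            client.poll(timeout_ms=poll_timeout_ms)

        future = client.send(node_id, request)
        # poll(future=...) drives the pending write and read until the
        # future is resolved, rather than busy-waiting without IO progress.
        client.poll(future=future)

        if future.failed():
            raise future.exception
        return future.value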

jeffwidman (Collaborator):

I rebased to pick up the latest changes on master, in particular the locking change... I doubt there are any issues, but just in case. It was also for convenience so I could quickly pip install the tip of this branch on a production box (where my tooling is limited) without having to deal with creating my own separate fork, etc.

dpkp (Owner, Author) commented Mar 8, 2019:

I'm going to be out of town / offline for the next week or so. Tests are passing, and I'm satisfied w/ where this is at for now, so I'm going to merge to master. Feel free to post more feedback here that you collect from real-world usage.

dpkp merged commit 8c07925 into master on Mar 8, 2019
dpkp deleted the async_connect_send branch on Mar 8, 2019 at 16:01
jeffwidman (Collaborator):

Sounds good, thanks for all the hard work here.
