Fix for RecreateSyncProducerPoolForMetadata #34

maximecaron · 2016-03-17T17:51:09Z

RecreateSyncProducerPoolForMetadata was never called in RefreshMetadata() if RefreshMetadataInternal throw exceptions.
Thus the consumer get stuck with an invalid list of broker unless KafkaSimpleManager is recreated completely.

If ProducerPoolForMetadata get stale or invalid call to RefreshMetadataInternal will throw Kafka.Client.Exceptions.UnableToConnectToHostException. Calling RecreateSyncProducerPoolForMetadata will query zookeeper and recreate the pool that is passed to BrokerPartitionInfo by RefreshMetadataInternal allowing for next retry to succeed. at Kafka.Client.KafkaConnection.Connect() at Kafka.Client.KafkaConnection.Send(Byte[] data) at Kafka.Client.KafkaConnection.Handle[T](Byte[] data, IResponseParser`1 parser, Boolean shouldParse) at Kafka.Client.KafkaConnection.Send(TopicMetadataRequest request) at Kafka.Client.Producers.Partitioning.BrokerPartitionInfo.UpdateInfo(Int16 versionId, Int32 correlationId, String clientId, String topic) at Kafka.Client.Helper.KafkaSimpleManager`2.RefreshMetadataInternal(Int16 versionId, String clientId, Int32 correlationId, String topic, Dictionary`2 tempTopicMetadatas, Dictionary`2 tempTopicMetadatasLastUpdateTime, Dictionary`2 partitionLeaders) at Kafka.Client.Helper.KafkaSimpleManager`2.RefreshMetadata(Int16 versionId, String clientId, Int32 correlationId, String topic, Boolean force)

mhorstma · 2016-04-06T18:00:53Z

src/KafkaNET.Library/Helper/KafkaSimpleManager.cs

                {
-                    Logger.WarnFormat("Got null for metadata of topic {0}, will RecreateSyncProducerPoolForMetadata and retry . ", topic);
+                    Logger.WarnFormat("Got exception while refreshing metadata of topic {0}, will RecreateSyncProducerPoolForMetadata and retry . {1} ",topic,
+                     ExceptionUtil.GetExceptionDetailInfo(ex));
                    RecreateSyncProducerPoolForMetadata();


Only concern is that RecreateSyncProducerPoolForMetadata() can throw. I hate to see new exceptions thrown from inside a catch block. Could this be restructured to move this call be into its own try/catch or just extracted from the current try/catch altogether and let it throw if its going to? If it does, it probably indicates a more severe error where a retry may not help.

I dont think RecreateSyncProducerPoolForMetadata() should be extracted/moved outside the catch block as it should not need to be called each time RefreshMetadata() is called but only if RefreshMetadata() fail.

If what you mean is something such as setting a bool flag inside the catch block to indicate a failure and then call RecreateSyncProducerPoolForMetadata() outside of the catch block if the flag was set, this could be done and would have the same intended result. but I am not sure I understand how this will make it better.

We could also add an extra try/catch around RecreateSyncProducerPoolForMetadata(); to honor the retryCount if you think this is better, but the retry loop was not really needed to start with since the caller of RefreshMetadata() could simply retry calling RefreshMetadata() if an exception is thrown.

There are two issues here, really. One is that the stack trace for any exceptions thrown from within the catch block will be convoluted (not the end of the world), and two is that this code is using the exception for flow control. IF the code in the try block throws an exception, THEN execute RecreateSyncProducerPoolForMetadata(). That part makes me most uncomfortable. Could you tighten up the try block to only wrap RefreshMetadataInternal, the problem method, then set a bool in the catch block and if the bool is true, call Recreate...() lower down? That way any exceptions are handled and execution moves on to an explicit flow control block.

Sorry to push on this. This is a needed fix, but this lib is already pretty liberal with exceptions and being more disciplined about how they are used in new code seems like a good thing.

sure wrapping only RefreshMetadataInternal and setting bool sounds good to me.

Each retry is doing a network call. The caller of RefreshMetadata should take care of retry using Exponential Backoff and Jitter. Blindly retrying in a loop is not likely to work and will often make things worst because of thundering herd effect on Zookeeper and Kafka severs.

robinson · 2016-10-21T13:27:48Z

Cool, from when Microsoft use the Pull Request Bot? In the near future, the bot probably fix the bug and full request for us! 👍

maximecaron added 3 commits March 16, 2016 17:53

add continue; to not increment retry twice

dc9c4d1

Fix build error

9b3ad1f

msftclas added the cla-not-required label Mar 17, 2016

maximecaron mentioned this pull request Mar 21, 2016

In BrokerPartitionInfo.cs list of brokers is never refreshed from Zookeeper #32

Open

mhorstma reviewed Apr 6, 2016
View reviewed changes

maximecaron added 2 commits May 18, 2016 10:47

Update KafkaSimpleManager.cs

e44b973

msftgits removed the cla-not-required label Sep 26, 2017

microsoft deleted a comment from msftclas Sep 26, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix for RecreateSyncProducerPoolForMetadata #34

Fix for RecreateSyncProducerPoolForMetadata #34

Uh oh!

maximecaron commented Mar 17, 2016

Uh oh!

mhorstma Apr 6, 2016

Uh oh!

maximecaron Apr 6, 2016

Uh oh!

mhorstma Apr 6, 2016

Uh oh!

maximecaron Apr 6, 2016

Uh oh!

robinson commented Oct 21, 2016

Uh oh!

Uh oh!

Fix for RecreateSyncProducerPoolForMetadata #34

Are you sure you want to change the base?

Fix for RecreateSyncProducerPoolForMetadata #34

Uh oh!

Conversation

maximecaron commented Mar 17, 2016

Uh oh!

mhorstma Apr 6, 2016

Choose a reason for hiding this comment

Uh oh!

maximecaron Apr 6, 2016

Choose a reason for hiding this comment

Uh oh!

mhorstma Apr 6, 2016

Choose a reason for hiding this comment

Uh oh!

maximecaron Apr 6, 2016

Choose a reason for hiding this comment

Uh oh!

robinson commented Oct 21, 2016

Uh oh!

Uh oh!