Skip to content
This repository was archived by the owner on Jul 15, 2023. It is now read-only.

Fix for RecreateSyncProducerPoolForMetadata #34

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Fix for RecreateSyncProducerPoolForMetadata #34

wants to merge 5 commits into from

Conversation

maximecaron
Copy link

RecreateSyncProducerPoolForMetadata was never called in RefreshMetadata() if RefreshMetadataInternal throw exceptions.
Thus the consumer get stuck with an invalid list of broker unless KafkaSimpleManager is recreated completely.

If ProducerPoolForMetadata get stale or invalid call to RefreshMetadataInternal will throw Kafka.Client.Exceptions.UnableToConnectToHostException.
Calling RecreateSyncProducerPoolForMetadata will query zookeeper and recreate the pool that is passed to BrokerPartitionInfo by RefreshMetadataInternal allowing for next retry to succeed.

at Kafka.Client.KafkaConnection.Connect()
at Kafka.Client.KafkaConnection.Send(Byte[] data)
at Kafka.Client.KafkaConnection.Handle[T](Byte[] data, IResponseParser`1 parser, Boolean shouldParse)
at Kafka.Client.KafkaConnection.Send(TopicMetadataRequest request)
at Kafka.Client.Producers.Partitioning.BrokerPartitionInfo.UpdateInfo(Int16 versionId, Int32 correlationId, String clientId, String topic)
at Kafka.Client.Helper.KafkaSimpleManager`2.RefreshMetadataInternal(Int16 versionId, String clientId, Int32 correlationId, String topic, Dictionary`2 tempTopicMetadatas, Dictionary`2 tempTopicMetadatasLastUpdateTime, Dictionary`2 partitionLeaders)
at Kafka.Client.Helper.KafkaSimpleManager`2.RefreshMetadata(Int16 versionId, String clientId, Int32 correlationId, String topic, Boolean force)
{
Logger.WarnFormat("Got null for metadata of topic {0}, will RecreateSyncProducerPoolForMetadata and retry . ", topic);
Logger.WarnFormat("Got exception while refreshing metadata of topic {0}, will RecreateSyncProducerPoolForMetadata and retry . {1} ",topic,
ExceptionUtil.GetExceptionDetailInfo(ex));
RecreateSyncProducerPoolForMetadata();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only concern is that RecreateSyncProducerPoolForMetadata() can throw. I hate to see new exceptions thrown from inside a catch block. Could this be restructured to move this call be into its own try/catch or just extracted from the current try/catch altogether and let it throw if its going to? If it does, it probably indicates a more severe error where a retry may not help.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think RecreateSyncProducerPoolForMetadata() should be extracted/moved outside the catch block as it should not need to be called each time RefreshMetadata() is called but only if RefreshMetadata() fail.

If what you mean is something such as setting a bool flag inside the catch block to indicate a failure and then call RecreateSyncProducerPoolForMetadata() outside of the catch block if the flag was set, this could be done and would have the same intended result. but I am not sure I understand how this will make it better.

We could also add an extra try/catch around RecreateSyncProducerPoolForMetadata(); to honor the retryCount if you think this is better, but the retry loop was not really needed to start with since the caller of RefreshMetadata() could simply retry calling RefreshMetadata() if an exception is thrown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two issues here, really. One is that the stack trace for any exceptions thrown from within the catch block will be convoluted (not the end of the world), and two is that this code is using the exception for flow control. IF the code in the try block throws an exception, THEN execute RecreateSyncProducerPoolForMetadata(). That part makes me most uncomfortable. Could you tighten up the try block to only wrap RefreshMetadataInternal, the problem method, then set a bool in the catch block and if the bool is true, call Recreate...() lower down? That way any exceptions are handled and execution moves on to an explicit flow control block.

Sorry to push on this. This is a needed fix, but this lib is already pretty liberal with exceptions and being more disciplined about how they are used in new code seems like a good thing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure wrapping only RefreshMetadataInternal and setting bool sounds good to me.

Each retry is doing a network call. The caller of RefreshMetadata should take care of retry using Exponential Backoff and Jitter. 
Blindly retrying in a loop is not likely to work and will often make things worst because of thundering herd effect on Zookeeper and Kafka severs.
@robinson
Copy link

Cool, from when Microsoft use the Pull Request Bot? In the near future, the bot probably fix the bug and full request for us! 👍

@microsoft microsoft deleted a comment from msftclas Sep 26, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants