PulsarListener does not recover after a startup failure #816
Hi @eljefe6a,
The 5 listeners consuming from the same topic: do you want each listener (method) to consume every message, or to have the messages shared across the listeners? It sounds like all methods are using the same subscription in exclusive mode, which won't work and could explain the random behavior. Are you specifying the subscription explicitly on each listener? Anytime I start mentioning subscriptions, I do myself a favor and look at the nice diagram in the Pulsar docs that details the flexible subscription model. Can you share the 5 methods (just signatures and listener annotations)? Thanks
They are consuming five different topics. Each one uses a different subscription name. Here are two examples:
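The original example snippets did not survive the copy, but listeners of the sort described (each method on its own topic with its own subscription name) would look roughly like the hypothetical sketch below. The topic names, subscription names, and payload types here are invented for illustration, not taken from the reporter's code:

```java
// Hypothetical sketch -- names and payload types are invented.
@PulsarListener(topics = "orders-topic", subscriptionName = "orders-sub")
void consumeOrders(Order order) {
    // handle an order event
}

@PulsarListener(topicPattern = "persistent://public/default/events-.*",
        subscriptionName = "events-sub")
void consumeEvents(Event event) {
    // handle a matching event topic
}
```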
I took this to mean they were on the same topic. I am glad that they are not, though, as that would be a strange use of the listener :) Your listeners above look fine. I can't reproduce this locally. Can you do the following:

I wish I had a quick fix/answer for you, but I can't reproduce it. Thanks
I think I've narrowed it down. It looks like the exclusive consumer exception is the trigger.

I've tested this a few times. When the exclusive consumer exception doesn't happen, consumption starts and works fine. When the exception does happen, consumption never starts.
I've run it several more times and confirmed the issue. The container restart process guarantees that two of the same process are running: one will provision and start while the other waits and is then deprovisioned. If I manually stop the container, wait, and then start it, the process doesn't receive an exclusive consumer error and always receives the messages.
Nice sleuthing @eljefe6a! Yeah, the listeners are currently not very robust during startup in the sense that if they fail on startup they do not retry. I have noted your use case in the above feature request, and we will be sure it is included in the solve. We now have 2 users asking for this, so we will look into bumping the priority. Thanks for the detailed analysis - super helpful.
I am going to close this now as a duplicate of #445. |
After some thought, we have re-opened this issue to focus on the "Connection Retries" described there.
Thanks for picking this up. If you go the retry route without some kind of notify/exit on failure, that won't fully solve the problem. Is there a way for it to keep retrying in the background (in a while loop) so that the consumer never ends up in a state where it doesn't connect again?
Yep, planning on using spring-retry, but in another thread. So the first attempt will be synchronous and, if it fails, we will async-retry every N units (configurable).
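The pattern described above (one synchronous attempt, then asynchronous retries at a fixed interval) can be sketched in plain JDK terms. This is a hedged illustration of the idea only, not the spring-retry-based implementation the maintainers are planning; all class and method names here are invented:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class StartupRetry {

    // First attempt is synchronous; on failure, up to maxRetries more attempts
    // run on a scheduler thread, spaced delayMillis apart.
    static CompletableFuture<Boolean> startWithRetry(
            Supplier<Boolean> startAction, int maxRetries, long delayMillis) {
        if (startAction.get()) {
            return CompletableFuture.completedFuture(true);
        }
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger retries = new AtomicInteger();
        Runnable attempt = new Runnable() {
            @Override
            public void run() {
                if (startAction.get()) {
                    result.complete(true);
                } else if (retries.incrementAndGet() >= maxRetries) {
                    result.complete(false);
                } else {
                    // Not done yet: schedule the next attempt and keep the scheduler alive.
                    scheduler.schedule(this, delayMillis, TimeUnit.MILLISECONDS);
                    return;
                }
                scheduler.shutdown();
            }
        };
        scheduler.schedule(attempt, delayMillis, TimeUnit.MILLISECONDS);
        return result;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Fails on the first two calls (one sync, one async), succeeds on the third.
        boolean started = startWithRetry(() -> calls.incrementAndGet() >= 3, 5, 10).get();
        System.out.println("started=" + started + ", calls=" + calls.get());
    }
}
```

Note that the caller gets a `CompletableFuture` back, so it can attach a notify/exit action for the exhausted-retries case raised in the comment above.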
@eljefe6a I also thought of a possible workaround for now. You could append an id to the subscription name, which would in effect make the 2 processes use different subscriptions. Below is an example using a random number (which is not ideal):

```java
@PulsarListener(topics = "my-topic", subscriptionName = "my-sub-#{T(java.lang.Math).random()}",
```
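A variation on the workaround above, offered here as an illustration (the names are invented, and it was not part of the maintainer's suggestion): a UUID suffix gives each process a guaranteed-unique subscription name, with the same caveat that every restart creates a brand-new subscription on the broker that may accumulate and need cleanup:

```java
// Illustrative variation -- names invented. Each restart creates a new
// subscription on the broker, which is the main drawback of this approach.
@PulsarListener(topics = "my-topic",
        subscriptionName = "my-sub-#{T(java.util.UUID).randomUUID()}")
void listen(MyMessage msg) {
    // handle message
}
```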
The workaround I'm using now is operational: I wait for the other container to stop before starting the new one.
Previously, when a listener container failed to start, it would only log the exception. This commit introduces `StartupFailurePolicy`, which allows listener containers to CONTINUE, STOP, or RETRY when an error is encountered on startup. See spring-projects#445, spring-projects#816.
Hi @eljefe6a, the ability to configure the startup failure policy has just been merged. The follow-up is to add Spring Boot config props to tune the policy. By default, it is set to fail fast. When set to RETRY, the default is 3 attempts with a fixed 10-second delay between them. In your case you will likely want to specify a custom retry template bean to account for the time it takes your pods to come online. We will follow up with docs on how to configure this with and without Spring Boot.
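For readers landing here later, wiring this up might look something like the sketch below. The setter names and factory constructor are assumptions about the API described in this thread, not confirmed signatures, so treat this as a hypothetical outline and check the spring-pulsar docs for the released form; only the spring-retry `RetryTemplate.builder()` calls are standard spring-retry API:

```java
// Hypothetical outline -- the setStartupFailurePolicy / retry-template setter
// names are assumptions based on this thread, not the confirmed API.
@Bean
ConcurrentPulsarListenerContainerFactory<Object> pulsarListenerContainerFactory(
        PulsarConsumerFactory<Object> consumerFactory) {
    var containerProps = new PulsarContainerProperties();
    containerProps.setStartupFailurePolicy(StartupFailurePolicy.RETRY);  // assumed setter
    // Retry every 30s, up to 10 times, to wait out a slow pod rollover.
    containerProps.setStartupFailureRetryTemplate(                       // assumed setter
            RetryTemplate.builder().maxAttempts(10).fixedBackoff(30_000).build());
    return new ConcurrentPulsarListenerContainerFactory<>(consumerFactory, containerProps);
}
```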
Awesome. Thanks for making this change so quickly. For posterity and those who encounter this issue later: does this restart issue affect consumers that were successfully created on startup but hit a connectivity issue later in the process' lifecycle?
You are more than welcome.
It does not handle post-startup failures. Once started, the listener thread sits in a loop and attempts a receive. Handling failures at that point would be another resiliency feature, targeted post-startup.
Closing with the docs commit 8ed6689 |
I'm hitting an issue where the consumption randomly stops working. I haven't seen any pattern to it. I'm using spring-pulsar 1.1.0.
I've spent the past 4 hours trying to figure out why data wasn't being saved to the database. It turned out that the spring-pulsar consumers weren't receiving data. I added more logging to the `@PulsarListener`-annotated method to log out the object after the first line of code. It still didn't log out. I added more logging to the `@PulsarListener`-annotated method, and it started working. I restarted the container (with no code changes), and it stopped logging. I restarted again with no code changes, and it started logging again.

The methods are consuming an Avro topic. The data is being produced, and I verified with `pulsar-admin` that new messages are arriving. There are five different methods in that class that are `@PulsarListener`-annotated. With this bug, some methods will receive messages and others won't. No exceptions are being logged, except some initial exclusive consumer exceptions while the other container is being terminated, which stop being logged after startup. The methods use both `topicPattern` and `topics`. The process is running in a K8s container.

Is there any known limitation or bug that I'm hitting? It makes no sense that the methods stop receiving data with no code changes.