-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subscriptions seem to fail until after a service restart #2269
Comments
Do you have more logs of the restate-server? You can get more logs of the kafka component using the following env variable: |
Perhaps could the problem be that you're trying to create a subscription before the service is actually registered? In that case, the subscription creating will return back an error. |
Even with that there really aren't a lot of relevant logs. The following is out of a test run. There's a bunch of info level registration logs above.
If I create a listener to that Kafka topic I would see a message on the above topic shortly after that subscription is created, but there's nothing in the logs or execution of the service that would indicate it was received by the Restate consumer.
Not unless the deployment process can take longer than 2 seconds after the HTTP call replies. The very first thing the test code does is start and register the service. I've added pauses in in case there is some delay after registration with no change. It's probably worth noting that I call other parts of the service to load data in before creating the subscription and that succeeds. So Restate absolutely knows it's there and how to call it. |
I've added in a bunch of my own logging to try to work out where the issue is. It would seem the first time no message comes from the rdkafka::consumer::StreamConsumer. Failing logs:
Exactly the same test run a second time results in:
My reading of this is that the subscription is being created the first time but nothing is being sent on it. The second time a subscription is created everything starts flowing correctly. Both runs create new subscriptions, which can be seen in the admin API: {
"subscriptions": [
{
"id": "sub_12x4St13qmf4QBVUXCEcdjj",
"source": "kafka://redpanda/message-start",
"sink": "service://message_start_service/Handle",
"options": {
"client.id": "restate",
"group.id": "sub_12x4St13qmf4QBVUXCEcdjj"
}
},
{
"id": "sub_14NOwbMfb9oSsTPpLxEYhJn",
"source": "kafka://redpanda/message-start",
"sink": "service://message_start_service/Handle",
"options": {
"client.id": "restate",
"group.id": "sub_14NOwbMfb9oSsTPpLxEYhJn"
}
}
]
} |
I have another hypothesis, that actually comes from our own Kafka tests: https://github.com/restatedev/sdk-test-suite/blob/main/src/main/kotlin/dev/restate/sdktesting/tests/KafkaIngress.kt#L99 The subscription process is completely asynchronous, after restate accepts the subscription it replies with 200 and then it takes some time before the consumer group is created in Kafka. Due to that, if you send a kafka record before the consumer group starts (essentially you do step 3 of your test before step 2 creates the consumer group). Now consumer groups default configuration is just to start from latest record, rather than from first record, thus you won't see records that were published before the consumer group actually started. Most likely in the second test attempt, you get now the request but from the first subscription, and not the second one you created in another attempt. Unfortunately the log lines you posted miss some details to verify this hypothesis (I need to fix that on our side), but one thing you can do (and perhaps it's a good practice to do in the tests anyway) is to set |
I've just tried a few things to test that hypothesis. Some good news: adding the
To test this theory I added a 15 second pause between requesting the subscription and sending the message. That did seem to work correctly.
I don't believe this is what is happening. If it were I'd expect that deleting the first subscription then running the test would result in a test failure (no messages received). However if I start everything, run the test twice so it passes, then delete all subscriptions and re-run the test it still passes. It seems to me more like there is a slower startup the first time a new subscription is created and I was hitting that on the cold startup attempt.
With your additional logs added:
Thanks for your help on this by the way. |
Agreed, for production this probably doesn't make sense, but in tests where you have two processes involved the consumer booting is generally always asynchronous, that is true in the "usual" kafka APIs too, so in that case Not sure if there's anything else actionable in this task now :) |
Description
When using Restate subscriptions with the Restate Go SDK messages don't seem to trigger
execution until a restart of the deployed application.
Reproduction
message_start_service/Handle
function is never triggered.Attempted fixes
In my case I have a test that does all of these steps automatically. I've tried adding pauses
into the system to ensure messages aren't being sent too early. I've also tried sending multiple
messages in case the first one always gets skipped. Neither of these have resolved the issue.
If I run the test twice without rebuilding the restate/Kafka deployment (effectively step 2 and down)
everything works perfectly.
The text was updated successfully, but these errors were encountered: