Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Subscription Timeout retries was broken when fixing the Readhandler creation issue #31635

Closed
mkardous-silabs opened this issue Jan 23, 2024 · 7 comments · Fixed by #31755
Assignees
Labels
bug Something isn't working needs triage

Comments

@mkardous-silabs
Copy link
Contributor

Reproduction steps

PR #30491 Modified the Subscription Timeout logic in a such a way where the server will only retry once before deleting the persisted subscription entry.

SubscriptionResumptionSessionEstablisher::HandleDeviceConnectionFailure deletes the subscription entry if the server fails to establish a case session. This modifies the previously released feature without the PR mentionning this change.

To reproduce :

  • Build a sample app with chip_persist_subscriptions=true chip_subscription_timeout_resumption=true
  • Commission the device
  • Establish a subscription with chip-tool without being in interactive mode
  • Let the subscription fail since the chip-tool will not send a status response
  • After 5 minutes, server will try to resub and fail
  • Device will never retry to resub

Bug prevalence

Everytime

GitHub hash of the SDK that was being used

166c4b5

Platform

core

Platform Version(s)

No response

Anything else?

This is a regression of a feature released in v1.2.

@mkardous-silabs mkardous-silabs added bug Something isn't working needs triage labels Jan 23, 2024
@mkardous-silabs
Copy link
Contributor Author

Can't assign the issue to @wqx6 for some reason who had done #30491.

@wqx6
Copy link
Contributor

wqx6 commented Jan 24, 2024

When the server fails to establish the session, the client might be shutdown. Since the client will never store the subscription information, the client will forget the ReadClient which holds the resumed subscription after it reboots. Then the subscription will be meaningless since the report will never be handled. That's the reason why I delete the the subscription entry.

@mkardous-silabs
Copy link
Contributor Author

I'm not sure the explanation justifies unilaterally changing the entires features behavior...
I don't think we can assume clients don't have a means to store subscriptions. Even if they do not, they will most likely create a new one if they receive a subscription resumption from a server if they support the feature.

At the end of the day, i don't think it this is an ok change. In my opignion, a better approach would have been to add a number of retries before deleting the entry that could have been set to one to allow the behavior you want without changing how the feature behaves.

@jtung-apple
Copy link
Contributor

I agree with Mathieu. Subscription resumption has to retry multiple times, in case of temporary network outage, for example. I failed to notice the Delete change in the previous PR and I feel that part has to be reverted to keep the original logic.

@wqx6
Copy link
Contributor

wqx6 commented Jan 31, 2024

Also I found another cause why the sever app does not retry on resuming subscriptions.

The previous work flow is:
server app tries to resume subscription and create a read handler for it-> server app fails to establish the session-> the read handler is closed and InteractionModelEngine::OnDone() is called->sever app schedules another subscription resumption

But the current work flow is:
sever app tries to resume subscription->server app fails to establish the session

The read handler will not be created and InterationModelEngine::OnDone() will not be called. So server app will never schedule another subscription resumption.

Should we schedule subscription resumption when failing to establish the session? @mkardous-silabs @jtung-apple

@jtung-apple
Copy link
Contributor

Also I found another cause why the sever app does not retry on resuming subscriptions.

The previous work flow is: server app tries to resume subscription and create a read handler for it-> server app fails to establish the session-> the read handler is closed and InteractionModelEngine::OnDone() is called->sever app schedules another subscription resumption

But the current work flow is: sever app tries to resume subscription->server app fails to establish the session

The read handler will not be created and InterationModelEngine::OnDone() will not be called. So server app will never schedule another subscription resumption.

Should we schedule subscription resumption when failing to establish the session? @mkardous-silabs @jtung-apple

Yes we should schedule subscription resumption in this case. Please restore the intended logic. Thanks!

@jtung-apple
Copy link
Contributor

@wqx6 Will you be making a fix soon for this?

@mergify mergify bot closed this as completed in #31755 Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants