-
Notifications
You must be signed in to change notification settings - Fork 376
Fix: Anonymous Login request not cleared if app is forced close within 5 seconds on a new install #2251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a test too
// Login operation being processed alone will surely be rejected, so we need to add an existing | ||
// external ID to this legacy operation and have it picked up by SetAlias. | ||
if (operations.isEmpty() && _identityModelStore.model.externalId != null) { | ||
loginUserOp.externalId = _identityModelStore.model.externalId |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could there be an issue with properties being set on the wrong user?
Example:
- Login(A)
- setTag(A)
- Login(B)
- setTag(B)
Could this cause User B to get both tag A and B, when it was intended for User A to get tag A and User B to get Tag B?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tested this scenario and this is how the manual test is setup:
- Fresh install on main branch, right after the anonymous user is created but before the post create delay, programmatically call Login(A), setTag(A), Login(B), setTag(B) altogether and close the app.
- Switch to this branch and update the app.
- Run the app and observe in the dashboard that both users are created successfully and each with the correct tag.
After analyzing the log closely, the code in this if statement will actually never hit for the scenario. OperationRepo will process the operations in this order:
- Create subscription
- Login anonymously
- Refresh user (anonymous)
- Login A AND transfer subscription
- Set Alias for A but got 409
- Refresh user (A)
- Set Tag for A
- Login B AND transfer subscription
- Set Tag for B
Since operations like setting tags does not go with the Login call in one request and the handler for 409 conflict will take care of the IDs, this will not cause the user to be assigned with the wrong tag.
Further more, I tested out that this bug only happen when Login is interrupted with no other operations like set tag or another Login within a few seconds after app starts up.
8c75690
to
ea56a21
Compare
// Login operation being processed alone will surely be rejected, so we need to drop the request | ||
if (operations.isEmpty()) { | ||
return ExecutionResponse(ExecutionResult.FAIL_NORETRY) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If loginUserOp.externalId
is set then the operation is still valid, we probably shouldn't drop that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the PR description, code, and test cases so that Login with external ID is no long be dropped. Only anonymous Login that is not grouped with any other operations will be dropped.
ea56a21
to
efa596f
Compare
ddafc37
to
20951a6
Compare
val delayFor = max(delayForOnRetries, retryAfterSecondsNonNull * 1_000) | ||
val delayFor = max(delayForOnRetries, max(retryAfterSecondsNonNull * 1_000, postCreateDelay)) | ||
if (delayFor < 1) return | ||
Logging.error("Operations being delay for: $delayFor ms") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging.error("Operations being delay for: $delayFor ms")
Following up from above, especially since we also log this delay as an error, since it represents a "failure response" fundamentally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log will not be related to the postCreateDelay anymore since it is moved to a separate logic.
// Handle a bad state that can happen in User Model 5.1.27 or earlier versions that old Login | ||
// request is not removed after processing if app is force-closed within the PostCreateDelay. | ||
// Anonymous Login being processed alone will surely be rejected, so we need to drop the request | ||
if (operations.isEmpty() && loginUserOp.externalId == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should log an error.
The isEmpty()
is probably sufficient for this edge case. Technically this should be a check for the existence of a create/transfer subscription operation, as another user update operation, for example, is not sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking to prevent dropping the login request with an externalId. However, even login with an externalId will usually require a transfer subscription so theoretically these will have the same result. Do you think we should allow login operation with an externalId that has no other operations to be processed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we should keep login with external ID regardless of subscriptions, since generally the SDK can recover in that state.
I meant for the check itself: operations.isEmpty() && loginUserOp.externalId == null
vs something like operations.doesNotContainSub() && loginUserOp.externalId == null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, with operations.doesNotContainSub() && loginUserOp.externalId == null
. This way the validation matches what the backend does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a change that checks if there exists any CreateSubscription or TransferSubscription operation. Will this be sufficient?
} | ||
} | ||
|
||
test("ensure anonymous login with no other operations will fail with FAIL_NORETRY to release the queue from a bad state") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think explanations like "to release the queue from a bad state" should not be part of the test name, it should simply state behavior.
This tests the upgrade fix only, we should have a test that repros the original problematic behavior of not removing successful operations from the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I have fixed the explanation and created a new test case in OperationRepoTests to test this behavior.
...DK/onesignal/core/src/main/java/com/onesignal/core/internal/operations/impl/OperationRepo.kt
Show resolved
Hide resolved
|
||
/** | ||
* Wait which ever is longer, retryAfterSeconds returned by the server, | ||
* Wait which ever is longer, post create delay, retryAfterSeconds returned by the server, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PostCreateDelay should probably be a separate delay from the original intention of this method, which is to generate a delay in failure scenarios, since it is operating on the retryWaiter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the delays should be interceptable because if both postCreateDelay and retryDelay are set to 5 seconds, the total wait time should just be 5 seconds instead of 10 seconds if wait for them separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't confound the waiter
with the retryWaiter
just because in this particular scenario, the result feels the same. The retryWaiter
is meant for retryable errors and can be forced when consent changes.
Both are not going to run: (1) The postCreateDelay is after a create call is successful and the response has ID translations, (2) The retries count and retryAfterSeconds that is original to this method is for failures. Either the operations are successful and contain translation IDs or it is not and is retryable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, the waiters are different. I separated the logic in the fixup commit. Both waiters can be waiting together for different duration, but operations processing will be resumed as soon as both waiter are awake. In this case the stall time is the same as the maximum wait time among them.
2c04789
to
e5b8eb3
Compare
...DK/onesignal/core/src/main/java/com/onesignal/core/internal/operations/impl/OperationRepo.kt
Outdated
Show resolved
Hide resolved
...DK/onesignal/core/src/main/java/com/onesignal/core/internal/operations/impl/OperationRepo.kt
Outdated
Show resolved
Hide resolved
* however other parts of the system don't currently account | ||
* for this so this is not safe to do. | ||
*/ | ||
suspend fun delayForPostCreate(hasIDTranslation: Boolean) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think this method should be conditionally called, instead of letting this method check if it should delay using the passed in argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I have made that change in the fixup commit.
4f1368d
to
5272900
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can clean up commit history, and reword the PR title?
5272900
to
47afee3
Compare
Title is updated and commits are cleaned up. |
Description
One Line Summary
This PR fixes an issue where subscriptions do not update correctly and results in a 400 error when calling Login. It also handles this bad state when migrating from any previous version.
Details
Motivation
There is an edge case where the app successfully creates a user for the first time, but the request remains in the queue due to the
PostCreateDelay
. When this happens, all future Login calls return a 400 error, and any subscription updates fail. This PR prevents this scenario by correctly updating the operation queue and handling the bad state during migration.Scope
PostCreateDelay
. Instead, the waiting time forPostCreateDelay
is now integrated intodelayBeforeNextExecution
at the end of execution.FAIL_NORETRY
and drops the duplicated Login request.Testing
Manual testing
###Steps to reproduce
This bug can happen strictly after CreateUser is successful for the first time but the app is force-closed before the 5 seconds
PostCreateDelay
has reached. One way to reproduce this issue is:###New Install
Step 1-2 same as above
3. Re-open the app, wait for the server response and observe that Login with the external ID is successful. Confirmed that no error is logged and the fix works.
###Update From the Broken Version
Step 1-2 same as above
3. upgrade to this version without uninstalling the previous version or clearing the storage, observe that Login with the externalID is successful without error.
Affected code checklist
Checklist
Overview
Testing
Final pass
This change is