-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fix]Handle incorrect 404 responses; add a delay after creates and retries on 404 of new ids #2059
Conversation
f27f636
to
923c6e4
Compare
This is to account for the case where we create a User or Subscription and attempt to immediately access it (via GET or PATCH). A delay is needed as the backend may incorrectly 404 otherwise, due to a small delay in it's server replication. A cold down period like this also helps improve batching as well.
923c6e4
to
83fa82b
Compare
This is a new behavior which wakes a specific amount of time after something is created (such as a User or Subscription) before make a network call to get or update it. The main motivation is that the OneSignal's server may return a 404 if you attempt to GET or PATCH on something that was just created. This is due fact that OneSignal's backend server replication sometimes has a delay under load. This may be considered a bug in the backend, but on the flip side the SDK is being inefficient if a follow up network request is made any way, which was the 2nd motivation for this change.
0ba8c2c
to
bc9f938
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have played with this version and added some questions or potential problems in the review below. Generally a very good time tracking approach to determine whether a MISSING response should be retried. The unit tests are complete as well. Great work!
I have tested the general functionalities as well. No noticeable service lag or unexpected behavior. Now just waiting for the response of these discussions.
...nal/core/src/main/java/com/onesignal/user/internal/operations/impl/states/NewRecordsState.kt
Show resolved
Hide resolved
...main/java/com/onesignal/user/internal/operations/impl/executors/IdentityOperationExecutor.kt
Show resolved
Hide resolved
...DK/onesignal/core/src/test/java/com/onesignal/core/internal/operations/OperationRepoTests.kt
Show resolved
Hide resolved
In summary, this is fallback logic to opRepoPostCreateDelay. The opRepoPostCreateDelay is a balance between not waiting too long where the SDK isn't being response, but long enough where the likelihood of running into the backend replica delay issue is low. Given the above, we can further lower the chance of this issue by retrying requests that fail with 404 or 410 for Subscriptions. We don't want to do this with all 404/410 requests so we utilize the same NewRecordsState list, but added a new isInMissingRetryWindow method and opRepoPostCreateRetryUpTo value to drive the window value. Added tests to executors to ensure the new 404 logic is working. Some executors were missing 404 handling altogether so we filled in those gaps in this commit as well.
We need to check the existingOnesignalId first as most of the time the login executor tries to do an operation to the User, add Alias, first instead of creating a new User. This makes sure our opRepoPostCreateDelay applies.
After we wait for the opRepoPostCreateDelay time call to waiter.wake() would cause waitForNewOperationAndExecutionInterval() to wake up as we expect, however it was then waiting for the full time on opRepoExecutionInterval as well. To solve this we simply subtract the time we already waited from opRepoExecutionInterval. With the current values this means we fully skip opRepoExecutionInterval so we only wait opRepoPostCreateDelay.
a8adb55
to
1c2bf82
Compare
If the backend returns 410 then it indicates it knows it existed at one point and it is deleted now. Where a 404 is more abstract.
Description
One Line Summary
Handle incorrect 404 responses from the OneSignal's backend by; 1. Add a delay after creates; 2. Retry on 404 for a short window after create.
Details
Backend Issue
OneSignal sometimes returns a 404 on GET an PATCH requests if you are accessing something immediately after it was created. Normally the OneSignal backend can accept fetch/updates right way, but it all depends on it's server load. There isn't a guaranteed amount of time a client can wait, so SDKs has to work around this problem.
SDK's work around strategy
This PR introduces two ways to work around this 404 problem:
Motivation
404/410 responses are being handled by recreating the missing record (User or Subscription), however this isn't always the right way to handle these, give the backend caveat noted above. Recreating these records can create side-effects and extra load we want to avoid.
Scope
Add a delay processing some operations in the
OperationRepo
, and affects 404/410 handling in executors.Future Considerations
Testing
Unit testing
Manual testing
Tested on an Android 14 emulator.
Tested with the following code:
Which produced the following log:
Affected code checklist
Checklist
Overview
Testing
Final pass
This change is