Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataStore sync processor rety error - about every 24 hours #12954

Open
3 tasks done
alex-breen opened this issue Feb 3, 2024 · 14 comments
Open
3 tasks done

DataStore sync processor rety error - about every 24 hours #12954

alex-breen opened this issue Feb 3, 2024 · 14 comments
Labels
bug Something isn't working DataStore Related to DataStore category

Comments

@alex-breen
Copy link

alex-breen commented Feb 3, 2024

Before opening, please confirm:

JavaScript Framework

React

Amplify APIs

Authentication, DataStore

Amplify Version

v6

Amplify Categories

auth, api

Backend

None

Environment information

# Put output below this line
  System:
    OS: Windows 11 10.0.22621
    CPU: (6) x64 Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
    Memory: 2.05 GB / 15.94 GB
  Binaries:
    Node: 18.18.0 - C:\Program Files\nodejs\node.EXE
    npm: 10.1.0 - C:\Program Files\nodejs\npm.CMD
  Browsers:
    Chrome: 121.0.6167.140
    Edge: Chromium (121.0.2277.98)
    Internet Explorer: 11.0.22621.1
  npmPackages:
    @aws-amplify/ui-react: ^6.1.2 => 6.1.2 
    @aws-amplify/ui-react-internal:  undefined ()
    @aws-amplify/ui-react-storage: ^3.0.11 => 3.0.11 
    @babel/cli: ^7.21.0 => 7.23.4 
    @babel/core: ^7.21.4 => 7.23.7 
    @babel/plugin-proposal-private-property-in-object: ^7.21.11 => 7.21.11 (7.21.0-placeholder-for-preset-env.2)
    @babel/preset-env: ^7.21.4 => 7.23.8 
    @emotion/react: ^11.10.6 => 11.11.3 
    @emotion/styled: ^11.10.6 => 11.11.0 
    @hello-pangea/dnd: ^16.5.0 => 16.5.0 
    @mui/base: ^5.0.0-alpha.124 => 5.0.0-beta.31 
    @mui/icons-material: ^5.11.16 => 5.15.4 
    @mui/material: ^5.11.16 => 5.15.4 
    @prezly/slate-lists: ^0.101.0 => 0.101.0 
    @testing-library/jest-dom: ^5.16.5 => 5.17.0 
    @testing-library/react: ^13.4.0 => 13.4.0 
    @testing-library/user-event: ^13.5.0 => 13.5.0 
    @types/react: ^18.2.47 => 18.2.47 
    @types/react-beautiful-dnd: ^13.1.8 => 13.1.8 
    @types/react-dom: ^18.2.17 => 18.2.18 
    @xenova/transformers: ./private_modules/@xenova/transformers => 2.6.0 
    aws-amplify: ^6.0.13 => 6.0.13 
    aws-amplify/adapter-core:  undefined ()
    aws-amplify/analytics:  undefined ()
    aws-amplify/analytics/kinesis:  undefined ()
    aws-amplify/analytics/kinesis-firehose:  undefined ()
    aws-amplify/analytics/personalize:  undefined ()
    aws-amplify/analytics/pinpoint:  undefined ()
    aws-amplify/api:  undefined ()
    aws-amplify/api/server:  undefined ()
    aws-amplify/auth:  undefined ()
    aws-amplify/auth/cognito:  undefined ()
    aws-amplify/auth/cognito/server:  undefined ()
    aws-amplify/auth/enable-oauth-listener:  undefined ()
    aws-amplify/auth/server:  undefined ()
    aws-amplify/datastore:  undefined ()
    aws-amplify/in-app-messaging:  undefined ()
    aws-amplify/in-app-messaging/pinpoint:  undefined ()
    aws-amplify/push-notifications:  undefined ()
    aws-amplify/push-notifications/pinpoint:  undefined ()
    aws-amplify/storage:  undefined ()
    aws-amplify/storage/s3:  undefined ()
    aws-amplify/storage/s3/server:  undefined ()
    aws-amplify/storage/server:  undefined ()
    aws-amplify/utils:  undefined ()
    compromise: ^14.10.0 => 14.11.0
    eslint: ^8.55.0 => 8.56.0
    eslint-plugin-react: ^7.33.2 => 7.33.2
    eslint-plugin-react-hooks: ^4.6.0 => 4.6.0
    eslint-plugin-react-refresh: ^0.4.5 => 0.4.5
    is-hotkey: ^0.2.0 => 0.2.0
    lodash: ^4.17.21 => 4.17.21
    nlcst-to-string: ^3.1.1 => 3.1.1
    node-window-polyfill: ^1.0.2 => 1.0.2
    react: ^18.2.0 => 18.2.0
    react-avatar-editor: ^13.0.0 => 13.0.2
    react-color: ^2.19.3 => 2.19.3
    react-dom: ^18.2.0 => 18.2.0
    react-dropzone: ^14.2.3 => 14.2.3
    react-scripts: 5.0.1 => 5.0.1
    retext: ^8.1.0 => 8.1.0
    retext-keywords: ^7.2.0 => 7.2.0
    retext-pos: ^4.1.0 => 4.1.0
    slate-history: ^0.100.0 => 0.100.0
    slate-hyperscript: ^0.100.0 => 0.100.0
    slate-react: ^0.101.5 => 0.101.5
    svg-gauge: ^1.0.7 => 1.0.7
    uuid: ^9.0.0 => 9.0.1 (8.3.2)
    web-vitals: ^2.1.4 => 2.1.4
  npmGlobalPackages:
    @forge/cli: 4.1.1
    atlas-connect: 0.8.1
    firebase-tools: 10.7.0
    mongodb-realm-cli: 2.4.2
    npm-check-updates: 12.5.9
    npm: 10.1.0
    serve: 14.2.1

Describe the bug

When waking up computer from sleep state, website throws a series of errors (uncaught, over website screen).

subscription.ts:84 Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'cognito:username')
    at subscription.ts:84:1
console.js:213 [ERROR] 33:31.645 DataStore - Sync processor retry error: Error: No current user
    at InternalGraphQLAPIClass._headerBasedAuth (InternalGraphQLAPI.ts:68:1)
    at async InternalGraphQLAPIClass._graphql (InternalGraphQLAPI.ts:168:1)
    at async sync.ts:119:1
    at async retry.ts:33:1

Everything resolves by itself but the user sees the browser error modal and has to close it.
I don't have a clear way to reproduce this, but my hunch is that it occurs after the 24 hour "base query" sync from Datastore. I could be seeing a pattern that is not real, but it does seem to occur about once a day, after there has been some idle time (like computer at sleep). I pretty sure this started appearing after I upgraded to v6.

Here is the screenshot of from the browser's console logs.
image

Here is the screen shot from the browser's network logs:
image

Browser window screenshot with errors:
image

Expected behavior

no errors

Reproduction steps

Wake up computer from sleep.

Code Snippet

// Put your code below this line.

Log output

// Put your logs below this line


aws-exports.js

No response

Manual configuration

No response

Additional configuration

No response

Mobile Device

No response

Mobile Operating System

No response

Mobile Browser

No response

Mobile Browser Version

No response

Additional information and screenshots

No response

@alex-breen alex-breen added the pending-triage Issue is pending triage label Feb 3, 2024
@cwomack cwomack added the DataStore Related to DataStore category label Feb 4, 2024
@cwomack cwomack self-assigned this Feb 4, 2024
@cwomack
Copy link
Member

cwomack commented Feb 5, 2024

Hello, @alex-breen 👋 and sorry to hear you're running into this. It sounds like this could possibly be related to when DataStore is trying to re-establish the connection, but the tokens for a given user are potentially expired. Can you look into what you've configured for the expiration timeframe for your refresh and access tokens to? Does this happen in a production version of the app as well or just locally when in dev mode?

@cwomack cwomack added pending-response question General question labels Feb 5, 2024
@alex-breen
Copy link
Author

alex-breen commented Feb 5, 2024

Hi @cwomack -

My cognito settings are:

Refresh token expiration: 30 day(s)
Access token expiration: 1 hour(s)
ID token expiration: 1 hour(s)

Does this happen in prod? Yes, I'm pretty sure at least. I'm sure I've seen the browser log errors, and pretty sure I've seen the red error overlay over the screen, but I'll watch for that to be sure.

<<Update: Happens in prod, but overlay with red messages doesn't appear on browser in prod, only console errors. In dev, overlay with red errors appears.>>

@alex-breen
Copy link
Author

Update: On production, I've encountered the issue twice (about every 24 hours). The console logs show the errors:

Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'cognito:username')
ConsoleLogger.ts:85 [ERROR] 00:45.178 DataStore - Sync processor retry error: Error: No current user

But the browser screen doesn't get the overlay with red error messages. On the dev build, it does.

@alex-breen
Copy link
Author

Another update (reproduction is slow because it seems to occur after several hours of sleep or inactivity):

  • Cognito is able to re-establish a connection (token refresh?) on its own
  • Datastore updates work (I successful mutations in network logs)
  • But Datastore sync does not re-establish, so updates are missed. The user sees stale data out of sync from the backend.

@cwomack is there something I can do to catch this when it happens and restart Datastore?

@alex-breen
Copy link
Author

I have Hub logs with today's failed sync. I hope this is helpful.
.Steps to repro: wake up from sleep.
What I see is that the Hub says the model is sync'd. However, it is not processing inbound updates from the backend. However, it is able to process outbound mutations.

image

@alex-breen
Copy link
Author

Do you recommend I call Datastore.start() after the Hub event "ready"? My aim is to reset the sync process for cases in which it thinks it is syncing, but it is not. Will this kickstart it up again?

Note, I can't call Datastore.stop() and then Datastore.start() as that often results in an uncaught error because there are other observe queries in flight.

@cwomack cwomack added investigating This issue is being investigated and removed pending-triage Issue is pending triage labels Mar 1, 2024
@svidgen
Copy link
Member

svidgen commented Mar 7, 2024

@alex-breen As a potential path to a workaround while the repro of this is pending, do any of the auth events indicate the need for a Datastore restart?

https://docs.amplify.aws/react/build-a-backend/auth/auth-events/#listen-to-and-log-auth-events

@alex-breen
Copy link
Author

@svidgen - I don't think so, as no error events are passed. Datastore events indicate everything is working (syncQueriesReady) and Auth is fine.

@alex-breen
Copy link
Author

@cwomack - I've tried cycling a stop and start of Datastore every time there is a Hub networkStatus event. But the sync doesn't get restablished. The websocket connection (eg: wss://abc.appsync-realtime-api.us-east-1.amazonaws.com/graphql?header...) stays unresponsive. The Hub events imply success (proceeds to ready) and outbound mutations work, but inbound sync updates are not received.

@alex-breen
Copy link
Author

To reproduce (artificially):

  • On Chrome devtools, change the accessToken (remove one letter)
  • On Chrome devtools, set Network to Offline
  • Wait a few seconds. Change the accessToken back to its original value.
  • On Chrome devtools, set Network to No Throttling
  • Observe: Sync processor retry error in Chrome devtools console.
  • Wait until Hub event for ready appears (if you are logging `Hub events). Otherwise just wait a few seconds.
  • Open a second browser window and load the same web app. Update a value with Datastore.
  • Observe: On the first browser window, data change is not updated.

@cwomack cwomack added pending-triage Issue is pending triage and removed investigating This issue is being investigated question General question labels Apr 17, 2024
@chrisbonifacio chrisbonifacio added bug Something isn't working and removed pending-triage Issue is pending triage labels Apr 17, 2024
@chrisbonifacio
Copy link
Member

Hi @alex-breen, thank you for the repro steps. Unfortunately, I have not been able to reproduce this issue with the provided steps.

These were the events logged from the Hub during my reproduction. In my case I might've run into a different issue - I never got another ready event after the first one.

Screenshot 2024-04-19 at 10 43 43 AM

I also observed that updating a record from another browser instance was reflected.

The highlighted record was updated from a second, incognito window.
image

I am only able to get the Sync process retry error if I force it by attempting to start the sync process with an invalid access token (calling DataStore.start())

image

Are you sure that those steps are in the correct order?

@alex-breen
Copy link
Author

Hi @chrisbonifacio - I've jotted down a few things that might help shed some light:

  • I do see that often a ready event does not follow syncQueriesReady after a networkStatus event. However, I haven't seen any issues with syncing in that case.
  • For me, sync "breaks" only after the errors for "sync processer retry error". Getting it into that state happens typically after I wake my computer up from sleep and the app starts reconnecting, or on mobile the tab has been idle for a whioe. I've assumed it is a token issue. I can get it into the same state by changing the accessToken value in localstorage. But I'm only guessing that is addressing the same root cause.
  • The main thing I see is that after the "sync processor retry error" almost everything does reconnect. For example, outbound mutations work. But the websocket connection that receives data updates never reconnects. The last message is stop.
  • Finally (and file this under "King Hackus Maximus"), I have implemented some steps that (so far) seem to workaround the issue under real conditions (ie, waking up from sleep). Hopefully it helps diagnose the root cause. I do the following - every time there is a networkStatus event, I 1) call DataStore.stop() 2) Wait 10 seconds 3) call fetchAuthSession({ forceRefresh: true }) 4) call DataStore.start(). Sometimes the first fetchAuthSession fails, so I have one retry after 10 seconds.

@chrisbonifacio chrisbonifacio removed their assignment May 20, 2024
@alex-breen
Copy link
Author

@chrisbonifacio - by any chance is there an update on this?

Even with my hacky workaround, I've seen Safari fail (silently), leading to data loss on an add request (amplify js doesn't throw an error, but sync fails to write, but app doesn't know this, so data is lost). At the very least, just a way to listen for when sync is broken so that I can force a reload would help.

@alex-breen
Copy link
Author

alex-breen commented Jul 6, 2024

@chrisbonifacio @cwomack - I've reproduced this with a very simple app that mostly uses the example code in Amplify documentation.

Behavior: Wake up computer after sleep. The last refresh of the webpage should be over 12 hours ago. Update data from a separate computer (or incognito browser). Result - The update does not appear on the primary browser. Data is now out of sync. Hub events report "syncQueriesReady" but websocket messages are not received.

I've attached screenshots of the console and network errors that appear right after wake up. I've also attached the relevant code. It's important that authentication is enabled and model schema has @auth(rules: [{allow: owner}]). I tested this running in build/vite-preview mode, locally (but I've reproduced it many, many times in production and in dev mode.) I don't have a direct way to trigger the error, other than load the page in the morning, put your computer to sleep at night, and wake up your computer the next morning. Also, if you refresh the page, the updated data syncs as the websocket is restored. What makes this issue particularly difficult is the Hub events indicate everything is working and sync'd, when it is not..

image
image
package.json
schema.txt
appDS.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working DataStore Related to DataStore category
Projects
None yet
Development

No branches or pull requests

4 participants