Add context on when to sync Clerk data with webhooks #2368
This is a good start, but I think we'd need a bit more hand-holding here, since this is a fairly complex topic that many Clerk users will quickly run into.
        
          
docs/_partials/metadata-callout.mdx (Outdated)
@@ -0,0 +1,4 @@
> [!WARNING]
> Metadata is limited to **8KB** maximum.
This isn't accurate for a couple of reasons:
- This is a limitation of metadata when it's included as a property of the session token only - not of metadata in general
- This may have changed recently, and Jacob Foshee would be the one to ask, but the handshake payload overhead reduces this limit by ~50%, so even a ~4kb metadata payload can error out 😞
@jfoshee hi friend, could you give us some clarity on this so we can update the docs accordingly 🙏
When just talking about a metadata field (any of unstable, public or private) on any of the various objects (user, organization membership, organization, organization invitation, app invitation and so on) that field has an 8kb limit. This is not related to the session claims or handshake.
eg, if I wanted to store data in the user object's publicMetadata and I was not adding the entire publicMetadata or parts of it to the session object, I could store 8kb of data.
The callout is correct, but not for the context of discussing metadata added to session objects and that's where it can get confusing as the following examples could all apply.
Example 1: Adding all of publicMetadata to the session w/ no custom claims
This is more or less what most of the discussion here is about and is the easiest math wise. In this case the total size of the publicMetadata plus other session claims can't exceed 4kb (cookie size) or about 2kb (handshake, at least until the work is done to shard that)
Example 2: Adding all of publicMetadata to the session w/ custom claims
This is the same as above, but with the added complexity of the other custom session claim(s). How large can each of those be for the application? Those all need to be factored into the math: session claims + publicMetadata + other session claim max sizes
Example 3: Adding specific fields from publicMetadata w/ or without custom sessions claims
Imagine that the application is storing several fields in publicMetadata on the user object like so:
{
  onboardingComplete: true,
  birthday: "2000-01-01",
  country: "Canada",
  bio: "This is a bio -- imagine it is 6kb of written info"
}
The application can add one or two fields to the session claim, and not include large chunks of data:
{
	"onboardingComplete": "{{user.public_metadata.onboardingComplete}}"
}
With this the math is: session claims + onboardingComplete + any other custom session claims. This example demonstrates how an application can have data that is close to the 8kb max for publicMetadata but still use a select portion of this for the session claim.
@royanger is broadly correct. When it comes to the traditional handshake though, it's actually a little worse (explanation below)
I would recommend something like:
User Session Metadata is practically limited to roughly 1.2KB while total user Metadata is limited to 8KB
Or however you can phrase that more succinctly/clearly... It is difficult. We might need to link/expand to more explanation.
In a typical Handshake the contents of the session will be twice wrapped in a JWT with its base64 encoding... Along with all the other overhead of the JWT. And put in a cookie. That cookie has a limit of 4KB. And in a quick test an "empty" handshake cookie is ~1.8KB without any custom claims.
Gemini helped me come up with the 1.2KB number. Here is a gdoc with some explanation. Basically the encoding inflates the size by 1.8x.
All of that said, we have rolled out the new handshake flow to select customers. I've not heard of any issues thus far. The new flow should effectively eliminate cookie size problems. So a customer that encounters this limit can reach out and have the new flow enabled.
We would still recommend keeping it below 1.2KB though! Because that will perform better!
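To make the encoding inflation described above concrete, here is a rough sketch. It is an estimate only, not Clerk's implementation: it assumes padded base64 (4 output characters per 3 input bytes), ignores the JWT header, signature, and any other cookies, and all function names are hypothetical.

```typescript
// Length of padded base64 output for n input bytes: 4 characters per 3-byte group.
function base64Length(n: number): number {
  return 4 * Math.ceil(n / 3);
}

// UTF-8 byte length of a string, computed without platform-specific APIs.
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff) {
      bytes += 4; // surrogate pair: one 4-byte code point
      i++;
    } else bytes += 3;
  }
  return bytes;
}

// Estimated size of a claims object after `layers` rounds of base64 encoding.
function encodedClaimSize(claims: Record<string, unknown>, layers: number): number {
  let size = utf8Length(JSON.stringify(claims));
  for (let i = 0; i < layers; i++) size = base64Length(size);
  return size;
}

const sampleClaims = { onboardingComplete: true };
console.log(encodedClaimSize(sampleClaims, 0)); // raw JSON bytes
console.log(encodedClaimSize(sampleClaims, 2)); // after two base64 layers
```

Two layers multiply the size by about 16/9, which is where a figure close to 1.8x comes from. Note that the estimate counts property names and JSON punctuation, not just values.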
Thanks for the specifics around the handshake, @jfoshee!
i've made some significant changes regarding this info: 85d7648
        
          
docs/_partials/metadata-callout.mdx (Outdated)
> [!WARNING]
> Metadata is limited to **8KB** maximum.
>
> If you use Clerk metadata and modify it server-side, the changes won't appear in the session token until the next refresh. To avoid race conditions, either force a JWT refresh after metadata changes or handle the delay in your application logic.
This is true but requires a good chunk of expansion on the specifics. I think many Clerk users don't have a nuanced understanding of how Clerk mints and refreshes tokens, and this would be confusing. I do think a specific guide on this topic would be useful, then could perhaps be referenced here to clarify?
yeah I agree, we can expand on it! I can try to take a crack at it and have you review
If we keep a section like this and link to a longer guide about sessions, it still might be worth linking to https://clerk.com/docs/hooks/use-user#reload-user-data in this callout for an example for those customers who just want to know how to force a JWT refresh. This would also link to getToken({ skipCache: true }) instead.
what do you mean by, it would "also link to getToken({ skipCache: true }) instead."
instead of what? and where does this come into play when forcing a JWT refresh? I thought you only had to do a user.reload
You can do getToken({ skipCache: true }) or user.reload(). The former skips the cookies and gets a newly minted session JWT from FAPI.
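For anyone following along, the two options could be sketched like this. The interfaces are minimal stand-ins written for this example, covering only the two calls named in this thread; in an app the real objects would come from Clerk's SDK (for example via useAuth() and useUser()), and the wrapper function names are hypothetical.

```typescript
// Minimal structural types for illustration only (assumed shapes, not Clerk's full types).
interface TokenSource {
  getToken(options?: { skipCache?: boolean }): Promise<string | null>;
}

interface ReloadableUser {
  reload(): Promise<unknown>;
}

// Option 1: skip the cached token and ask for a newly minted session JWT.
async function refreshToken(auth: TokenSource): Promise<string | null> {
  return auth.getToken({ skipCache: true });
}

// Option 2: reload the user resource after a server-side metadata change.
async function refreshUser(user: ReloadableUser): Promise<void> {
  await user.reload();
}
```

Either call would typically run right after the server confirms the metadata update, so the next request carries the new claims.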
        
          
docs/webhooks/sync-data.mdx (Outdated)

## When to sync Clerk data

You should only sync Clerk data to your database when absolutely necessary. The most notable example is if your app has social features where users can see content posted by other users.
"You should only sync Clerk data to your database when absolutely necessary" is a really strong statement that a lot of people will disagree with. I think we'd want to qualify such a statement if we're going to make it. Generally, the reason we recommend this is that:
- Syncing data with webhooks is eventually consistent, which is often a challenging construct to work with and can cause a lot of bugs and race conditions if not handled very carefully. Honestly even saying it's eventually consistent is a bold claim, as webhooks can fail, and that case also needs to be handled.
- If you can access the same data out of the session token, you can force strong consistency, and also save all of the resources required to store the data in your own database, and all the latency it takes to access that data on every request, since the session token payload is provided for only the cost of a signature verification (~1ms, no i/o) on every request from frontend to backend in your app. This makes not syncing data much more efficient in cases where you can get away with it, which, in our experience, is very often the case.
i've made some significant changes regarding this info: 85d7648
        
          
docs/webhooks/sync-data.mdx (Outdated)

You should only sync Clerk data to your database when absolutely necessary. The most notable example is if your app has social features where users can see content posted by other users.

With Clerk, you can only access the currently signed in user's data from Clerk's frontend API. If you need to display a another user's name, avatar, etc., you can't access that data from Clerk's frontend API. You could use Clerk's backend API to fetch user data for each request, but it's slow and may result in rate limiting. So in this case, it's a good idea to store user data in your database, sync it to Clerk, and serve it directly.
With Clerk, you can only access the currently signed in user's data from Clerk's frontend API.
There's a logical confusion here -- I think what you intend to say is that if you're using the frontend api, you can only access the currently signed in user's data. But the way this reads is that the only way the current user's data can be accessed is with FAPI, which is not true.
Also there is the complication here of eventual consistency as mentioned above -- if a user signs up and makes a comment on a post, then someone else goes to look at the post before the webhook sync has completed, and your code has not handled the case where the user data may not yet be in your database, but the comment would be, it could cause an error.
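One way to handle that window is to treat a missing user row as an expected state rather than an error. A minimal sketch, assuming a hypothetical comment shape and a lookup of synced users keyed by Clerk user ID (none of these names come from the docs):

```typescript
interface PostComment {
  id: string;
  authorId: string; // Clerk user ID of the commenter
  body: string;
}

interface SyncedUser {
  clerkUserId: string;
  username: string;
}

// Attach author names, falling back to a placeholder when the webhook
// sync has not yet written the author's row to the database.
function attachAuthors(
  comments: PostComment[],
  usersById: Record<string, SyncedUser | undefined>
) {
  return comments.map((comment) => {
    const author = usersById[comment.authorId];
    return {
      ...comment,
      authorName: author ? author.username : "Unknown user",
    };
  });
}
```

The placeholder text is arbitrary; the point is that the render path does not throw when the comment exists but the user record does not.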
Yeah, I’d agree with Jeff on the first point here. I had a go at a potential rephrase to try to clarify that it’s the frontend API that’s limited to the signed-in user:
Clerk’s frontend API only allows you to access information about the currently signed-in user. If your app needs to display information about other users, like their names or avatars, you can’t access that data from the frontend API alone. While you can fetch other users’ data using Clerk’s backend API for each request, doing so can be slow and may hit rate limits. In this case, it’s a good idea to store user data in your own database, sync it to Clerk, and serve it directly to your frontend.
With Clerk, you can only access the currently signed in user's data from Clerk's frontend API.
There's a logical confusion here -- I think what you intend to say is that if you're using the frontend api, you can only access the currently signed in user's data. But the way this reads is that the only way the current user's data can be accessed is with FAPI, which is not true.
yes 🤦♀️
Generally agree with Jeff's points and Sarah's rewrite.
"doing so can be slow" - I don't like the use of slow here -- I think making a GET to Clerk's BAPI is likely as fast or faster than many applications' queries to their own DB. I would just say that it requires a network request.
"and serve it directly to your frontend."
Most apps/databases aren't serving data directly to the app's frontend. It's usually frontend -> backend -> DB -> backend -> frontend. Some DBs (Firestore for example) can serve directly to a frontend.
I think one thing that should be added here is that FAPI can also provide information about the user's current active org (useOrganization()) and the orgs the user is a member of/invited to/suggested to (useOrganizationList()). This is something commonly synced to a DB, for better or worse.
i've made some significant changes regarding this info: 85d7648
        
          
docs/webhooks/sync-data.mdx (Outdated)

Instead of syncing Clerk's data using webhooks, there are two other approaches you can take, depending on how much extra user data you need to store.

**If it's more than \~2KB,** you could store only custom user data in your own database.
I'm confused about the ~2kb constraint here...
so everything that I wrote is taken from what you wrote: https://clerkinc.slack.com/archives/C084WHCNHCZ/p1750112692275719?thread_ts=1750109460.027299&cid=C084WHCNHCZ
updated 85d7648
        
          
docs/webhooks/sync-data.mdx (Outdated)

**If it's more than \~2KB,** you could store only custom user data in your own database.

- Store the user's Clerk ID as a column in the users table in your own database, and only store extra user data. When you need to access Clerk user data, access it directly from the [Clerk session token](/docs/backend-requests/resources/session-tokens). When you need to access the extra user data, do a lookup in your database using the Clerk user ID. Consider indexing the Clerk user ID column since it will be used frequently for lookups.
I think we'd want to see an example here to drive it home
I think we could be a bit more extensive in explaining this approach. Answering the question of "how it is different to syncing Clerk's data with webhooks" - from my understanding, in this scenario, you do not duplicate or sync Clerk’s standard user data (like name, email, or profile picture) into your own database, but instead, you only store the custom user data that your application needs beyond what Clerk provides. I think potentially adding a sentence reinforcing that difference could help + adding an example too.
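As a rough illustration of the difference, the sketch below keeps only app-specific extras in the app's own store, keyed by the Clerk user ID, and takes the standard fields from the session. It is not the example from the docs: the in-memory table stands in for a real database table with an index on the Clerk user ID, and every type and field name here is hypothetical.

```typescript
// Standard user fields assumed to be available from the session token (hypothetical shape).
interface SessionUser {
  userId: string;
  firstName?: string;
}

// Only the custom data the app needs beyond what Clerk provides.
interface UserExtras {
  clerkUserId: string;
  bio: string;
  birthday: string;
}

// In-memory stand-in for a users table indexed by Clerk user ID.
class ExtrasTable {
  private rows: Record<string, UserExtras | undefined> = {};

  upsert(row: UserExtras): void {
    this.rows[row.clerkUserId] = row;
  }

  findByClerkId(clerkUserId: string): UserExtras | undefined {
    return this.rows[clerkUserId];
  }
}

// Combine session data with the extras; no Clerk fields are copied into the table.
function buildProfile(session: SessionUser, table: ExtrasTable) {
  const extras = table.findByClerkId(session.userId);
  return {
    userId: session.userId,
    firstName: session.firstName ?? null,
    bio: extras ? extras.bio : null,
    birthday: extras ? extras.birthday : null,
  };
}
```

Because name, email, and similar fields never enter the table, there is nothing to keep in sync with webhooks for this data.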
updated the copy and added an example 85d7648
Few comments from me - looking really good overall but I think some areas need a bit more clarity and detail, especially around the alternative approaches to syncing data!
        
          
        
          
docs/webhooks/sync-data.mdx (Outdated)

### Alternative approaches

Instead of syncing Clerk's data using webhooks, there are two other approaches you can take, depending on how much extra user data you need to store.
It might be worth mentioning a common pattern we see support side -- a hybrid of the two. For various reasons the customer might want to store some user information in the database because of how it will be used/accessed, but they don't need all/most of it.
eg, maybe they will commonly display a username to other users of the application as part of comments on a post or a dashboard. In that case they can:
- store the username in their DB and likely join it on a query for all comments on a post from their DB (generally the easier way to work with this data)
- query the DB for the post and comments, extract all userIds for the post and comments, and separately use clerkClient.users.getUserList() with an array of the userIds to get the usernames, then finally combine the return from the DB query and the return from the getUserList() query to render the post/comments with the usernames.
---- edited to add ----
Imagine they also just want to display the user's own username in the UI, so they want to avoid using currentUser(), getUser() or querying their own DB.
A hybrid approach would be to:
- include the username in the session, so when the user is in parts of the app where they are not being shown other usernames, this can be read from the session with no network requests
- save the userId and username to their database for every user, so they can then query their DB when they need a list of usernames. This also allows them to have the query join data from the user table and other tables (posts and comments from the above examples).
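The second option in the first list (batch-fetching usernames and merging them with the DB results) might look roughly like this. The `fetchUsers` parameter is a hypothetical stand-in that could wrap clerkClient.users.getUserList() in a real app; its signature and the row shapes are assumptions made for this sketch.

```typescript
interface CommentRow {
  id: string;
  userId: string; // Clerk user ID stored alongside the comment
  body: string;
}

interface UserSummary {
  id: string;
  username: string | null;
}

// Hypothetical batch lookup; inject whatever actually fetches users by ID.
type FetchUsers = (userIds: string[]) => Promise<UserSummary[]>;

async function withUsernames(comments: CommentRow[], fetchUsers: FetchUsers) {
  // Collect each distinct userId once so the lookup is a single batched call.
  const uniqueIds = comments
    .map((comment) => comment.userId)
    .filter((id, index, all) => all.indexOf(id) === index);

  const users = await fetchUsers(uniqueIds);

  const usernameById: Record<string, string | null | undefined> = {};
  for (const user of users) usernameById[user.id] = user.username;

  // Merge the DB rows with the fetched usernames for rendering.
  return comments.map((comment) => ({
    ...comment,
    username: usernameById[comment.userId] ?? null,
  }));
}
```

Compared with a join in the app's own DB, this costs one extra network request per page of comments but avoids storing usernames at all.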
could you clarify how this example combines the two approaches?
one approach is store extra user data in metadata, and the other approach is store extra user data in a separate DB
in the case you've given, the username is stored in the DB, but I'm not seeing what data is stored in the Clerk metadata
Sorry, I just didn't finish that comment. I added to the original and separated the new info with ---- edited to add ----
        
          
docs/webhooks/sync-data.mdx (Outdated)

**If it's less than \~2KB,** you could use Clerk metadata sparingly.

- For minimal custom data (under \~2KB), you can use Clerk's [metadata](/docs/users/metadata) feature instead of dealing with a separate users table. However, if there's any chance that a user will ever have more than \~2KB of extra data, you should use the first approach.
One thing that probably should be mentioned is that if the customer needs to be able to query the custom data, then metadata is not an option currently. You cannot query BAPI by metadata. Take my example earlier that included the user's birth date stored in publicMetadata -- if I wanted to find all users with a birthday on the current day, I can't query for that from metadata. I need to have that data in my DB so I can perform the query there.
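To illustrate the birthday case, here is a small sketch of the kind of lookup that needs the data in the app's own store. The array filter stands in for a real database query, and the row shape and function name are hypothetical.

```typescript
interface UserRow {
  clerkUserId: string;
  birthday: string; // ISO date, "YYYY-MM-DD"
}

// Find users whose birthday falls on the given month and day ("MM-DD").
// In a real app this would be a WHERE clause against the users table.
function usersWithBirthdayOn(rows: UserRow[], monthDay: string): UserRow[] {
  return rows.filter((row) => row.birthday.slice(5) === monthDay);
}
```

The equivalent lookup against metadata would mean fetching every user and filtering in application code.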
updated: 85d7648
        
          
85d7648 to 5b090aa
  
            
          
docs/webhooks/sync-data.mdx (Outdated)

If you want to use webhooks to sync Clerk data because **you want to store extra data for the user**, consider the following approaches:

1. **If it's less than 8KB,** you could use Clerk metadata.
here, should we also recommend storing that metadata in the session token? that would make the constraint 1.2KB instead of 8KB.
and what is the benefit of storing the metadata in the session token? removing the need of a network request?
here, should we also recommend storing that metadata in the session token? that would make the constraint 1.2KB instead of 8KB.
Yes and no. It really depends on how and why and what they are adding to the session.
How: Probably not worth touching here, but basically are they using a JSON object with multiple key/value pairs or just one key and all the data.
Why: Customers will store information in metadata without caring about attaching it to sessions. Those customers would only need to care about the 8kb limit.
What: They could have 6kb of metadata total, but use the following to access just one key/value pair where the value is just `true | false`:
{
	"onboardingComplete": "{{user.public_metadata.onboardingComplete}}"
}
However, if they just do the following, then they are attaching the full metadata object, and that should be 1.2kb total currently if their SDK uses handshake, or something like 2.5 or 3kb (@jfoshee can either tell us or do the math) if the SDK is purely client side and there is never a handshake.
{
	"metadata": "{{user.public_metadata}}"
}
Confused yet? It's confusing even explaining it, mostly because there is no simple answer. The answer requires a bunch of info first.
> and what is the benefit of storing the metadata in the session token? removing the need of a network request?

Correct. If I need the value of `onboardingComplete` in Middleware or a server component, etc., and I haven't attached it to a session, then I need to get it with one of these methods:
- `currentUser()` or `getUser()` via BAPI - network request and 1 request against rate limit (though our rate limits are now much, much higher, so this is far less of a concern than it was)
- have stored this in a DB and query the DB to get it - network request
- use `user` from `useUser()` client side - no network request, but not available in Middleware, server components, server actions, or API routes.
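To illustrate why a claim attached to the session token needs no extra network request, here is a minimal sketch that reads a custom claim straight from a JWT payload. It assumes a Node.js runtime (for `Buffer`), and `readClaim`, `encodeSegment`, and `sampleToken` are hypothetical names made up for this example. It deliberately skips signature verification, which a real application must never do; in practice the SDK's auth helpers verify the token first.

```typescript
// Illustration only: decodes the payload segment of a JWT WITHOUT verifying
// the signature. Real code must verify the token before trusting any claim.
function readClaim(token: string, claim: string): unknown {
  const [, payload] = token.split(".");
  if (!payload) {
    throw new Error("Expected a JWT with header, payload, and signature segments");
  }
  const json = Buffer.from(payload, "base64url").toString("utf8");
  return (JSON.parse(json) as Record<string, unknown>)[claim];
}

// Hypothetical, unsigned sample token built locally so the example is self-contained.
const encodeSegment = (value: object): string =>
  Buffer.from(JSON.stringify(value)).toString("base64url");

const sampleToken = [
  encodeSegment({ alg: "none", typ: "JWT" }),
  encodeSegment({ sub: "user_123", onboardingComplete: true }),
  "signature-placeholder",
].join(".");

// The claim is available locally, with no call to the Backend API or a database.
console.log(readClaim(sampleToken, "onboardingComplete"));
```

The trade-off discussed in this thread still applies: every claim added this way counts against the session cookie size budget.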
If the customer is only using foundational ClerkJS and not platform SDKs, and is therefore not getting handshake behavior, then their metadata would only experience 1 base64 encoding in the JWT. (Handshake payloads get double base64 encoded). So, the balance of space available is 0.75 ✕ (4096 - 1800 overhead) ≅ 1,722 bytes or 1.68KB.
Not a dramatic improvement.
(And just a reminder that this includes the property names and JSON syntax like `"onboardingComplete":` and `,`.)
I'm sorry, I wasn't thinking about this ClerkJS-only (no handshake) scenario clearly yesterday-- I forgot that the 1,800 overhead number includes overhead in the handshake payload from other cookies.
Looking at a sample bare bones __session cookie with no custom metadata, it is about 610 bytes (see below).
That leaves a balance of: 0.75 ✕ (4096 - 610 overhead) ≅ 2.5KB
That matches what @royanger was estimating. I apologize! I think I saw some references to 1.5KB; if that's for the ClerkJS-only case, they could be increased to 2.5KB. @alexisintech @jescalan
eyJhbGciOiJFZERTQSIsImNhdCI6ImNsX0I3ZDRQRDExMUFBQSIsImtpZCI6Imluc18yenZDbjVpTDNVOWRwVGVFNXJ1QnllZmp3ZHQiLCJ0eXAiOiJKV1QifQ.eyJhenAiOiJodHRwczovL2FwcC50bWNheHJ0enRlcW02ZmhhYnlyY2x0NGZmNjE5djI1cWF0bnQyenRtYy5jb20iLCJleHAiOjQ0OTg4NDg2MCwiZnZhIjpbMCwtMV0sImlhdCI6NDQ5ODg0ODAwLCJpc3MiOiJodHRwczovL2NsZXJrLnRtY2F4cnR6dGVxbTZmaGFieXJjbHQ0ZmY2MTl2MjVxYXRudDJ6dG1jLmNvbSIsIm5iZiI6NDQ5ODg0NzkwLCJzaWQiOiJzZXNzXzJ6dkNuOHh3U0QxbVlGZ21GWTM3bWlNODY1diIsInN0cyI6ImFjdGl2ZSIsInN1YiI6InVzZXJfMnp2Q242R3RUZnRlUUI3YmcyQnplSWphbEpWIiwidiI6Mn0.4AHjgWvVbpauDDRMynN0oJYKnCk12QIIp_9ZlAAuKtpPs_ymLagqrkLXZugsFDaVkUEHxpWIxKII_zj9V0Y4AA
{
  "alg": "EdDSA",
  "cat": "cl_B7d4PD111AAA",
  "kid": "ins_2zvCn5iL3U9dpTeE5ruByefjwdt",
  "typ": "JWT"
}.{
  "azp": "https://app.tmcaxrtzteqm6fhabyrclt4ff619v25qatnt2ztmc.com",
  "exp": 449884860,
  "fva": [
    0,
    -1
  ],
  "iat": 449884800,
  "iss": "https://clerk.tmcaxrtzteqm6fhabyrclt4ff619v25qatnt2ztmc.com",
  "nbf": 449884790,
  "sid": "sess_2zvCn8xwSD1mYFgmFY37miM865v",
  "sts": "active",
  "sub": "user_2zvCn6GtTfteQB7bg2BzeIjalJV",
  "v": 2
}.[Signature]
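The two estimates in this thread come from the same formula, so a small sketch may make the arithmetic easier to re-run if the overhead figures change. The function and constant names are hypothetical; the numbers (4096-byte cookie limit, 1,800 and 610 bytes of overhead, and the 0.75 factor for one base64 encoding) are the ones quoted in the comments above.

```typescript
// Cookie size limit quoted in this thread.
const COOKIE_LIMIT_BYTES = 4096;

// One base64 encoding turns every 3 raw bytes into 4 characters,
// so only 3/4 of the remaining cookie space holds raw JSON.
const BASE64_RATIO = 0.75;

// Approximate raw bytes left for custom claims after the given overhead.
function customClaimsBudgetBytes(overheadBytes: number): number {
  return Math.floor(BASE64_RATIO * (COOKIE_LIMIT_BYTES - overheadBytes));
}

// First estimate: 1,800 bytes of overhead (which included other cookies).
const firstEstimate = customClaimsBudgetBytes(1800);

// Revised ClerkJS-only estimate: about 610 bytes for a bare __session cookie.
const clerkJsOnlyEstimate = customClaimsBudgetBytes(610);

console.log(firstEstimate, (firstEstimate / 1024).toFixed(2) + "KB");
console.log(clerkJsOnlyEstimate, (clerkJsOnlyEstimate / 1024).toFixed(2) + "KB");
```

This only models the single-encoding (no handshake) case described above; the handshake flow double-encodes the payload and is what the lower 1.2KB guidance accounts for.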
| This is I think the cleanest way to do it. You can also call  const { session } = useSession();
useEffect(() => {
  session?.getToken({ skipCache: true });
}, []); | 
| 
 Minor correct -- it is  Both will do a network request, but  | 
I think this is good at this point! Nice work refining the initial version ⭐
| ### Example | ||
|  | ||
| It's recommended to keep the total size of custom claims in the session token under 1.2KB. Therefore, it's recommended to move particularly large claims out of the session token and fetch them using a separate API call from your backend. | 
Awkward phrasing with double "it's recommended"
| ``` | ||
| </Tab> | ||
| </Tabs> | ||
|  | 
Probably worth adding a note that if you are doing this call frequently, it's probably better to sync to your database, or better yet, not even store it in Clerk metadata in the first place, and just store it in the database under your own user table, but put in the clerk_id as a column so you can quickly look it up using the id returned from the JWT. Doing tons of backend API calls puts you at risk of hitting rate limits, and is also a lot slower than making a database query.
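As a rough sketch of the lookup pattern suggested here, the example below keeps extra user data in your own table and finds it by the Clerk user ID taken from the session token. An in-memory `Map` stands in for a database table with a unique index on `clerk_id`; `ProfileRow`, `profilesByClerkId`, and `findProfile` are hypothetical names, and the rows are sample values only.

```typescript
// Shape of one row in a hypothetical users table that stores only extra data.
interface ProfileRow {
  id: string;
  clerkId: string;
  birthday: string;
  country: string;
  bio: string;
}

const rows: ProfileRow[] = [
  { id: "user123abc", clerkId: "user_123", birthday: "1990-05-12", country: "USA", bio: "Coffee enthusiast." },
  { id: "user456abc", clerkId: "user_456", birthday: "1985-11-23", country: "Canada", bio: "Loves to read." },
];

// Stand-in for a unique index on the clerk_id column.
const profilesByClerkId = new Map(rows.map((row) => [row.clerkId, row] as const));

// Look up extra user data by the Clerk user ID (for example, the `sub` claim
// of a verified session token). No Backend API call is involved.
function findProfile(clerkId: string): ProfileRow | undefined {
  return profilesByClerkId.get(clerkId);
}

console.log(findProfile("user_123")?.country);
```

With a real database, the same idea is a single indexed query on the `clerk_id` column, which avoids both the Backend API latency and the rate-limit exposure mentioned in the comment.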
| The following examples demonstrate how to retrieve the authenticated user's ID using framework-specific auth helpers and how to use the Backend SDK's [`getUser()`](/docs/references/backend/user/get-user) method to get the [Backend `User` object](/docs/references/backend/types/backend-user). | ||
|  | ||
| <Tabs items={["Next.js", "Astro", "Express", "React Router", "Remix", "Tanstack React Start"]}> | ||
| {/* TODO: The following Tabs example is duplicated in the backend-requests/resources/session-tokens.mdx file. It cannot be a partial to be reused across both files because this example includes a partial and partials cannot include partials. Also, the text in each of these tabs is removed in the other file as its not relevant to that file's example. So keep these two Tabs examples in sync please. */} | 
Honestly I don't see any reason why partials shouldn't be able to include other partials, if we just have it recurse...
that's true, but the text for the two <Tabs> usages are different, so we couldn't use a partial here anyways.
        
          
                docs/webhooks/sync-data.mdx
              
                Outdated
          
        
      |  | ||
| ## When to sync Clerk data | ||
|  | ||
| Syncing data with webhooks can be a suitable approach for some applications, but it comes with important considerations. Webhook deliveries are not guaranteed and may occasionally fail due to problems like network issues, so your implementation should be prepared to handle retries and error scenarios. Additionally, syncing data via webhooks is generally [eventually consistent](https://en.wikipedia.org/wiki/Eventual_consistency), meaning there can be a delay between when a Clerk event (such as a user being created or updated) occurs and when the corresponding data is reflected in your database. If not managed carefully, this delay can introduce bugs and race conditions. | 
| Syncing data with webhooks can be a suitable approach for some applications, but it comes with important considerations. Webhook deliveries are not guaranteed and may occasionally fail due to problems like network issues, so your implementation should be prepared to handle retries and error scenarios. Additionally, syncing data via webhooks is generally [eventually consistent](https://en.wikipedia.org/wiki/Eventual_consistency), meaning there can be a delay between when a Clerk event (such as a user being created or updated) occurs and when the corresponding data is reflected in your database. If not managed carefully, this delay can introduce bugs and race conditions. | |
| Syncing data with webhooks can be a suitable approach for some applications, but it comes with important considerations. Webhook deliveries are not guaranteed and may occasionally fail due to problems like network issues, so your implementation should be prepared to handle retries and error scenarios. Additionally, syncing data via webhooks is [eventually consistent](https://en.wikipedia.org/wiki/Eventual_consistency), meaning there can be a delay between when a Clerk event (such as a user being created or updated) occurs and when the corresponding data is reflected in your database. If not managed carefully, this delay can introduce bugs and race conditions. | 
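One way to prepare for retried or out-of-order deliveries is to make the handler idempotent. The sketch below is an assumption-heavy illustration, not Clerk's recommended implementation: the `UserEvent` shape is a simplified, hypothetical stand-in for a real webhook payload, `deliveryId` is assumed to be a unique identifier that stays the same across retries of one delivery, and in-memory collections stand in for database tables.

```typescript
// Hypothetical, simplified event shape; real webhook payloads carry more fields.
interface UserEvent {
  deliveryId: string; // assumed to be stable across retries of the same delivery
  type: "user.created" | "user.updated" | "user.deleted";
  data: { id: string; firstName: string | null; updatedAt: number };
}

interface UserRow {
  clerkId: string;
  firstName: string | null;
  updatedAt: number;
}

// In-memory stand-ins for a users table and a processed-deliveries table.
const users = new Map<string, UserRow>();
const processedDeliveries = new Set<string>();

function handleUserEvent(event: UserEvent): "applied" | "duplicate" | "stale" {
  // A retried delivery that was already processed must not be applied twice.
  if (processedDeliveries.has(event.deliveryId)) return "duplicate";
  processedDeliveries.add(event.deliveryId);

  // Events can arrive out of order; never overwrite newer data with older data.
  const existing = users.get(event.data.id);
  if (existing && existing.updatedAt > event.data.updatedAt) return "stale";

  if (event.type === "user.deleted") {
    users.delete(event.data.id);
  } else {
    users.set(event.data.id, {
      clerkId: event.data.id,
      firstName: event.data.firstName,
      updatedAt: event.data.updatedAt,
    });
  }
  return "applied";
}
```

In a real database the deduplication record and the upsert would share one transaction, and deletions would likely need a tombstone so that a late update cannot recreate a deleted user. Even so, the synced copy stays eventually consistent, which is the caveat the paragraph describes.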
        
          
                docs/webhooks/sync-data.mdx
              
                Outdated
          
        
      |  | ||
| If you can access the necessary data directly from the [Clerk session token](/docs/backend-requests/resources/session-tokens), you can achieve strong consistency while avoiding the overhead of maintaining a separate user table in your own database and the latency of retrieving that data on every request. This makes not syncing data much more efficient, if your use case allows for it. | ||
|  | ||
| The most notable use case for syncing Clerk data is if your app has social features where users can see content posted by other users. This is because Clerk's frontend API only allows you to access information about the currently signed-in user. If your app needs to display information about other users, like their names or avatars, you can't access that data from the frontend API alone. While you can fetch other users' data using Clerk's backend API for each request, the issue is that you may hit rate limits. In this case, it makes sense to store user data in your own database and sync it to Clerk. | 
| The most notable use case for syncing Clerk data is if your app has social features where users can see content posted by other users. This is because Clerk's frontend API only allows you to access information about the currently signed-in user. If your app needs to display information about other users, like their names or avatars, you can't access that data from the frontend API alone. While you can fetch other users' data using Clerk's backend API for each request, the issue is that you may hit rate limits. In this case, it makes sense to store user data in your own database and sync it to Clerk. | |
| The most notable use case for syncing Clerk data is if your app has social features where users can see content posted by other users. This is because Clerk's frontend API only allows you to access information about the currently signed-in user. If your app needs to display information about other users, like their names or avatars, you can't access that data from the frontend API alone. While you can fetch other users' data using Clerk's backend API for each request, this is slow compared to a database lookup, and you risk hitting [rate limits](/docs/backend-requests/resources/rate-limits). In this case, it makes sense to store user data in your own database and sync it from Clerk. | 
        
          
                docs/webhooks/sync-data.mdx
              
                Outdated
          
        
      |  | ||
| If you want to use webhooks to sync Clerk data because **you want to store extra data for the user**, consider the following approaches: | ||
|  | ||
| 1. **If it's less than 8KB,** you could use Clerk metadata. | 
| 1. **If it's less than 8KB,** you could use Clerk metadata. | |
| 1. **If it's less than 1.5KB,** you could use Clerk metadata. | 
        
          
                docs/webhooks/sync-data.mdx
              
                Outdated
          
        
      | If you want to use webhooks to sync Clerk data because **you want to store extra data for the user**, consider the following approaches: | ||
|  | ||
| 1. **If it's less than 8KB,** you could use Clerk metadata. | ||
| - For minimal custom data (under 8KB), you can store it in a user's [metadata](/docs/users/metadata) instead of dealing with a separate users table. However, if there's any chance that a user will ever have more than 8KB of extra data, you should use the other approach, as metadata is limited to 8KB. | 
| - For minimal custom data (under 8KB), you can store it in a user's [metadata](/docs/users/metadata) instead of dealing with a separate users table. However, if there's any chance that a user will ever have more than 8KB of extra data, you should use the other approach, as metadata is limited to 8KB. | |
| - For minimal custom data (under 1.5KB), you can store it in a user's [metadata](/docs/users/metadata) instead of dealing with a separate users table. However, if there's any chance that a user will ever have more than 1.5KB of extra data, you should use the other approach, as you risk cookie size overflows if metadata is over 1.5KB. | 
        
          
                docs/webhooks/sync-data.mdx
              
                Outdated
          
        
      | - For minimal custom data (under 8KB), you can store it in a user's [metadata](/docs/users/metadata) instead of dealing with a separate users table. However, if there's any chance that a user will ever have more than 8KB of extra data, you should use the other approach, as metadata is limited to 8KB. | ||
| - A limitation to consider is that metadata cannot be queried, so you can't use it to filter users by metadata. For example, if you stored a user's birthday in metadata, you couldn't find all users with a certain birthday. If you need to query the data that you're storing, you should use the other approach. | ||
|  | ||
| 1. **If it's more than 8KB,** you could store _only_ the extra user data in your own database. | 
| 1. **If it's more than 8KB,** you could store _only_ the extra user data in your own database. | |
| 1. **If it's more than 1.5KB,** you could store _only_ the extra user data in your own database. | 
| @jescalan @royanger @SarahSoutoul I've added a guide on forcing a token refresh - would love some eyes on it ❤️ | 
Beautifully written guide on forcing the token to refresh! Have committed a fix to the linting issues + minor sentence phrasing 🙏
This is a huge improvement for sure
| ## Size limitations | ||
|  | ||
| The Clerk session token is stored in a cookie. All modern browsers [limit the maximum size of a cookie to 4kb](https://datatracker.ietf.org/doc/html/rfc2109#section-6.3). Exceeding this limit can have adverse effects, including a possible infinite redirect loop for users who exceed this size in Next.js applications. | ||
| The Clerk session token is stored in a cookie. All modern browsers [limit the maximum size of a cookie to **4KB**](https://datatracker.ietf.org/doc/html/rfc2109#section-6.3). Exceeding this limit can have adverse effects, such as possible infinite redirect loops in Next.js applications. | 
Exceeding this limit will cause the cookie not to be set, which is guaranteed to break your app since Clerk relies on cookies to function.
| The Clerk session token is stored in a cookie. All modern browsers [limit the maximum size of a cookie to **4KB**](https://datatracker.ietf.org/doc/html/rfc2109#section-6.3). Exceeding this limit can have adverse effects, such as possible infinite redirect loops in Next.js applications. | ||
|  | ||
| A session token with the [default session claims](#default-claims) won't run into this issue, as this configuration produces a cookie significantly smaller than 4kb. However, this limitation becomes relevant when implementing a [custom session token](/docs/backend-requests/custom-session-token). In this case, it's recommended to move particularly large claims out of the token and fetch these using a separate API call from your backend. | ||
| A session token with the [default session claims](#default-claims) won't run into this issue, as this configuration produces a cookie significantly smaller than 4KB. However, this limitation becomes relevant when implementing a [custom session token](/docs/backend-requests/custom-session-token). **It's recommended to keep the total size of custom claims in the session token under 1.2KB.** | 
Gotta explain why. I'd read this and be like, this makes no sense, you just said the limit is 4kb, so why keep it under 1.2kb? The answer generally is "Clerk adds metadata in addition to the payload, which reduces the available space".
Also, Foshee might have already shipped a change that makes this not the case and removes the limit, since it will just set a second cookie if we're over the limit. You should check with him and see. If this is the case, we can change the copy to be more along the lines of "you should keep your custom claims size as low as possible because the more you add, the slower your app becomes since cookies are shuttled back and forth with every request between FE and BE. We recommend 1.2kb max to keep your performance optimal"
I still recommend keeping the 1.2KB advice as that limit should keep users on the 1-step handshake flow.
Even with the 2-step handshake support we still have a limit of ~ 2.5KB for custom session data in session cookies. (see this comment above).
This is due simply to the fact that the JWT session is confined to a single cookie. This ~2.5KB will be a "hard" limit until & if we implement chunking of session cookies.
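To check a set of custom claims against the 1.2KB guidance, a rough sketch like the following could be used. The names are hypothetical, and it assumes 1.2KB means 1.2 × 1024 bytes measured on the UTF-8 encoded JSON, property names and syntax included; the thread does not pin down the exact byte count.

```typescript
// Assumption: 1.2KB is interpreted as 1.2 * 1024 bytes, rounded down.
const RECOMMENDED_MAX_BYTES = Math.floor(1.2 * 1024);

// Size in bytes of the claims once serialized to JSON and encoded as UTF-8.
function customClaimsSizeBytes(claims: Record<string, unknown>): number {
  return new TextEncoder().encode(JSON.stringify(claims)).length;
}

function isWithinRecommendedSize(claims: Record<string, unknown>): boolean {
  return customClaimsSizeBytes(claims) <= RECOMMENDED_MAX_BYTES;
}

// A single boolean flag is tiny; a whole metadata object with long text is not.
const smallClaims = { onboardingComplete: true };
const largeClaims = { metadata: { bio: "x".repeat(2000) } };

console.log(customClaimsSizeBytes(smallClaims), isWithinRecommendedSize(smallClaims));
console.log(customClaimsSizeBytes(largeClaims), isWithinRecommendedSize(largeClaims));
```

Measuring the serialized JSON rather than just the values matters because, as noted earlier in the thread, the property names and JSON punctuation count toward the limit too.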
it's included in the linked /custom-session-token guide at the top, but I can add context here as well!
| </Tab> | ||
| </Tabs> | ||
|  | ||
| However, if you make this call to Clerk's Backend API frequently, you risk hitting [rate limits](/docs/backend-requests/resources/rate-limits) and it's also slower than making a database query. So it's recommended to [store the extra data in your own database](/docs/webhooks/sync-data#storing-extra-user-data) instead of storing it in metadata in the session token. | 
It seems weird to recommend making a BAPI call, put a bunch of examples of how to do it, then immediately after being like "actually, we recommend not doing this". I feel like I'd flip these around -- explicitly recommend storing larger user data in your own database, then mention that if there's some restriction that makes this impossible, it is possible to fetch via BAPI, but risky because of rate limits, so we do not recommend this.
| ### Storing extra user data | ||
|  | ||
| If you want to use webhooks to sync Clerk data because **you want to store extra data for the user**, consider the following approaches: | ||
|  | ||
| 1. **If it's less than 1.2KB,** you could use Clerk metadata and store it in the user's session token. | ||
| - For minimal custom data (under 1.2KB), you can store it in a user's [metadata](/docs/users/metadata) instead of dealing with a separate users table. Then, you can [store the metadata in the user's session token](/docs/backend-requests/custom-session-token) to avoid making a network request to Clerk's Backend API when retrieving it. However, if there's any chance that a user will ever have more than 1.2KB of extra data, you should use the other approach, as you risk cookie size overflows if metadata is over 1.2KB. | ||
| - A limitation to consider is that metadata cannot be queried, so you can't use it to filter users by metadata. For example, if you stored a user's birthday in metadata, you couldn't find all users with a certain birthday. If you need to query the data that you're storing, you should use the other approach. | ||
|  | ||
| 1. **If it's more than 1.2KB,** you could store _only_ the extra user data in your own database. | ||
| - Store the user's Clerk ID as a column in the users table in your own database, and only store extra user data. When you need to access Clerk user data, access it directly from the [Clerk session token](/docs/backend-requests/resources/session-tokens). When you need to access the extra user data, do a lookup in your database using the Clerk user ID. Consider indexing the Clerk user ID column since it will be used frequently for lookups. | ||
| - For example, Clerk doesn't collect a user's birthday, country, or bio, but if you wanted to collect these fields, you could store them in your own database like this: | ||
| | id | clerk\_id | birthday | country | bio | | ||
| | - | - | - | - | - | | ||
| | user123abc | user\_123 | 1990-05-12 | USA | Coffee enthusiast. | | ||
| | user456abc | user\_456 | 1985-11-23 | Canada | Loves to read. | | ||
| | user789abc | user\_789 | 2001-07-04 | Germany | Student and coder. | | ||
|  | ||
| 1. A hybrid approach of the two approaches above. | 
This section is written from the stance that the customer will be attaching their entire public metadata object to the session. Writing from that stance is then misleading for customers who:
- don't attach public metadata to the session (they can store up to 8kb in that case)
- want to store up to 8kb in public metadata, but only attach one key/value pair of that to the session (the Clerk app uses this pattern)
- want to store data in private metadata (can't attach to the session, so can use up to 8kb)
- use unsafe metadata (same size limitations if attaching to the session, is unsafe, and can be 8kb of data if not attaching to the session)
I strongly believe that the 'When to sync Clerk data' section should be broken out into its own guide as there is lots to cover if we want to provide a full and complete picture to customers.
very fair point and i've created a ticket for it https://linear.app/clerk/issue/DOCS-10682/expand-on-how-a-user-can-extend-clerk-data
but i want to go ahead and ship this ahead of the current IA project (trying to close out PR's before that gets shipped). so this will have to get revisited at a later time
90ad3fd    to
    5a56db0      
    Compare
  
    
🔎 Previews:
What does this solve?
https://linear.app/clerk/issue/DOCS-10525/add-context-on-why-we-dont-recommend-webhooksh
We don't recommend syncing Clerk's user table with an external user table using webhooks.
What changed?
This PR adds context around why, and gives alternative approaches to using webhooks.
Checklist