feat: compress replay data #1436
Conversation
Size Change: +11.2 kB (+0.4%) Total Size: 2.8 MB
function gzipToString(data: unknown): string {
    return strFromU8(gzipSync(strToU8(JSON.stringify(data))), true)
}
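For context, a minimal sketch of the inverse transformation, assuming these helpers come from fflate (which exports gzipSync, gunzipSync, strToU8 and strFromU8); the `true` flag maps the bytes to a latin1 string so the gzip output survives the byte-to-string round trip. `stringToObject` is hypothetical and not part of this PR:

import { gunzipSync, strToU8, strFromU8 } from 'fflate'

// latin1 string -> gzip bytes -> decompressed bytes -> JSON string -> object
function stringToObject(compressed: string): unknown {
    return JSON.parse(strFromU8(gunzipSync(strToU8(compressed, true))))
}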
Have you considered streaming, at least for the full snapshots? Doing the Object → String → Array → gzipped Array → String transformation synchronously seems memory-intensive
(Though I imagine it would need a bigger refactor to the network code)
Streaming in the browser? 🤔 certainly not a thing I've ever heard of?
It's new-ish (at least for sending, reading the response as a stream has been available for a while): https://developer.chrome.com/docs/capabilities/web-apis/fetch-streaming-requests#using_with_writable_streams
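For illustration, a rough sketch of what a streaming upload could look like with CompressionStream and a ReadableStream request body (hypothetical function name and endpoint; this is not what the PR does, needs a recent Chromium-based browser over HTTP/2, and requires the duplex option). The JSON string is still built up front here, but the compression and the upload are streamed:

// Hypothetical: gzip the snapshot on the fly instead of building the whole
// compressed string in memory before sending it.
async function sendSnapshotStreaming(url: string, snapshot: unknown): Promise<Response> {
    const body = new Blob([JSON.stringify(snapshot)])
        .stream()
        .pipeThrough(new CompressionStream('gzip'))

    const init: RequestInit & { duplex: 'half' } = {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', 'Content-Encoding': 'gzip' },
        body,
        // required for requests with a ReadableStream body; not yet in all TS lib typings
        duplex: 'half',
    }
    return fetch(url, init)
}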
we'd not... that's interesting.
certainly we've talked about having a fast path for full snapshots at least, since if all we capture are full snapshots we can play something back
Couldn't help but comment on an interesting PR...
...event,
cv: '2024-10',
data: {
    ...event.data,
Not sure what gains this really gives.
Why not pull out any counters you need from the data and then just compress the entire object? In theory it would never need to be decompressed server-side: it could be written straight to the S3 file, for example, and only decompressed at read time.
Or at least it would be a simple single decompression of the whole payload rather than multiple ones?
so... we know that when compressed the data fits into Kafka, but when uncompressed it doesn't always (and we reject it unnecessarily)
when I looked at pulling the metadata out (and it's fair, I don't have this written down anywhere but in my head)
I at least had to change the SDK, django capture, blobby, and maybe playback, depending on how it was implemented
and we'd still have multiple compress/decompress steps since we're doing that at network boundaries anyway
this way I can test the impact with a relatively small change
> what gains this really gives.
it should mean that the data that is currently not making it into Kafka from rust or django capture does make it in
without having to change anything else
so I can probe whether that's (the last|a) piece of the puzzle for playback issues
#BiasForImpact :p
We want to compress replay payloads since they're sometimes massive and very compressible.
We already compress the entire payload while it travels over the network and while it sits in storage, but at some points in processing it is not compressed; leaving some of the data compressed throughout makes life easier on our infra.
But blob ingestion needs to read some metadata from some events, so we don't want to just compress everything.
So, posthog-js now supports partially compressing selected payloads.
This only compresses when the feature is enabled, so we can test it on ourselves first and see what impact it has.
pairs with PostHog/posthog#25183
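As a rough illustration of the shape of the idea (not the actual posthog-js change; the event shape and the choice of `node` as the heavy field are assumptions for the example): compress only the bulky part of an event's data, keep the metadata blob ingestion needs as plain JSON, and mark the event with a compression version so downstream readers know what to expect.

// Illustrative only: `gzipToString` is the helper from this PR; field names are hypothetical.
function partiallyCompress(event: { type: number; data: Record<string, unknown> }) {
    return {
        ...event,
        cv: '2024-10', // compression version marker, as in the diff above
        data: {
            ...event.data, // metadata stays readable for blob ingestion
            node: gzipToString(event.data.node), // only the heavy payload is gzipped
        },
    }
}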