Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: compress replay data #1436

Merged
merged 3 commits into from
Sep 25, 2024
Merged

feat: compress replay data #1436

merged 3 commits into from
Sep 25, 2024

Conversation

pauldambra
Copy link
Member

@pauldambra pauldambra commented Sep 24, 2024

We want to compress replay payloads since they're sometimes massive and very compressible

We do compress the entire payload when it is travelling over the network and when in storage, but at points in processing it is not compressed and we can leave some data compressed to make life easier on our infra

But blob ingestion needs to read some metadata from some events so we don't want to just compress everything

So, we now support posthog-js partially compressing selected payloads

This only compresses when it is enabled so we can test this ourselves to see what the what

pairs with PostHog/posthog#25183

Copy link

vercel bot commented Sep 24, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
posthog-js ✅ Ready (Inspect) Visit Preview Sep 24, 2024 7:57pm

Copy link

Size Change: +11.2 kB (+0.4%)

Total Size: 2.8 MB

Filename Size Change
dist/array.full.js 351 kB +1.24 kB (+0.35%)
dist/array.full.no-external.js 350 kB +1.24 kB (+0.36%)
dist/array.js 166 kB +1.25 kB (+0.76%)
dist/array.no-external.js 165 kB +1.25 kB (+0.77%)
dist/main.js 167 kB +1.25 kB (+0.76%)
dist/module.full.js 351 kB +1.24 kB (+0.35%)
dist/module.full.no-external.js 350 kB +1.24 kB (+0.36%)
dist/module.js 166 kB +1.25 kB (+0.76%)
dist/module.no-external.js 165 kB +1.25 kB (+0.77%)
ℹ️ View Unchanged
Filename Size
dist/all-external-dependencies.js 191 kB
dist/exception-autocapture.js 10.5 kB
dist/external-scripts-loader.js 2.35 kB
dist/recorder-v2.js 111 kB
dist/recorder.js 111 kB
dist/surveys-preview.js 59.8 kB
dist/surveys.js 66 kB
dist/tracing-headers.js 8.36 kB
dist/web-vitals.js 10.3 kB

compressed-size-action

Comment on lines +132 to +134
function gzipToString(data: unknown): string {
return strFromU8(gzipSync(strToU8(JSON.stringify(data))), true)
}
Copy link
Contributor

@richard-better richard-better Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered streaming, for at least the full snapshots? This seems memory intensive to do the Object->String->Array->Gzip Array->String transformation synchronously

(Though I imagine it would need a bigger refactor to the network code)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Streaming in the browser? 🤔 certainly not a thing I've ever heard of?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's new-ish (at least for sending, reading the response as a stream has been available for a while): https://developer.chrome.com/docs/capabilities/web-apis/fetch-streaming-requests#using_with_writable_streams

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'd not... that's interesting.
certainly we've talked about having a fast path for full snapshots at least since if all we capture are full snapshots we can play something back

Copy link
Collaborator

@benjackwhite benjackwhite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't help but comment on an interesting PR...

Comment on lines +132 to +134
function gzipToString(data: unknown): string {
return strFromU8(gzipSync(strToU8(JSON.stringify(data))), true)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Streaming in the browser? 🤔 certainly not a thing I've ever heard of?

...event,
cv: '2024-10',
data: {
...event.data,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what gains this really gives.
Why not pull out from the data any counters you need and then just compress the entire object? In theory it would never need to be decompressed and can then be written straight to the S3 file for example and then decompressed at read time.
Or at least it would be a simple single decompression of the whole payload rather than multiple ones?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so... we know that when compressed the data fits into kafka but when uncompressed it doesn't always (and we reject it unnecessarily)

when I looked at pulling the metadata out (and it's fair I don't have this anywhere but in my head)

i at least had to change the SDK, django capture, blobby and maybe playback depending on how it was implemented

and we'd still have multiple compress/decompress steps since we're doing that at network boundaries anyway

this way i can test the impact with a relatively small change

Copy link
Member Author

@pauldambra pauldambra Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what gains this really gives.

it should mean that the data that is currently not making into kafka from rust or django capture does make it in
without having to change anything else
so i can probe if that's (the last|a) piece of the puzzle for playback issues

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#BiasForImpact :p

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bump minor Bump minor version when this PR gets merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants