feat(replays): initial replays clickhouse migration #2681
Conversation
This PR has a migration; here is the generated SQL:

-- start migrations
-- migration replays : 0001_replays

Local operations:

CREATE TABLE IF NOT EXISTS replays_local (
    replay_id UUID,
    sequence_id UInt16,
    trace_ids Array(UUID),
    _trace_ids_hashed UInt64 MATERIALIZED arrayMap(t -> cityHash64(t), trace_ids),
    title String,
    project_id UInt64,
    timestamp DateTime,
    platform LowCardinality(String),
    environment LowCardinality(Nullable(String)),
    release Nullable(String),
    dist Nullable(String),
    ip_address_v4 Nullable(IPv4),
    ip_address_v6 Nullable(IPv6),
    user String,
    user_hash UInt64,
    user_id Nullable(String),
    user_name Nullable(String),
    user_email Nullable(String),
    sdk_name String,
    sdk_version String,
    tags Nested(key String, value String),
    retention_days UInt16,
    partition UInt16,
    offset UInt64
)
ENGINE ReplicatedReplacingMergeTree('/clickhouse/tables/replays/{shard}/default/replays_local', '{replica}')
ORDER BY (project_id, toStartOfDay(timestamp), cityHash64(replay_id), sequence_id)
PARTITION BY (retention_days, toMonday(timestamp))
TTL timestamp + toIntervalDay(retention_days)
SETTINGS index_granularity=8192;

ALTER TABLE replays_local ADD INDEX IF NOT EXISTS bf_trace_ids_hashed _trace_ids_hashed TYPE bloom_filter() GRANULARITY 1;

Dist operations:

CREATE TABLE IF NOT EXISTS replays_dist (
    replay_id UUID,
    sequence_id UInt16,
    trace_ids Array(UUID),
    _trace_ids_hashed UInt64 MATERIALIZED arrayMap(t -> cityHash64(t), trace_ids),
    title String,
    project_id UInt64,
    timestamp DateTime,
    platform LowCardinality(String),
    environment LowCardinality(Nullable(String)),
    release Nullable(String),
    dist Nullable(String),
    ip_address_v4 Nullable(IPv4),
    ip_address_v6 Nullable(IPv6),
    user String,
    user_hash UInt64,
    user_id Nullable(String),
    user_name Nullable(String),
    user_email Nullable(String),
    sdk_name String,
    sdk_version String,
    tags Nested(key String, value String),
    retention_days UInt16,
    partition UInt16,
    offset UInt64
)
ENGINE Distributed(cluster_one_sh, default, replays_local, cityHash64(toString(replay_id)));

-- end migration replays : 0001_replays
raw_columns: Sequence[Column[Modifiers]] = [
    Column("replay_id", UUID()),
    Column("sequence_id", UInt(16)),
Who generates the sequence_id? On the SDK? Sentry? Snuba? What's the max number allowed here?
The SDK will generate the sequence_id. The max will be somewhere between ~100 and ~1000 (we will be capping a replay's maximum length time-wise and from there find a sane max sequence_id).
columns=raw_columns,
engine=table_engines.ReplacingMergeTree(
    storage_set=StorageSetKey.REPLAYS,
    order_by="(project_id, toStartOfDay(timestamp), cityHash64(replay_id), sequence_id)",
Just to confirm, items with the same replay_id can still span multiple days, right?
Yes, and I'm glad you brought this up. The intention is that replays which span multiple days will only show up when the initial event is within the queried time range.
Probably an edge case, but if we do receive replays on different days with the same replay_id and sequence_id, they will not get merged together, and we'd need a strategy to deduplicate them when querying.
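One sketch of such a query-time strategy (illustrative only, using the replays_dist table defined in this migration): ClickHouse's LIMIT BY can keep a single row per (replay_id, sequence_id) pair even when the ReplacingMergeTree has not merged the duplicates yet.

-- Keep one row per (replay_id, sequence_id), regardless of which day it arrived on.
SELECT *
FROM replays_dist
WHERE project_id = 1
  AND timestamp >= now() - INTERVAL 7 DAY
ORDER BY replay_id, sequence_id
LIMIT 1 BY replay_id, sequence_id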
@JoshFerge Could you please provide some insight into the most common query patterns you expect?
What are you going to filter by most of the time?
What are you going to aggregate, if anything?
The ORDER BY key has to be defined based on the expected query pattern. You cannot change it once done without rebuilding the table entirely, and getting it wrong will make your query performance miserable.
The expected query pattern also impacts which (if any) data skipping indexes should be added. We cannot add indexes to all columns, as the type of index depends on the queries you want to make faster.
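For context, a query shaped to match the proposed sorting key (project, day, replay hash, sequence) would look roughly like the sketch below; this is an illustration of the access pattern, not a query taken from the PR.

-- Filter on the leading columns of the ORDER BY key (project_id, then a bounded
-- time range) so ClickHouse can prune granules effectively.
SELECT replay_id, min(timestamp) AS started_at
FROM replays_dist
WHERE project_id = 1
  AND timestamp >= toDateTime('2022-05-01 00:00:00')
  AND timestamp <  toDateTime('2022-05-08 00:00:00')
GROUP BY replay_id
ORDER BY started_at DESC
LIMIT 50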
https://www.notion.so/sentry/Addendum-Replay-Queries-fcfd8e68679e443e87649014cf10ae62
See above ^^. We will be happy to rebuild the table entirely while we are testing over the next several months, so we are viewing all data as temporary and will make that clear to any customers testing. We will be building several use cases on top of this initial one that may require us to rebuild the tables at any rate.
Re: the document you linked:
"As a replays user, I want to see all replays where an error occurred"
The table designed here does not seem to have a reference to an issue or an error. Is that correct, or a mistake?
"As a performance user, I want to see if this trace has a replay associated with it"
If you want to search the replays table with WHERE has(trace_ids, 'asdasdasdasd'), please add a bloom filter index on that column (which may require you to create a materialized version with hashes of that column). Otherwise your search will be miserable.
But it would be better to add a replay id on the transaction in some way so you do not have to scan the whole replay table to associate replays to traces.
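For reference, the generated SQL at the top of this PR ends up doing exactly this: a materialized _trace_ids_hashed column plus a bloom_filter data skipping index. A hedged sketch of how a lookup could then use it, assuming the hashed column holds one cityHash64 value per trace ID:

-- Probe the hashed, indexed column so the bloom filter can skip granules,
-- then re-check the raw UUID array to rule out hash collisions.
SELECT replay_id
FROM replays_dist
WHERE project_id = 1
  AND has(_trace_ids_hashed, cityHash64(toUUID('00000000-0000-0000-0000-000000000000')))
  AND has(trace_ids, toUUID('00000000-0000-0000-0000-000000000000'))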
The table designed here does not seem to have a reference to an issue or an error. Is that correct, or a mistake?
Not a mistake. For now, we will do a very rudimentary query where we take the trace IDs from a single page and look them up to determine if there is an associated error, or do a search on the errors table looking for the replays tag.
The issue is that since errors can be sampled / dropped (and replays too in the future), tagging each other's events with the IDs is problematic because it's not guaranteed that the tagged ID will exist.
We'll likely need some separate table, generated in event post_processing, that can accurately associate ingested events with replays. This will come in a future iteration.
But it would be better to add a replay id on the transaction in some way so you do not have to scan the whole replay table to associate replays to traces.
We'll also be adding replay_id to other events, so for example this search can use transactions tagged with a replay_id. (There is still the sampling problem, but we're not going to worry about that in the first iteration.)
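A hypothetical sketch of that transaction-side lookup, assuming transactions carry a replayId tag; the table, tag name, and columns here are illustrative and not part of this PR:

-- Find recent transactions tagged with a given replay, then use their trace IDs.
SELECT event_id, trace_id
FROM transactions_dist
WHERE project_id = 1
  AND finish_ts >= now() - INTERVAL 1 DAY
  AND tags.value[indexOf(tags.key, 'replayId')] = '00000000-0000-0000-0000-000000000000'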
And for now I will not add the bloom filter index; I'll add a TODO. Something I can follow up on.
And for now I will not add the bloom filter index; I'll add a TODO. Something I can follow up on.
Please do not wait on this. Not doing it means a full table scan each time, and the effort to add the index is minimal. ClickHouse tables get large very quickly.
Went ahead and added the index 👍🏼
raw_columns: Sequence[Column[Modifiers]] = [
    Column("replay_id", UUID()),
    Column("sequence_id", UInt(16)),
    Column("trace_ids", Array(UUID())),
What are these trace_ids? Are they supposed to be pointers to some other piece of data in one of our systems?
getsentry/sentry-replay#38 (comment)
Replays can have N trace_ids, and each update may have N of them; trace_id will be the link between them to start (we won't be doing any joins with them).
Did you ever get a sense of how many trace IDs could be in this field?
On a per-row basis it likely won't be more than 10.
Is there a document I can read on the use cases replays is trying to solve? More specifically, what sort of queries would be run against this dataset?
raw_columns: Sequence[Column[Modifiers]] = [
    Column("replay_id", UUID()),
    Column("sequence_id", UInt(16)),
Is there any sort of relation between the sequence_id and replay_id fields?
sequence_id will be a monotonically increasing counter, so for each replay_id, sequence_id is unique.
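That relationship is what lets a single replay be reassembled by ordering its rows; a sketch against the table in this migration (filter values are illustrative):

-- Fetch all updates for one replay in the order the SDK emitted them.
SELECT sequence_id, timestamp, trace_ids
FROM replays_dist
WHERE project_id = 1
  AND replay_id = toUUID('00000000-0000-0000-0000-000000000000')
ORDER BY sequence_id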
# sdk info
Column("sdk_name", String()),
Column("sdk_version", String()),
Column("tags", Nested([("key", String()), ("value", String())])),
For performance reasons, you might want to add a bloom filter index on tags, as we do on some of our other datasets.
good call 👍🏼 will look at adding those.
Are you actually going to search for replays by tag key/value?
likely
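If that index does get added later, one hedged sketch of the usual pattern is a materialized hash-map column plus a bloom_filter index; the column and index names below are illustrative and not part of this migration:

-- Hash each key=value pair into a flat array that a bloom filter can index.
ALTER TABLE replays_local
    ADD COLUMN IF NOT EXISTS _tags_hash_map Array(UInt64)
    MATERIALIZED arrayMap((k, v) -> cityHash64(concat(k, '=', v)), tags.key, tags.value);
ALTER TABLE replays_local
    ADD INDEX IF NOT EXISTS bf_tags_hash_map _tags_hash_map TYPE bloom_filter() GRANULARITY 1;

-- Queries would then probe the hashed column, e.g.
-- WHERE has(_tags_hash_map, cityHash64('browser=Chrome'))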
columns=raw_columns,
engine=table_engines.Distributed(
    local_table_name="replays_local",
    sharding_key="project_id",
What is the reason for sharding the data by project_id versus sharding randomly? One disadvantage I can see with sharding by project_id is that if there is a big project which uses replays a lot, the shards could become imbalanced.
Good point, I think I just chose this arbitrarily. I'll shard by replay_id instead.
Codecov Report
@@            Coverage Diff             @@
##           master    #2681      +/-   ##
==========================================
- Coverage   92.81%   92.77%   -0.05%
==========================================
  Files         609      612       +3
  Lines       28606    28662      +56
==========================================
+ Hits        26552    26591      +39
- Misses       2054     2071      +17
Continue to review full report at Codecov.
storage_set=StorageSetKey.REPLAYS,
table_name="replays_local",
columns=raw_columns,
engine=table_engines.ReplacingMergeTree(
Are you going to use the replacing feature for something (aside from removing duplicates, which is anyway a good idea)?
no, just removing duplicates.
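Worth noting as a general ClickHouse behavior (not specific to this PR): ReplacingMergeTree only collapses duplicates when parts merge, so reads that must be exact either deduplicate in the query or use FINAL, e.g.:

-- Force merge-on-read semantics; correct but slower than a plain SELECT.
SELECT count()
FROM replays_local FINAL
WHERE project_id = 1
  AND timestamp >= now() - INTERVAL 1 DAY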
Please fix the type errors
Fixed, I accidentally included an errant file which caused the errors.
Will merge Monday. Thanks for the reviews, all.
Summary
Creates the initial ClickHouse migration for the replays dataset. Rows will be uniquely identified by their replay_id (many rows will have the same replay_id) together with a monotonically increasing sequence_id that represents each additional piece of data for the replay.
This table will be used initially for simply searching and listing replays on the replays index page. Basic aggregations will be done to gather the duration of the replay and the list of trace_ids associated with the replay, to start.
Further columns will likely be introduced as product requirements determine the types of searches we want to do.
See the spec here for more information: https://www.notion.so/sentry/Session-Replay-V1-alpha-Ingest-Backend-ae068d1e1d514221b6c3ea2233f360f4
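A sketch of the kind of per-replay aggregation described above (replay duration plus the associated trace IDs), using the replays_dist table from this migration; filter values are illustrative:

-- One row per replay: duration from first to last update, plus all trace IDs seen.
SELECT
    replay_id,
    min(timestamp) AS started_at,
    dateDiff('second', min(timestamp), max(timestamp)) AS duration_seconds,
    groupUniqArrayArray(trace_ids) AS all_trace_ids
FROM replays_dist
WHERE project_id = 1
  AND timestamp >= now() - INTERVAL 14 DAY
GROUP BY replay_id
ORDER BY started_at DESC
LIMIT 50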