feat: count replay events in ClickHouse as we ingest them #16994

Merged
merged 30 commits into from
Sep 14, 2023
Commits
ff8b2e9
feat: count replay events in ClickHouse as we ingest them
pauldambra Aug 10, 2023
3e8af12
Add to hogql db schema
pauldambra Aug 10, 2023
bee57bb
Update query snapshots
github-actions[bot] Aug 10, 2023
fe2a25a
Update query snapshots
github-actions[bot] Aug 10, 2023
2064f1b
Update query snapshots
github-actions[bot] Aug 10, 2023
b8af913
don't need it on kafka table
pauldambra Aug 10, 2023
906bbce
Update query snapshots
github-actions[bot] Aug 10, 2023
564ad68
Merge branch 'master' into feat/count-replay-events
pauldambra Sep 11, 2023
7c776b8
update desired columns
pauldambra Sep 11, 2023
66d5b9d
switch to counting events and messages
pauldambra Sep 11, 2023
6d89f3c
Update query snapshots
github-actions[bot] Sep 11, 2023
c6f0844
Merge branch 'master' into feat/count-replay-events
pauldambra Sep 12, 2023
e56a1ee
first pass addition of _timestamp
pauldambra Sep 12, 2023
dee3732
maybe like this
pauldambra Sep 12, 2023
4d7e4c2
like this?
pauldambra Sep 12, 2023
4a2223c
Update query snapshots
github-actions[bot] Sep 12, 2023
c558ba2
Merge branch 'master' into feat/count-replay-events
pauldambra Sep 13, 2023
50c5097
explicit message count
pauldambra Sep 13, 2023
72e5b7b
Update query snapshots
github-actions[bot] Sep 13, 2023
e9deb66
Update query snapshots
github-actions[bot] Sep 13, 2023
4ea881f
Update query snapshots
github-actions[bot] Sep 13, 2023
b8bfffe
Update query snapshots
github-actions[bot] Sep 13, 2023
d946483
Update UI snapshots for `chromium` (2)
github-actions[bot] Sep 13, 2023
3304e71
Update query snapshots
github-actions[bot] Sep 13, 2023
9990557
hogql db schema too
pauldambra Sep 13, 2023
0dc7fdf
Update query snapshots
github-actions[bot] Sep 13, 2023
ec31d81
Update UI snapshots for `chromium` (2)
github-actions[bot] Sep 13, 2023
7faefca
Merge branch 'master' into feat/count-replay-events
pauldambra Sep 13, 2023
129ae94
fix
pauldambra Sep 13, 2023
98ba0af
Merge branch 'master' into feat/count-replay-events
pauldambra Sep 14, 2023
first pass addition of _timestamp
pauldambra committed Sep 12, 2023
commit e56a1eee0d101dbadaf4dddcef122d92a4a511eb
18 changes: 12 additions & 6 deletions posthog/clickhouse/test/__snapshots__/test_schema.ambr
@@ -337,7 +337,8 @@
console_warn_count Int64,
console_error_count Int64,
size Int64,
-event_count Int64
+event_count Int64,
+_timestamp DateTime
) ENGINE = Kafka('test.kafka.broker:9092', 'clickhouse_session_replay_events_test', 'group1', 'JSONEachRow')

'
@@ -924,7 +925,8 @@
console_warn_count Int64,
console_error_count Int64,
size Int64,
-event_count Int64
+event_count Int64,
+_timestamp DateTime
) ENGINE = Kafka('kafka:9092', 'clickhouse_session_replay_events_test', 'group1', 'JSONEachRow')

'
@@ -1353,7 +1355,8 @@
-- this allows us to count the number of snapshot events received in a session
-- often very useful in incidents or debugging
-- because we batch events we expect message_count to be lower than event_count
-event_count SimpleAggregateFunction(sum, Int64)
+event_count SimpleAggregateFunction(sum, Int64),
+_timestamp SimpleAggregateFunction(max, DateTime64(6, 'UTC'))
) ENGINE = Distributed('posthog', 'posthog_test', 'sharded_session_replay_events', sipHash64(distinct_id))

'
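The two column types in this hunk combine rows differently when parts merge: `SimpleAggregateFunction(sum, ...)` adds values, while `SimpleAggregateFunction(max, ...)` keeps the largest. The following is a minimal Python sketch of that behaviour, not PostHog code. It assumes rows collapse per `(session_id, team_id)`, matching the materialized view's `group by`; the table's actual `ORDER BY` is not shown in this diff. `ReplayRow` and `merge_rows` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReplayRow:
    """Hypothetical stand-in for one stored row of session_replay_events."""
    session_id: str
    team_id: int
    event_count: int      # models SimpleAggregateFunction(sum, Int64)
    timestamp: datetime   # models _timestamp, SimpleAggregateFunction(max, DateTime64)


def merge_rows(rows):
    """Collapse rows sharing a key: sum event_count, keep the newest timestamp."""
    merged = {}
    for row in rows:
        key = (row.session_id, row.team_id)
        seen = merged.get(key)
        if seen is None:
            merged[key] = row
        else:
            merged[key] = ReplayRow(
                session_id=row.session_id,
                team_id=row.team_id,
                event_count=seen.event_count + row.event_count,
                timestamp=max(seen.timestamp, row.timestamp),
            )
    return merged


rows = [
    ReplayRow("s1", 1, 10, datetime(2023, 9, 12, 10, 0, 0, tzinfo=timezone.utc)),
    ReplayRow("s1", 1, 5, datetime(2023, 9, 12, 10, 0, 30, tzinfo=timezone.utc)),
    ReplayRow("s2", 1, 3, datetime(2023, 9, 12, 9, 59, 0, tzinfo=timezone.utc)),
]
merged = merge_rows(rows)
```

Because both functions are associative, the result does not depend on the order in which rows arrive or merge.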
@@ -1389,7 +1392,8 @@
sum(size) as size,
-- we can count the number of kafka messages instead of sending it explicitly
count(*) as message_count,
Comment (Member, Author):

this is clever... that's not always a good thing 🤣

Is it better to explicitly add message_count: 1 in the plugin server to aid the future traveller?

Reply (Member):

I think it might be a bit more clear with message_count: 1

-sum(event_count) as event_count
+sum(event_count) as event_count,
+max(_timestamp) DateTime
FROM posthog_test.kafka_session_replay_events
group by session_id, team_id
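The review thread above weighs `count(*) as message_count` against having the producer send `message_count: 1` on every message. A later commit in this PR is titled "explicit message count". The sketch below is a Python illustration under assumptions, not the plugin-server code: it shows that the two approaches give the same totals as long as each Kafka message becomes exactly one row. The message shape and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical Kafka messages; each carries one batch of snapshot events.
messages = [
    {"session_id": "s1", "team_id": 1, "event_count": 4},
    {"session_id": "s1", "team_id": 1, "event_count": 6},
    {"session_id": "s2", "team_id": 1, "event_count": 2},
]


def aggregate_implicit(msgs):
    """Mirror `count(*) as message_count`: every row counts as one message."""
    totals = defaultdict(lambda: {"message_count": 0, "event_count": 0})
    for m in msgs:
        key = (m["session_id"], m["team_id"])
        totals[key]["message_count"] += 1
        totals[key]["event_count"] += m["event_count"]
    return dict(totals)


def aggregate_explicit(msgs):
    """Mirror `sum(message_count)`: the producer states the count itself."""
    totals = defaultdict(lambda: {"message_count": 0, "event_count": 0})
    for m in msgs:
        key = (m["session_id"], m["team_id"])
        totals[key]["message_count"] += m["message_count"]
        totals[key]["event_count"] += m["event_count"]
    return dict(totals)


# The explicit variant: the same messages, each tagged with message_count = 1.
explicit_messages = [{**m, "message_count": 1} for m in messages]

implicit_totals = aggregate_implicit(messages)
explicit_totals = aggregate_explicit(explicit_messages)
```

The explicit field costs a few bytes per message, but the column's meaning then lives in the data rather than in the shape of the query, which is the readability point raised in the thread.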

@@ -1627,7 +1631,8 @@
-- this allows us to count the number of snapshot events received in a session
-- often very useful in incidents or debugging
-- because we batch events we expect message_count to be lower than event_count
-event_count SimpleAggregateFunction(sum, Int64)
+event_count SimpleAggregateFunction(sum, Int64),
+_timestamp SimpleAggregateFunction(max, DateTime64(6, 'UTC'))
) ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/77f1df52-4b43-11e9-910f-b8ca3a9b9f3e_{shard}/posthog.session_replay_events', '{replica}')

PARTITION BY toYYYYMM(min_first_timestamp)
@@ -2252,7 +2257,8 @@
-- this allows us to count the number of snapshot events received in a session
-- often very useful in incidents or debugging
-- because we batch events we expect message_count to be lower than event_count
-event_count SimpleAggregateFunction(sum, Int64)
+event_count SimpleAggregateFunction(sum, Int64),
+_timestamp SimpleAggregateFunction(max, DateTime64(6, 'UTC'))
) ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/77f1df52-4b43-11e9-910f-b8ca3a9b9f3e_{shard}/posthog.session_replay_events', '{replica}')

PARTITION BY toYYYYMM(min_first_timestamp)
4 changes: 3 additions & 1 deletion posthog/models/session_replay_event/migrations_sql.py
@@ -70,7 +70,9 @@
ALTER_SESSION_REPLAY_ADD_EVENT_COUNT_COLUMN = """
ALTER TABLE {table_name} on CLUSTER '{cluster}'
ADD COLUMN IF NOT EXISTS message_count SimpleAggregateFunction(sum, Int64),
-ADD COLUMN IF NOT EXISTS event_count SimpleAggregateFunction(sum, Int64)
+ADD COLUMN IF NOT EXISTS event_count SimpleAggregateFunction(sum, Int64),
+-- fly by addition so that we can track lag in the data the same way as for other tables
+ADD COLUMN IF NOT EXISTS _timestamp SimpleAggregateFunction(max, DateTime64(6, 'UTC'))
"""

ADD_EVENT_COUNT_DISTRIBUTED_SESSION_REPLAY_EVENTS_TABLE_SQL = (
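The SQL comment in this hunk says `_timestamp` is added to track lag in the data the same way as for other tables. How that lag is measured is not shown in this diff. One plausible reading, sketched here purely as an assumption, is the gap between the current time and the newest stored `_timestamp`. The helper name `ingestion_lag` and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def ingestion_lag(latest_timestamp, now):
    """Return how far the newest ingested row trails `now`, never negative."""
    return max(now - latest_timestamp, timedelta(0))


# Hypothetical _timestamp values read back from the table.
timestamps = [
    datetime(2023, 9, 12, 11, 58, 40, tzinfo=timezone.utc),
    datetime(2023, 9, 12, 11, 59, 15, tzinfo=timezone.utc),
]
now = datetime(2023, 9, 12, 12, 0, 0, tzinfo=timezone.utc)

# The max aggregate on the column means only the newest value matters.
lag = ingestion_lag(max(timestamps), now)
```

Clamping at zero guards against small clock differences between the writer and whatever reads the value.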
9 changes: 6 additions & 3 deletions posthog/models/session_replay_event/sql.py
@@ -28,7 +28,8 @@
console_warn_count Int64,
console_error_count Int64,
size Int64,
-event_count Int64
+event_count Int64,
+_timestamp DateTime
) ENGINE = {engine}
"""

@@ -61,7 +62,8 @@
-- this allows us to count the number of snapshot events received in a session
-- often very useful in incidents or debugging
-- because we batch events we expect message_count to be lower than event_count
-event_count SimpleAggregateFunction(sum, Int64)
+event_count SimpleAggregateFunction(sum, Int64),
+_timestamp SimpleAggregateFunction(max, DateTime64(6, 'UTC'))
) ENGINE = {engine}
"""

@@ -128,7 +130,8 @@
sum(size) as size,
-- we can count the number of kafka messages instead of sending it explicitly
count(*) as message_count,
-sum(event_count) as event_count
+sum(event_count) as event_count,
+max(_timestamp) DateTime
FROM {database}.kafka_session_replay_events
group by session_id, team_id
""".format(
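One detail in this first-pass hunk is worth a second look: `max(_timestamp) DateTime` has no `as`, so the trailing `DateTime` reads as a column alias rather than a type. ClickHouse, like several SQL dialects, accepts a SELECT-list alias without `AS`. The sketch below uses Python's built-in `sqlite3` only as a stand-in to show that parsing rule. SQLite is not ClickHouse, and the table `kafka_rows` is hypothetical. The follow-up commits "maybe like this" and "like this?" may be adjusting this line, though their diffs are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kafka_rows (session_id TEXT, _timestamp TEXT)")
conn.executemany(
    "INSERT INTO kafka_rows VALUES (?, ?)",
    [("s1", "2023-09-12 10:00:00"), ("s1", "2023-09-12 10:00:30")],
)

# Without AS, the identifier after the expression becomes the column alias.
implicit = conn.execute(
    "SELECT max(_timestamp) DateTime FROM kafka_rows GROUP BY session_id"
)
implicit_name = implicit.description[0][0]

# With an explicit alias, the output column keeps the intended name.
explicit = conn.execute(
    "SELECT max(_timestamp) AS _timestamp FROM kafka_rows GROUP BY session_id"
)
explicit_name = explicit.description[0][0]
```

If the materialized view writes into a target table by column name, a column aliased `DateTime` would not line up with the target's `_timestamp`, so `max(_timestamp) as _timestamp` looks like the intended form.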