feat: add session dimensions and allow overview filters #32734
PR Summary
This PR optimizes web analytics pre-aggregation by consolidating tables and adding session dimensions for enhanced filtering capabilities.
- Consolidates four tables into two: web_stats_daily (event-based) and web_bounces_daily (session-oriented), removing web_overview_daily and web_paths_daily
- Adds session dimensions (browser, OS, viewport, UTM params) to web_bounces_daily for granular filtering
- Splits the viewport field into width/height components in pre-aggregated tables
- Updates Dagster pipeline definitions to reflect the table consolidation
- Implements a pathname filtering workaround by mapping to entry_pathname in WebOverview queries
8 file(s) reviewed, 2 comment(s) (Greptile)
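The pathname workaround from the summary can be pictured as a rewrite step applied to the filters before they are run against web_bounces_daily. This is a minimal sketch under stated assumptions: the PropertyFilter dataclass, the map_filters_for_bounces_table helper, and keying the filter on the $pathname property are illustrative choices, not PostHog's actual code.

```python
from dataclasses import dataclass, replace


# Hypothetical filter shape, for illustration only; PostHog's real filter types differ.
@dataclass(frozen=True)
class PropertyFilter:
    key: str
    operator: str
    value: str


# Assumed mapping: web_bounces_daily only stores session-level (entry) dimensions,
# so an event-level pathname filter is approximated with the session's entry pathname.
EVENT_TO_SESSION_COLUMN = {"$pathname": "entry_pathname"}


def map_filters_for_bounces_table(filters):
    """Return copies of the filters with event-level keys swapped for session columns.

    Keys without a mapping are passed through unchanged.
    """
    return [replace(f, key=EVENT_TO_SESSION_COLUMN.get(f.key, f.key)) for f in filters]
```

The trade-off is semantic: a mapped filter matches sessions that started on the given pathname, not every session that visited it, which is why it is described as a workaround.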
posthog/hogql_queries/web_analytics/stats_table_pre_aggregated.py
There were some conflicts with path cleaning. I've fixed them and it should be fine, but I'll check the CI results to be sure.
LGTM 🫡
This PR enables filtering of the bounce-rate calculations and of the WebOverview tiles.

Problem
We don't have a reliable way to average over session properties. To make the overview filters work on the non-pre-aggregated version, we filter the events we care about before aggregating. This is not possible on pre-aggregated tables since we are, well, pre-aggregating them; instead, we need to store the data in a way that lets future filters query arbitrary slices of it.
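To make the constraint concrete, here is a minimal Python sketch (not PostHog's actual HogQL/ClickHouse implementation) of pre-aggregating sessions by their session dimensions so that filters can still be applied after aggregation. The field names and sample rows are hypothetical, and counting a single-pageview session as a bounce is a simplification assumed only for illustration.

```python
from collections import defaultdict

# Hypothetical session rows: one row per session with its entry (session-level)
# dimensions. Field names are illustrative, not the real web_bounces_daily schema.
SESSIONS = [
    {"day": "2025-01-01", "entry_pathname": "/", "browser": "Chrome", "pageviews": 1},
    {"day": "2025-01-01", "entry_pathname": "/", "browser": "Chrome", "pageviews": 3},
    {"day": "2025-01-01", "entry_pathname": "/pricing", "browser": "Firefox", "pageviews": 1},
    {"day": "2025-01-01", "entry_pathname": "/", "browser": "Firefox", "pageviews": 2},
    {"day": "2025-01-02", "entry_pathname": "/", "browser": "Chrome", "pageviews": 1},
]

# Every dimension kept in the grouping key stays filterable after aggregation.
DIMENSIONS = ("day", "entry_pathname", "browser")


def is_bounce(session):
    # Simplification for the sketch: a single-pageview session counts as a bounce.
    return session["pageviews"] == 1


def pre_aggregate(rows):
    """Roll sessions up by DIMENSIONS, storing only additive counters."""
    agg = defaultdict(lambda: {"sessions": 0, "bounces": 0})
    for row in rows:
        key = tuple(row[d] for d in DIMENSIONS)
        agg[key]["sessions"] += 1
        agg[key]["bounces"] += int(is_bounce(row))
    return dict(agg)


def bounce_rate(agg, **filters):
    """Filter pre-aggregated rows on stored dimensions, sum the counters, then divide."""
    n_sessions = n_bounces = 0
    for key, counters in agg.items():
        row = dict(zip(DIMENSIONS, key))
        if all(row[name] == value for name, value in filters.items()):
            n_sessions += counters["sessions"]
            n_bounces += counters["bounces"]
    return n_bounces / n_sessions if n_sessions else None


agg = pre_aggregate(SESSIONS)
```

Keeping additive counts and dividing only at query time is the point of the design: averaging the per-row rates instead (0.5, 1.0, 0.0 and 1.0) would give 0.625 for the unfiltered case rather than the correct 0.6.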
This PR does this by adding session dimensions (essentially the entry values of the session) to the web_bounces_daily table, which is the table used to display bounce rates. As a result, we only need two tables in the end:

- web_stats_daily, which has a more event-based breakdown.
- web_bounces_daily, which has session-oriented dimensions and bounce rate calculations.

The main difference between them is that web_stats_daily has pathname (event-level) metrics, while web_bounces_daily doesn't but has metrics like session_duration.

P.S. The names are still temporary. The queries to generate those other tables are heavy, so I am trying to squeeze as much as possible out of the ones we can have, while keeping a close eye that we're not duplicating session data in web_bounces_daily.

Changes
- Removed web_paths_daily, as we can now use web_stats_daily for that
- Removed web_overview_daily, as we can now use web_bounces_daily for bounce and session-dimension filters
- Allowed filters on the WebOverview query

Did you write or update any docs for this change?
How did you test this code?
Manually, basically by checking whether the pre-aggregated values matched the regular queries :)
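The manual check could be automated along these lines. This is a minimal sketch that assumes both the regular and the pre-aggregated WebOverview results can be reduced to a dict mapping metric name to number; the compare_results helper and any metric names passed to it are hypothetical.

```python
import math


def compare_results(regular, pre_aggregated, rel_tol=1e-9):
    """Return the sorted metric names whose values differ between the two query paths.

    A metric missing from either side also counts as a mismatch.
    """
    mismatches = []
    for metric in sorted(regular.keys() | pre_aggregated.keys()):
        a = regular.get(metric)
        b = pre_aggregated.get(metric)
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            mismatches.append(metric)
    return mismatches
```

A relative tolerance is used rather than exact equality because rates such as bounce rate are floats and may be computed in a different order on each path.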