-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
feat(Replay): Add URLs Visited to EAP Replay Query #104538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #104538 +/- ##
===========================================
+ Coverage 80.51% 80.53% +0.01%
===========================================
Files 9352 9353 +1
Lines 400281 400874 +593
Branches 25700 25700
===========================================
+ Hits 322274 322829 +555
- Misses 77539 77577 +38
Partials 468 468 |
|
|
||
| # Sort by timestamp and extract URLs | ||
| rows = result.get("data", []) | ||
| rows.sort(key=lambda r: r.get("first_seen") or 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we sort in the query already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems tricky to sort by the "first_seen" alias I created since its a column we are selecting with the min function. When we try to:
ORDER BY first_seen
This transpiles to:
ORDER BY attributes_float_17['first_seen']
Which gives this error:
Column attributes_float_17 is not under aggregate function and not in GROUP BY keys error.
Seemed easier to sort afterwards, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the solution here is to sort on the alias of the aggregation instead. If you call your first_seen aggregation differently, you can then sort on first_seen_aggregation and not hit this issue I believe.
Otherwise, we'll have to change something on the EAP.
Currently when executing the EAP replay query, we see the following error:
This is because the frontend expects the urls visited to be passed in the data, but the EAP query previously did not query this data. Thus, this PR adds a new query to post process the URLs visited to the replay data. Specifically, the query groups by the URL column, selecting the URL itself along with the earliest time stamp. We then sort this afterwards to ensure correctness of ordering.
Unit tests are also added to assert correctness of URL retrieval, ordering, and deduping.