Skip to content

Conversation

@cliffordxing
Copy link
Contributor

Currently when executing the EAP replay query, we see the following error:

image

This is because the frontend expects the urls visited to be passed in the data, but the EAP query previously did not query this data. Thus, this PR adds a new query to post process the URLs visited to the replay data. Specifically, the query groups by the URL column, selecting the URL itself along with the earliest time stamp. We then sort this afterwards to ensure correctness of ordering.

Unit tests are also added to assert correctness of URL retrieval, ordering, and deduping.

@cliffordxing cliffordxing requested a review from a team as a code owner December 8, 2025 19:39
@cliffordxing cliffordxing marked this pull request as draft December 8, 2025 19:39
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Dec 8, 2025
@codecov
Copy link

codecov bot commented Dec 8, 2025

Codecov Report

❌ Patch coverage is 83.33333% with 3 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...y/replays/endpoints/organization_replay_details.py 83.33% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #104538      +/-   ##
===========================================
+ Coverage   80.51%    80.53%   +0.01%     
===========================================
  Files        9352      9353       +1     
  Lines      400281    400874     +593     
  Branches    25700     25700              
===========================================
+ Hits       322274    322829     +555     
- Misses      77539     77577      +38     
  Partials      468       468              

@cliffordxing cliffordxing marked this pull request as ready for review December 8, 2025 21:03
@cliffordxing cliffordxing requested a review from phacops December 8, 2025 22:40

# Sort by timestamp and extract URLs
rows = result.get("data", [])
rows.sort(key=lambda r: r.get("first_seen") or 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we sort in the query already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems tricky to sort by the "first_seen" alias I created since its a column we are selecting with the min function. When we try to:

ORDER BY first_seen

This transpiles to:

ORDER BY attributes_float_17['first_seen']

Which gives this error:

Column attributes_float_17 is not under aggregate function and not in GROUP BY keys error.

Seemed easier to sort afterwards, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the solution here is to sort on the alias of the aggregation instead. If you call your first_seen aggregation differently, you can then sort on first_seen_aggregation and not hit this issue I believe.

Otherwise, we'll have to change something on the EAP.

@cliffordxing cliffordxing merged commit 0ec76ce into master Dec 9, 2025
66 checks passed
@cliffordxing cliffordxing deleted the query-urls-eap branch December 9, 2025 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants