-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(time_offset): improved LIMIT-handling in advanced analytics #27934
fix(time_offset): improved LIMIT-handling in advanced analytics #27934
Conversation
- Remove inner row_limit and row_offset when applying time_offset to prevent data inconsistency
59c830e
to
dce9d38
Compare
/testenv up |
@Antonio-RiveroMartnez Ephemeral environment spinning up at http://34.215.85.155:8080. Credentials are |
- Use config value as row_limit - Update test
/testenv up |
@Antonio-RiveroMartnez Ephemeral environment spinning up at http://54.244.23.72:8080. Credentials are |
@Antonio-RiveroMartnez are there any open issues we can link to this? I'm somewhat surprised that no one has reported an issue with this in the past. |
Hey @john-bodley I just linked it in the PR description, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes + tests LGTM. Let's follow up with a PR breaking this large method in the right places.
@@ -455,6 +455,13 @@ def processing_time_offsets( # pylint: disable=too-many-locals,too-many-stateme | |||
for metric in metric_names | |||
} | |||
|
|||
# When the original query has limit or offset we wont apply those |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow this method has gotten large! Thinking about refactors here and factoring out a processing_time_offset
(a method for each time offset) might help a little in reading this, but probably breaking down further would be good too.
Doing the refactor in this PR as part of this PR would make it harder to spot the diff, so I'll approve this one, but hoping we can refactor some of these into more bite-sized methods. Maybe a goal could be to remove the disable=too-many-locals,too-many-statements
above.
Ephemeral environment shutdown and build artifacts deleted. |
(cherry picked from commit 6844735)
(cherry picked from commit 6844735)
(cherry picked from commit 6844735)
(cherry picked from commit 6844735)
SUMMARY
When applying a time_offset in Advanced Analytics, if the row_limit is too small, data will not be present when the join occurs, so this PR prevent such behavior by not using limit nor offset in the inner query (the shifted one).
For example this query shown in the following video has all the records for a period of 10+ years:
Screen.Recording.2024-04-08.at.11.04.50.AM.mov
We can see that the result of shifted queries not always include the same periods than the original query, so when the join gets applied, it won't match some records, thus giving you incorrect output.
Original query with limit and sorting:
![Captura de pantalla 08 04 2024 a 11 05 13 AM](https://private-user-images.githubusercontent.com/38889534/320424968-bcf9b955-742b-4443-b502-5f0d807cd25f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzMDE5NzcsIm5iZiI6MTczOTMwMTY3NywicGF0aCI6Ii8zODg4OTUzNC8zMjA0MjQ5NjgtYmNmOWI5NTUtNzQyYi00NDQzLWI1MDItNWYwZDgwN2NkMjVmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjExVDE5MjExN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTk0MDY3ODMxMDViMTM4OWUxZDhiZTExNzdkNjU3Y2FlZjNlZmE4ZTZlMDQzYjNiYjhlNjZmMjM2MzcwY2NlYjYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.fGM-OE9HHbII0CJxtzRE5ipXgilFHrUAH0k6lw4vrd4)
1 year ago query with limit and sorting:
![Captura de pantalla 08 04 2024 a 11 05 50 AM](https://private-user-images.githubusercontent.com/38889534/320425184-549b8e45-c90b-4d36-bfc7-b96d8a3baf93.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzMDE5NzcsIm5iZiI6MTczOTMwMTY3NywicGF0aCI6Ii8zODg4OTUzNC8zMjA0MjUxODQtNTQ5YjhlNDUtYzkwYi00ZDM2LWJmYzctYjk2ZDhhM2JhZjkzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjExVDE5MjExN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM3MjUxZTllMDBmZjM4ZjcyNzAwOTZkNDVmOGY5ODA0MjljNzBlY2IyNzZjZThhZTU4NDcyODUzNjEzMmI0YzQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.Sq7T-XvkxB1GlNJK861Pi9wmOWEpI7PWBZO9xBcDuYg)
2 years ago query with limit and sorting:
![Captura de pantalla 08 04 2024 a 11 06 08 AM](https://private-user-images.githubusercontent.com/38889534/320425439-7d3ab934-6129-4719-a9f7-a7772621aa87.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzMDE5NzcsIm5iZiI6MTczOTMwMTY3NywicGF0aCI6Ii8zODg4OTUzNC8zMjA0MjU0MzktN2QzYWI5MzQtNjEyOS00NzE5LWE5ZjctYTc3NzI2MjFhYTg3LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjExVDE5MjExN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZhMGRlOWI2MGYwZDViMWM4YzM0MDY2OTRlMDkwNTdkYTIyYmVmMDQ1ZDM1NDMyZTM0OWIzNjljYWQxOTUyODAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.1MuU5hPL_TBVZFvMNOwARl6o-Ix-_MdpqbSVO2ucf5s)
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Error:
![error](https://private-user-images.githubusercontent.com/38889534/320423586-7fe6bdb0-b527-4d40-b067-f79f56f68668.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzMDE5NzcsIm5iZiI6MTczOTMwMTY3NywicGF0aCI6Ii8zODg4OTUzNC8zMjA0MjM1ODYtN2ZlNmJkYjAtYjUyNy00ZDQwLWIwNjctZjc5ZjU2ZjY4NjY4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjExVDE5MjExN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTNmZjI2ZjI0MDZmYjNjNDJmNTE1ZGY1ZTA1NTY0NDc4ZTI2NDk5MzkyNmZkYTQzZTY0MTA1ZDNiZDNjZWI0ZTYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.48hKpYZp5GD3zb448fTkMvdNEQY_A1ijHYWXe-_wJEE)
In the following image some datapoints are missing in the shifted columns because they are not part of the result dataframe after sorting and limiting it like the original query.
Fix:
![test](https://private-user-images.githubusercontent.com/38889534/320423600-4d018be1-ad8c-42b6-a4a0-85123d8f480e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzMDE5NzcsIm5iZiI6MTczOTMwMTY3NywicGF0aCI6Ii8zODg4OTUzNC8zMjA0MjM2MDAtNGQwMThiZTEtYWQ4Yy00MmI2LWE0YTAtODUxMjNkOGY0ODBlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjExVDE5MjExN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQyZmQ2NjA1MjdkNjI4NTU4MWY2NGFkYzJiZDk2MmQ4ZWJiYjJhMzE1OWJhZTA5ZTRjYjNkMWMzYWZkZWRkZTEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.tlmORjcAGU-HjpA7MKtF1-UPRtAz4rvjbEzHcQzzkJ0)
Now we can see that the query is able to find the records that belong to those offsets regardless, because it's joining using the complete data.
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION