Support T-1 roll-ups on package download statistics to reduce storage requirements #3423
Comments
@xavierdecoster - This bug can be used to track your work.
For this T-1 roll-up to be efficient and cost-effective, we need to reduce the number of SQL indexes on the fact tables, because they are a source of poor performance during roll-ups. I will create separate issues to track suggestions, since we still retain a lot of data that no current functionality needs.
Status update
Database size at start of investigation: 621.7 GB
Size reductions by:
Total estimated database size after performing all recommendations: 314.48 GB (a 25.59% reduction vs. the re-indexed original database!)
Separate improvement suggestions:
We decided to try a different solution. Will reopen as needed.
Problem
Currently, we store 6 weeks' worth of raw data in the statistics database, even though the smallest unit of time we report on is a week. Every time we generate a report, we reprocess most (~98%) of the same raw data. Storing this much raw data is costly, and reprocessing it repeatedly is unnecessary.
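Where the ~98% figure plausibly comes from, assuming (this is not stated in the issue) that reports are regenerated once per day over the 42-day raw window, so each run adds only one new day:

$$\frac{42 - 1}{42} = \frac{41}{42} \approx 0.976 \approx 98\%$$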
Solution
Roll up each previous day's raw data into daily aggregates by dimension. Instead of keeping 42 days of raw data, we would keep 41 days of daily roll-ups and 1 day of raw data. Both storage capacity and DTU usage should drop significantly.
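A minimal in-memory sketch of the T-1 roll-up window, assuming a hypothetical raw fact shape (`RawDownload` with `package_id`, `package_version` and `client` dimensions). The real fact tables and their dimensions are not described in this issue, and the actual work would run as SQL against the statistics database. The sketch only shows the bookkeeping: aggregate every raw day older than today into per-dimension counts, keep only today's raw rows, and drop roll-ups that fall outside the 42-day window.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical raw download fact; the real dimension columns are assumptions.
@dataclass(frozen=True)
class RawDownload:
    day: date
    package_id: str
    package_version: str
    client: str

# Key of one daily roll-up row: (day, package_id, package_version, client).
RollupKey = tuple[date, str, str, str]

def roll_up_day(raw: list[RawDownload], day: date) -> dict[RollupKey, int]:
    """Collapse all raw facts for `day` into per-dimension download counts."""
    counts: dict[RollupKey, int] = defaultdict(int)
    for r in raw:
        if r.day == day:
            counts[(r.day, r.package_id, r.package_version, r.client)] += 1
    return dict(counts)

def advance_window(
    raw: list[RawDownload],
    rollups: dict[RollupKey, int],
    today: date,
    retention_days: int = 42,
) -> tuple[list[RawDownload], dict[RollupKey, int]]:
    """Roll up raw days older than today, keep only today's raw facts, and keep
    roll-ups for the previous retention_days - 1 days (41 roll-up days + 1 raw day)."""
    # Normally this is just yesterday; looping also catches any missed days.
    for day in sorted({r.day for r in raw if r.day < today}):
        rollups.update(roll_up_day(raw, day))
    remaining_raw = [r for r in raw if r.day >= today]
    oldest_kept = today - timedelta(days=retention_days - 1)
    kept_rollups = {k: v for k, v in rollups.items() if k[0] >= oldest_kept}
    return remaining_raw, kept_rollups
```

Running `advance_window` once per day means a weekly report only has to read 1 day of raw rows plus pre-aggregated daily rows, instead of re-scanning 42 days of raw facts.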
Notes