Adding reporting email for studies over 200GB (SCP-5981) #2247
BACKGROUND & CHANGES
This adds reporting infrastructure to identify studies that are no longer in compliance with our data retention policy. The policy was introduced last year as an update to our terms of service. It sets a "soft" cap of 200GB of storage and states that private studies cannot be older than 1 year (among other updates). The policy is not yet enforced, so this email report is a first step in that process. The report is scheduled to run monthly and will email the results to the dev email list. In practice, it will only flag studies that are over the 200GB limit, or ones that have so many files in the bucket that computing a storage estimate becomes infeasible (it stops checking after 100K files).
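The flagging rule described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the method name `retention_status`, both constants, and the returned symbols are hypothetical, and it assumes "200GB" means 200 GiB and that file sizes are available as a plain list of byte counts.

```ruby
# Hypothetical constants for illustration only; not taken from the PR.
STORAGE_CAP_BYTES = 200 * 1024**3 # assumes "200GB" means 200 GiB
MAX_FILES_CHECKED = 100_000       # stop estimating after this many files

# file_sizes: any Enumerable of byte sizes, one entry per file in a study bucket.
# Returns :over_cap, :too_many_files, or :ok.
def retention_status(file_sizes, cap: STORAGE_CAP_BYTES, max_files: MAX_FILES_CHECKED)
  total = 0
  checked = 0
  file_sizes.each do |size|
    # More files remain than we are willing to count, so no estimate is possible.
    return :too_many_files if checked >= max_files

    total += size
    checked += 1
    # Exit early once the running total passes the soft cap.
    return :over_cap if total > cap
  end
  :ok
end

# A study would appear in the report for either non-:ok status.
puts retention_status([150 * 1024**3, 60 * 1024**3]) # 210 GiB in total
```

Returning as soon as the running total passes the cap means a very large study is reported as over the limit even if it also has more than 100K files, provided the cap is reached within the files that were counted.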
The `data_retention_report` convenience method in `SummaryStatsUtils` performs this check against all studies in a given instance using the default billing project. As stated, this method will run on a cron at 9AM on the first of every month, but it can also be invoked manually from the console at any time.
MANUAL TESTING
app/lib/summary_stats_utils.rb#L10
):