
Adding reporting email for studies over 200GB (SCP-5981) #2247


Merged: 4 commits merged on May 7, 2025


bistline (Contributor)

BACKGROUND & CHANGES

This adds reporting infrastructure to identify studies that are out of compliance with our data retention policy. The policy was introduced last year as an update to our terms of service and states, among other things, that there is a "soft" cap of 200GB of storage per study and that private studies cannot be older than 1 year. The policy is not yet enforced, so this email report is a first step toward enforcement. It is scheduled to run monthly and emails the results to the dev email list. In practice, it only flags studies that exceed the 200GB limit, or studies with so many files in their bucket that computing a storage estimate becomes infeasible (it stops checking after 100K files).
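The flagging rules above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the code from this PR: `StudyInfo`, `estimate_storage`, `retention_row` and `flagged_rows` are hypothetical names, a study is reduced to a visibility flag, a creation date and a list of file sizes, and 200GB is interpreted in binary units (as ActiveSupport's `200.gigabytes` would be).

```ruby
require "date"

# Hypothetical limits mirroring the policy described above (not the PR's actual constants).
STORAGE_CAP_BYTES = 200 * 1024**3 # 200GB in binary units, like ActiveSupport's 200.gigabytes
MAX_FILES_TO_SCAN = 100_000       # stop estimating once a bucket holds more files than this

# Hypothetical, simplified stand-in for a study and the sizes of its bucket files.
StudyInfo = Struct.new(:name, :publicly_visible, :created_on, :file_sizes, keyword_init: true)

# Sums file sizes in bytes; returns nil when there are more than `limit` files,
# meaning "too many files to estimate" rather than a partial total.
def estimate_storage(file_sizes, limit: MAX_FILES_TO_SCAN)
  total = 0
  file_sizes.each_with_index do |size, index|
    return nil if index >= limit
    total += size
  end
  total
end

# One report row: storage is a violation when over the cap or not computable;
# a private study created more than one year before `today` is an age violation.
def retention_row(study, today: Date.today)
  bytes = estimate_storage(study.file_sizes)
  {
    name: study.name,
    storage_bytes: bytes,
    storage_violation: bytes.nil? || bytes > STORAGE_CAP_BYTES,
    age_violation: !study.publicly_visible && study.created_on < today.prev_year
  }
end

# Only storage problems put a study in the report; the age check is shown alongside.
def flagged_rows(studies, today: Date.today)
  studies.map { |study| retention_row(study, today: today) }
         .select { |row| row[:storage_violation] }
end

# Example: a 210GB private study from 2023 is flagged, a tiny public study is not.
big = StudyInfo.new(name: "big-private", publicly_visible: false, created_on: Date.new(2023, 6, 1),
                    file_sizes: [150 * 1024**3, 60 * 1024**3])
small = StudyInfo.new(name: "small-public", publicly_visible: true, created_on: Date.new(2025, 1, 1),
                      file_sizes: [1024])
report = flagged_rows([big, small], today: Date.new(2025, 5, 1))
puts report.map { |row| row[:name] }.inspect # prints ["big-private"]
```

Returning `nil` instead of a partial sum keeps "too many files to estimate" distinct from a real total, and treating it as a storage violation matches the behavior described for the report.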

The `data_retention_report` convenience method in `SummaryStatsUtils` performs this check against all studies in a given instance using the default billing project. It runs on a cron at 9AM on the first of every month, and it can also be invoked manually from the console at any time.
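The PR does not show how the monthly job is registered with the scheduler. In standard five-field crontab syntax, "9AM on the first of every month" is `0 9 1 * *`; the sketch below computes the next matching run time in plain Ruby, assuming UTC because the scheduler's time zone is not stated. `MONTHLY_REPORT_CRON` and `next_monthly_run` are illustrative names only.

```ruby
# Standard five-field crontab expression: minute 0, hour 9, day-of-month 1, any month, any weekday.
MONTHLY_REPORT_CRON = "0 9 1 * *"

# Returns the first 09:00 UTC on day 1 of a month that is strictly after `from`.
def next_monthly_run(from)
  from = from.getutc
  candidate = Time.utc(from.year, from.month, 1, 9, 0, 0)
  return candidate if candidate > from

  # Already at or past this month's run: roll over to the next month (and year after December).
  year, month = from.month == 12 ? [from.year + 1, 1] : [from.year, from.month + 1]
  Time.utc(year, month, 1, 9, 0, 0)
end

puts next_monthly_run(Time.utc(2025, 4, 30, 16, 2)) # prints 2025-05-01 09:00:00 UTC
```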

MANUAL TESTING

  1. Running this method locally will almost certainly not flag any studies, so to see results, lower the storage threshold from 200GB to something like 20MB (in app/lib/summary_stats_utils.rb#L10):
     ```ruby
     DATA_STORAGE_CAP = 20.megabytes
     ```
  2. Run the email job with the following command, and confirm you see something similar to this (results will depend on the studies in your local instance). Any private studies older than 1 year should have a red box for "age violation":
     ```ruby
     SingleCellMailer.data_retention_policy_report.deliver_now
     ```

[Screenshot, 2025-04-30 at 11:51:58 AM: sample data retention report email]
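For reference when adjusting the threshold: ActiveSupport's byte helpers use binary multiples, so `20.megabytes` is 20 × 1024² bytes, and if the production cap is written as `200.gigabytes` (the PR only shows the lowered value) it is 200 × 1024³ bytes. The plain-Ruby sketch below reproduces those values without Rails; the constant and variable names are hypothetical.

```ruby
# Binary byte multiples, matching ActiveSupport's Numeric#megabytes / #gigabytes.
KILOBYTE = 1024
MEGABYTE = KILOBYTE * 1024
GIGABYTE = MEGABYTE * 1024

production_cap = 200 * GIGABYTE # 214_748_364_800 bytes
local_test_cap = 20 * MEGABYTE  # 20_971_520 bytes

# The lowered cap is 10,240 times smaller, so ordinary local studies can exceed it.
ratio = production_cap / local_test_cap
puts ratio # prints 10240

sample_sizes = [5 * MEGABYTE, 25 * MEGABYTE, 3 * GIGABYTE]
flagged_locally = sample_sizes.select { |bytes| bytes > local_test_cap }       # 25MB and 3GB
flagged_in_production = sample_sizes.select { |bytes| bytes > production_cap } # none
```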

bistline requested a review from eweitz on April 30, 2025 16:02

codecov bot commented Apr 30, 2025

Codecov Report

Attention: Patch coverage is 82.94118% with 29 lines in your changes missing coverage. Please review.

Project coverage is 71.16%. Comparing base (d74fe21) to head (a57a486).
Report is 80 commits behind head on development.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| app/lib/summary_stats_utils.rb | 86.95% | 21 Missing ⚠️ |
| app/mailers/single_cell_mailer.rb | 11.11% | 8 Missing ⚠️ |
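The per-file figures are consistent with the patch-level summary. Treating each file's missing lines as its only uncovered lines, the number of coverable changed lines can be back-calculated as missing / (1 - coverage): 161 lines for summary_stats_utils.rb and 9 for single_cell_mailer.rb, so 141 of 170 lines are covered, i.e. 82.94118%. The helper `implied_total` below is a hypothetical name for that arithmetic.

```ruby
# Back-calculates coverable lines from Codecov's percentage and missing-line count:
# missing = total * (1 - pct / 100), so total = missing / (1 - pct / 100).
def implied_total(missing, pct)
  (missing / (1 - pct / 100.0)).round
end

patch_files = {
  "app/lib/summary_stats_utils.rb"    => { pct: 86.95, missing: 21 },
  "app/mailers/single_cell_mailer.rb" => { pct: 11.11, missing: 8 }
}

totals = patch_files.transform_values { |f| implied_total(f[:missing], f[:pct]) }
total_lines = totals.values.sum                                  # 170
missing_lines = patch_files.values.sum { |f| f[:missing] }       # 29
patch_pct = (total_lines - missing_lines) * 100.0 / total_lines  # about 82.94118
puts format("%.5f%%", patch_pct)
```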
Additional details and impacted files


```diff
@@               Coverage Diff               @@
##           development    #2247      +/-   ##
===============================================
+ Coverage        70.82%   71.16%   +0.33%
===============================================
  Files              331      335       +4
  Lines            28311    28734     +423
  Branches          2431     2520      +89
===============================================
+ Hits             20052    20448     +396
- Misses            8116     8144      +28
+ Partials           143      142       -1
```
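The project-level figures also check out under Codecov's usual definitions: lines = hits + misses + partials, and coverage = hits / lines. The sketch below (hypothetical helper names) reproduces 70.82%, 71.16% and the +0.33% change, assuming percentages are rounded down to two decimals, which is what these figures suggest.

```ruby
# Coverage in percent, rounded down (truncated) to two decimal places.
def truncated_pct(hits, lines)
  (hits * 10_000 / lines) / 100.0 # integer division truncates
end

base = { hits: 20_052, misses: 8_116, partials: 143 }
head = { hits: 20_448, misses: 8_144, partials: 142 }

base_lines = base.values.sum # 28_311
head_lines = head.values.sum # 28_734

base_pct = truncated_pct(base[:hits], base_lines) # 70.82
head_pct = truncated_pct(head[:hits], head_lines) # 71.16

# Subtracting the truncated values would give 0.34; the reported +0.33 matches
# the difference of the unrounded ratios, truncated.
raw_delta = head[:hits].fdiv(head_lines) - base[:hits].fdiv(base_lines)
delta_pct = (raw_delta * 10_000).floor / 100.0 # 0.33
puts [base_pct, head_pct, delta_pct].inspect
```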
| Files with missing lines | Coverage Δ |
|---|---|
| app/mailers/single_cell_mailer.rb | 29.53% <11.11%> (-0.91%) ⬇️ |
| app/lib/summary_stats_utils.rb | 87.27% <86.95%> (-0.23%) ⬇️ |

... and 18 files with indirect coverage changes


bistline added the "build failure: false positive" label (Build error confirmed as false positive. E.g. upstream service has a problem.) on Apr 30, 2025
eweitz (Member) left a comment


Code looks good. Nice observability improvement for cloud spend!

bistline merged commit 880f9ae into development on May 7, 2025
5 of 6 checks passed
github-actions bot deleted the jb-data-storage-tos branch on May 7, 2025 13:25

2 participants