2482 privacy requests approved in quick succession error when not running worker #2489
Conversation
we still maintain a single engine and sessionmaker per celery task
move worker main into its own module to avoid dependency conflicts
Codecov Report
Base: 88.45% // Head: 88.45% (project coverage unchanged)
Additional details and impacted files:
@@ Coverage Diff @@
## main #2489 +/- ##
=======================================
Coverage 88.45% 88.45%
=======================================
Files 327 328 +1
Lines 15957 15959 +2
Branches 4431 4433 +2
=======================================
+ Hits 14115 14117 +2
Misses 1688 1688
Partials 154 154
Running some tests locally, both with and without a worker and with and without requiring request approval, I feel reasonably confident that we haven't introduced significant regressions in terms of database connection resources, especially when comparing against the test results that @pattisdr had gathered in ethyca/fidesops#944. I tested locally using the attached testing script, tweaking the number of requests and the approval concurrency. While the tests ran, I had another terminal running to monitor open database connections. Here were my general findings:
QueuePool_limit_stack.txt
This seems like a necessary change for customers not running workers: it doesn't even seem workable without this. As an aside, it would be useful for us to better understand what is actually happening when `task_always_eager` is `True`. In my opinion, the preferred mode for running fides should be with workers. If customers are not doing that, we need to investigate whether `task_always_eager=True` is the best way to actually execute privacy requests without workers, and whether there are other side effects we're not aware of.
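For context, a minimal sketch of what Celery's documented `task_always_eager` setting does (the app and task names here are illustrative, not fides code): with it enabled, `.delay()` runs the task synchronously in the calling process instead of queueing it to a broker for a worker to pick up.

```python
from celery import Celery

app = Celery("sketch")  # hypothetical app, not the fides celery app
app.conf.task_always_eager = True  # run tasks inline; no broker/worker needed

@app.task
def run_privacy_request(request_id: str) -> str:
    # In eager mode this body executes in the same process that called
    # .delay(), so any DB sessions it opens come out of the API server's pool.
    return f"processed {request_id}"

result = run_privacy_request.delay("pri_123")  # blocks until the task returns
assert result.get() == "processed pri_123"
```

This is why eager mode can surprise people: there is no queue backpressure, so every approval immediately consumes resources in the API process.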
Background from last PR
This PR ethyca/fidesops#944 assumed we were running celery workers, and it makes sense to me why this is starting to break down when we're not using workers.
The biggest benefit from that PR was sharing engines across a celery process. I did add some smaller changes alongside that also seemed to improve session management, like passing along the same session for execution logs and sharing sessions, but sharing a session wasn't a requirement, so it makes sense to investigate that here.
Testing setup:
I used Adam's script (thanks for attaching!) to "Make 20 requests with concurrency 5 to approve privacy requests", comparing with a worker / without a worker, and sharing a sessionmaker (new behavior) versus sharing a session (previous behavior). I also increased the `pool_size` and `max_overflow` to 50 each to avoid hitting these limits early and be able to compare properly. I had two connectors set up to increase the duration of a privacy request. I monitored the number of connections open and verified privacy request statuses and execution logs.
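For reference, a minimal sketch of raising SQLAlchemy's pool limits the way this test did (the URL is a placeholder and the keyword arguments are standard `create_engine` options, not fides config keys):

```python
from sqlalchemy import create_engine

# SQLAlchemy defaults are pool_size=5, max_overflow=10; raising both to 50
# keeps the QueuePool from becoming the bottleneck during load testing.
engine = create_engine(
    "postgresql://user:pass@localhost:5432/fides",  # placeholder URL
    pool_size=50,
    max_overflow=50,
)
```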
Takeaways
- It would be useful to add a separate PR sooner rather than later to allow the engine `pool_size` and `max_overflow` to be configurable.
- I don't know if `task_always_eager=True` should even be used in production. Sean says that this makes code run locally, synchronously, which was not the intent at all. Part of the reason for having celery was to have these queues that don't consume too many resources at once and execute privacy requests at a reasonable rate.
- Running a worker: didn't see huge differences between sharing a session versus only sharing the sessionmaker. In my previous testing in the last PR, I had noticed slightly better session management when sharing a session, but wasn't seeing that in this particular test setup.
- No worker: confirmed huge improvements when not sharing a session. With the `pool_size` and `max_overflow` increased, session management seemed similar (just looking at connections used, max idle connections, etc.). However, sharing a session was causing major errors that kept privacy requests from completing successfully: 97% of privacy requests were failing with Session-related errors, like `'NoneType' object has no attribute 'dispatch'`, `Instance '<PrivacyRequest at 0xffff7d23eb60>' is not persistent within this Session`, or `'NoneType' object has no attribute 'twophase'`.
Created #2507 for follow-up.
Closes #2482
Closes #2486
Code Changes
- New `Session` per celery task invocation; the `Engine` and `sessionmaker` are reused across invocations to keep db connection resources in check (sketched below)
- `from fides.api.ops.service.saas_request.override_implementations import *` added to `tasks/__init__.py`
- Worker main moved into its own module to avoid dependency conflicts once the above import was added
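A minimal sketch of the session-handling pattern described above, using names of my own invention (`task_engine`, `task_session_factory`, `run_privacy_request`) rather than the actual fides internals: the engine and sessionmaker are created once per process, while each task invocation opens and closes its own `Session`.

```python
from celery import Celery
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

app = Celery("sketch")  # hypothetical app, not the fides celery app

# Created once per process: the engine owns the connection pool, and the
# sessionmaker is a cheap factory, so both are safe to reuse across tasks.
task_engine = create_engine("postgresql://user:pass@localhost:5432/fides")
task_session_factory = sessionmaker(bind=task_engine)

@app.task
def run_privacy_request(request_id: str) -> None:  # hypothetical task
    # New Session per invocation: concurrent tasks never share ORM state,
    # avoiding the "not persistent within this Session" class of errors.
    session = task_session_factory()
    try:
        ...  # load the privacy request and execute it with this session
    finally:
        session.close()  # returns the connection to the engine's pool
```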
Steps to Confirm
- For the "Privacy requests approved in quick succession error when not running worker" #2482 fix, run without a worker (e.g. `nox -s 'fides_env(dev)'`) and approve multiple privacy requests in quick succession using multiple browser tabs on the admin UI.
- For "saas overrides are not loaded into celery worker runtime" #2486, run with a worker (e.g. `nox -s dev -- worker ui`) and switch `celery.task_always_eager = false` in `fides.toml`.
- See the individual issues for more detailed steps to repro.
Pre-Merge Checklist
- `CHANGELOG.md` updated
Description Of Changes
These are two distinct fixes that are coupled together here because they touch similar parts of the codebase and each came up while testing the other.
Still pending some local testing to ensure db connections are kept in check even with the changes to session management here.