gc cron job to clean up dangling policy_bindings entries #3065
base: master
Conversation
michaeljguarino left a comment
We definitely need to make sure there are indices on all of these columns as well; the exists queries are going to be quite slow if not.
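For context, covering those lookups would be a routine Ecto migration; a minimal sketch, where the table and column names are illustrative assumptions rather than the actual set the dangling query checks:

```elixir
defmodule Console.Repo.Migrations.AddPolicyIdIndices do
  use Ecto.Migration

  # Hypothetical tables/columns; the real list is whichever *_policy_id
  # columns the exists queries actually touch.
  def change do
    create index(:projects, [:read_policy_id])
    create index(:projects, [:write_policy_id])
    create index(:clusters, [:read_policy_id])
    create index(:clusters, [:write_policy_id])
  end
end
```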
# Create a project with write_bindings - these should be kept
project = insert(:project, write_bindings: [%{user_id: user.id}])
%{write_bindings: [kept_binding]} = Console.Repo.preload(project, [:write_bindings])
I actually don't think this preload should be needed.
I think it's needed if we want a kept_binding that we can run assertions against after the cron job.
lib/console/deployments/cron.ex
Outdated
|> PolicyBinding.ordered(asc: :id)
|> Repo.stream(method: :keyset)
|> Console.throttle(count: 100, pause: 1)
|> Task.async_stream(fn binding ->
We would want to chunk these in batches of 100 and delete each chunk in one query, to save DB network round trips. Pretty sure there are some other examples of doing that here.
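A rough sketch of that shape, assuming the dangling/0 and ordered/2 helpers from the diff and replacing the per-row Task.async_stream with one delete_all per chunk of 100:

```elixir
import Ecto.Query

PolicyBinding.dangling()
|> PolicyBinding.ordered(asc: :id)
|> Repo.stream(method: :keyset)
|> Console.throttle(count: 100, pause: 1)
|> Stream.chunk_every(100)
|> Stream.each(fn chunk ->
  ids = Enum.map(chunk, & &1.id)

  # one DELETE per 100 bindings instead of one round trip per row
  PolicyBinding
  |> where([pb], pb.id in ^ids)
  |> Repo.delete_all()
end)
|> Stream.run()
```

Depending on how this Repo.stream/2 variant is implemented, the pipeline may also need to run inside Repo.transaction/1.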
641a8d3 to ed1ea75
ed1ea75 to c7ee50e
michaeljguarino left a comment
This probably will work, but it needs some refactoring changes, and we need to try to use the Ecto SQL builder API directly, since it protects against SQL injection natively.
PolicyBinding.dangling()
|> PolicyBinding.ordered(asc: :id)
|> Repo.stream(method: :keyset)
|> Stream.chunk_every(100)
We should throttle this (there should be examples elsewhere).
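Concretely, that could be the Console.throttle/2 call from the earlier revision slotted back into this pipeline (whether to throttle the streamed rows before chunking or the chunks afterwards is a judgment call):

```elixir
PolicyBinding.dangling()
|> PolicyBinding.ordered(asc: :id)
|> Repo.stream(method: :keyset)
|> Console.throttle(count: 100, pause: 1)  # brief pause every 100 streamed rows
|> Stream.chunk_every(100)
```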
lib/console/schema/policy_binding.ex
Outdated
sql = """
SELECT pb.id FROM policy_bindings pb
WHERE #{where_clause}
I feel like this almost certainly has some sort of SQL injection vulnerability, although it's got to be pretty hard to construct. See if there's a way to SQL-escape that where clause, but I'm also pretty sure you can use the proper query builder, maybe with dynamics, for it.
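One way to drop the interpolated WHERE entirely is to build the condition with the query builder, for example one correlated NOT EXISTS per table/column pair (dynamics would work similarly). This is a sketch only: the table/column pairs are assumed to come from information_schema, and the policy_id join key is an assumption to adjust to the actual schema.

```elixir
import Ecto.Query

def dangling(table_columns) do
  base = from(pb in PolicyBinding, as: :pb)

  Enum.reduce(table_columns, base, fn {table, column}, query ->
    # correlated subquery: does any row of `table` still reference this binding?
    owner =
      from(o in table,
        where: field(o, ^String.to_atom(column)) == parent_as(:pb).policy_id,
        select: fragment("1")
      )

    from(pb in query, where: not exists(subquery(owner)))
  end)
end
```

Identifiers are quoted by Ecto itself and all values go through the query builder rather than string interpolation.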
lib/console/schema/policy_binding.ex
Outdated
ORDER BY table_name, column_name
"""

Repo.query!(query, [], timeout: 30_000).rows
Try to use the query DSL here, but also don't have this query run every time you fetch dangling IDs. Probably refactor it so it can be called independently and have the table names passed to the dangling-IDs function each time (we don't want to make 2*n queries instead of n).
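A sketch of splitting the metadata lookup out so it runs once per cron run and its result is passed into the dangling query; the table_schema and LIKE filters are assumptions about what the raw SQL selected:

```elixir
import Ecto.Query

# Fetch {table, column} pairs once; callers pass the result to dangling/1
# (or the unreferenced-ids query) instead of re-querying information_schema.
def policy_columns do
  from(c in "columns",
    prefix: "information_schema",
    where: c.table_schema == "public" and like(c.column_name, "%policy_id"),
    order_by: [c.table_name, c.column_name],
    select: {c.table_name, c.column_name}
  )
  |> Console.Repo.all(timeout: 30_000)
end
```

The cron job could then call something like PolicyBinding.dangling(PolicyBinding.policy_columns()) once, rather than rediscovering the columns on every fetch.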
c7ee50e to dc777f6
lib/console/schema/policy_binding.ex
Outdated
table_columns
|> Enum.flat_map(fn {table, columns} ->
  columns
  |> Enum.map(fn col ->
Make this a flat_map instead of piping to List.flatten.
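For reference, the two shapes below are equivalent; flat_map just skips building and then flattening the intermediate nested list (the pair construction is illustrative):

```elixir
# map + List.flatten builds a list of lists first
table_columns |> Enum.map(fn {table, cols} -> Enum.map(cols, &{table, &1}) end) |> List.flatten()

# flat_map produces the flat list directly
table_columns |> Enum.flat_map(fn {table, cols} -> Enum.map(cols, &{table, &1}) end)
```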
lib/console/schema/policy_binding.ex
Outdated
@doc """
Returns IDs of policy bindings whose policy_id is not in the given set of referenced IDs.
"""
def unreferenced_binding_ids(referenced_ids) do
The referenced IDs here should almost certainly be handled with a subquery and a join. If we move them into memory, it could be an unbounded set that either breaks SQL (max query size) or breaks our server (OOM).
Look at Ecto subqueries to figure out how to make the join work (you should still fetch the table names into memory first and build the subquery with them).
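A hedged sketch of that shape: union the owner columns into a single subquery and anti-join policy_bindings against it, so the referenced IDs never have to be loaded into application memory (the policy_id join key and the helper's shape are assumptions):

```elixir
import Ecto.Query

def unreferenced_binding_ids(table_columns) do
  # one "SELECT <column> FROM <table>" per pair, combined with UNION ALL
  referenced =
    table_columns
    |> Enum.map(fn {table, column} ->
      from(o in table, select: %{policy_id: field(o, ^String.to_atom(column))})
    end)
    |> Enum.reduce(fn q, acc -> union_all(acc, ^q) end)

  # anti-join: keep bindings whose policy_id matches nothing in the union
  from(pb in PolicyBinding,
    left_join: r in subquery(referenced),
    on: r.policy_id == pb.policy_id,
    where: is_nil(r.policy_id),
    select: pb.id
  )
end
```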
dc777f6 to 5ee096a
gc cron job for dangling policy_bindings entries
Test Plan
unit test
Checklist
Plural Flow: console