Skip to content

Conversation

@kinjalh
Copy link
Member

@kinjalh kinjalh commented Jan 16, 2026

gc cron job for dangling policy_bindings entries

Test Plan

unit test

Checklist

  • If required, I have updated the Plural documentation accordingly.
  • I have added tests to cover my changes.
  • I have added a meaningful title and summary to convey the impact of this PR to a user.

Plural Flow: console

@kinjalh kinjalh added the enhancement New feature or request label Jan 16, 2026
Copy link
Member

@michaeljguarino michaeljguarino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we definitely need to make sure there are indices on all these columns as well, the exists queries are going to be quite slow if not.


# Create a project with write_bindings - these should be kept
project = insert(:project, write_bindings: [%{user_id: user.id}])
%{write_bindings: [kept_binding]} = Console.Repo.preload(project, [:write_bindings])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i actually don't think this preload should be needed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's needed if we want a kept_binding that we can run assertions against after the cronjob

|> PolicyBinding.ordered(asc: :id)
|> Repo.stream(method: :keyset)
|> Console.throttle(count: 100, pause: 1)
|> Task.async_stream(fn binding ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would want to chunk these in batches of 100 and delete each chunk in one query, to save db network round trips. Pretty sure there are some other examples of doing that here.

@kinjalh kinjalh force-pushed the policy-bindings-gc-cron branch from 641a8d3 to ed1ea75 Compare January 22, 2026 15:37
@kinjalh kinjalh force-pushed the policy-bindings-gc-cron branch from ed1ea75 to c7ee50e Compare February 3, 2026 06:25
Copy link
Member

@michaeljguarino michaeljguarino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably will work, but some refactoring changes, and need to try to use the ecto sql builder api directly since it protects against sql injections natively

PolicyBinding.dangling()
|> PolicyBinding.ordered(asc: :id)
|> Repo.stream(method: :keyset)
|> Stream.chunk_every(100)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should throttle this (should be examples elsewhere)


sql = """
SELECT pb.id FROM policy_bindings pb
WHERE #{where_clause}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel like this almost certainly has some sort of sql injection vulnerability, although its got to be pretty hard to construct. See if there's a way to sql escape that where clause, but also i'm pretty sure you can use the proper query builder with dynamics maybe for it.

ORDER BY table_name, column_name
"""

Repo.query!(query, [], timeout: 30_000).rows
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to use query dsl here, but also don't have this query run every time you fetch dangling ids, probably refactor it so it can be called independently and have the table names passed to the dangling ids function each time (don't want to make 2*n queries instead of n)

@kinjalh kinjalh force-pushed the policy-bindings-gc-cron branch from c7ee50e to dc777f6 Compare February 5, 2026 13:53
table_columns
|> Enum.flat_map(fn {table, columns} ->
columns
|> Enum.map(fn col ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this a flat_map instead of piping to List.flatten

@doc """
Returns IDs of policy bindings whose policy_id is not in the given set of referenced IDs.
"""
def unreferenced_binding_ids(referenced_ids) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the referenced ids here should almost certainly be done with a subquery and join. If we move them into memory, it could be an unbounded set that either breaks sql (max query size) or breaks our server (OOM).

Look at ecto subquery to figure out how to make the join work (you should still fetch the table names into memory first and build the subquery with them)

@kinjalh kinjalh force-pushed the policy-bindings-gc-cron branch from dc777f6 to 5ee096a Compare February 10, 2026 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants