Skip to content

Comments

Fix crash unloading module referenced by ACL rules#2923

Closed
sachinvmurthy wants to merge 3 commits intovalkey-io:unstablefrom
sachinvmurthy:bug/fix_issue_2797
Closed

Fix crash unloading module referenced by ACL rules#2923
sachinvmurthy wants to merge 3 commits intovalkey-io:unstablefrom
sachinvmurthy:bug/fix_issue_2797

Conversation

@sachinvmurthy
Copy link
Contributor

@sachinvmurthy sachinvmurthy commented Dec 9, 2025

Problem:

Unloading a module crashes if any ACL user has rules referencing that module’s commands or subcommands (e.g., +hello.simple). The crash occurs in ACLRecomputeCommandBitsFromCommandRulesAllUsers while applying stale rules; previously it asserted on failure.

Root Cause:
Stale ACL command rules persist in users’ selectors after module unload. Recompute attempts to reapply them, resulting in invalid lookups or assertions.

Fixes #2797

Add ACLCleanupStaleCommandRulesAllUsers to proactively remove non-existent module commands/subcommands from selector->command_rules.
Make ACLRecomputeCommandBitsFromCommandRulesAllUsers skip invalid rules with a warning instead of asserting.

@sachinvmurthy
Copy link
Contributor Author

cc: @roshkhatri @dmitrypol @dvkashapov

Thanks for reviewing the earlier PR. Created a new PR against unstable as addressed in the old PR.

@zuiderkwast zuiderkwast changed the title fix for bug reported in issue #2797 Fix crash unloading module referenced by ACL rules Dec 10, 2025
src/acl.c Outdated
const char *sub = pipe + 1;
if (cmd->subcommands_ht) {
sds sub_sds = sdsnew(sub);
struct serverCommand *sc = (struct serverCommand *)dictFetchValue(cmd->subcommands_ht, sub_sds);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command table uses hashtable instead of dict in 8.1 and later. That seems to be why the build is failing.

We probably need this code with dict when we backport the fix to 8.0 and 7.2 though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or would it work to use lookupCommandBySdsLogic here?

UNTESTED:

Suggested change
struct serverCommand *sc = (struct serverCommand *)dictFetchValue(cmd->subcommands_ht, sub_sds);
struct serverCommand *sc = lookupCommandBySdsLogic(cmd->subcommands_ht, sub_sds);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah okay! thank you. will try and see if that helps

/* Get a command from the original command table, that is not affected
* by the command renaming operations: we base all the ACL work from that
* table, so that ACLs are valid regardless of command renaming. */
static struct serverCommand *ACLLookupCommand(const char *name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add the new function below this existing function, there is no need to move the existing function ACLLookupCommand within the file. We can minimize the diff, the git history footprint.

Copy link
Member

@rjd15372 rjd15372 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'm approving the PR but make sure to address @zuiderkwast comments.

Copy link
Member

@roshkhatri roshkhatri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @zuiderkwast 's review. Overall looks good.

src/acl.c Outdated
Comment on lines 747 to 750
if (first != '+' && first != '-') {
/* Unknown formatting, keep as-is */
new_rules = sdscatfmt(new_rules, "%S ", tok);
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the unknown formatting are we looking at here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wanted to cover all edge cases.

Signed-off-by: sachinvmurthy <sachin.murthy97@gmail.com>
Signed-off-by: sachinvmurthy <sachin.murthy97@gmail.com>
Signed-off-by: sachinvmurthy <sachin.murthy97@gmail.com>
Copy link
Member

@roshkhatri roshkhatri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good, just minor questions and feedback

aclSelector *selector = (aclSelector *) listNodeValue(ln);
int argc = 0;
sds *argv = sdssplitargs(selector->command_rules, &argc);
if (!argv) continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we atleast add a warning here while continuing, if sdssplitargs() fails for any reason?

} else {
sdsfree(selector->command_rules);
selector->command_rules = sdsempty();
sdsfree(new_rules);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we make sure the new rules are parsable? is there a small check we can add?

@roshkhatri
Copy link
Member

After discussion in the meeting today, it was decided to not edit the command rules and reject unloading of the module until ACLs are cleaned by the user

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CRASH] Valkey7/8/9 crashes when unloading a module if a command from that module was added to an ACL user.

4 participants