Harmful command detection #1167
Replies: 23 comments
-
here is a prototype regex: (?i)\brm\s*(-{1,2}(r|f|fr|rf|recursive|force)\s*){1,2}\b|\b(r\sm|m\sr)\s*(-{1,2}\w+\s*){1,2}\b
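A quick way to sanity-check the prototype is to run it over a few sample messages. The sketch below uses Python's re module purely for convenience, which is an assumption: it is not necessarily the engine the bot or Discord uses, so results there may differ. The sample messages and the is_flagged helper are made up for illustration.

```python
import re

# Prototype pattern copied from the comment above.
PROTOTYPE = re.compile(
    r"(?i)\brm\s*(-{1,2}(r|f|fr|rf|recursive|force)\s*){1,2}\b"
    r"|\b(r\sm|m\sr)\s*(-{1,2}\w+\s*){1,2}\b"
)

# Hypothetical sample messages, not taken from the thread.
SAMPLES = [
    "rm -rf /",
    "RM -RF /",
    "rm --recursive --force /",
    "r m -rf /",
    "rm notes.txt",
    "the farm -rf joke",
]


def is_flagged(message: str) -> bool:
    """Return True if the prototype pattern matches anywhere in the message."""
    return PROTOTYPE.search(message) is not None


for sample in SAMPLES:
    verdict = "flagged" if is_flagged(sample) else "allowed"
    print(f"{sample!r}: {verdict}")
```

Running the same samples through the engine that will actually enforce the rule is still needed before relying on the result.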
-
Line 7 in 43ddda8
-
what do you think about this, @kzndotsh?
-
check if it works with the rustexp lib they use
-
1. Variations and Misspellings: People might try to bypass your filter by purposefully adding typos or slight variations. You should account for common misspellings and variations like:
Example: \s*s(u|U)(d|D)(o|O)\s*r\s*-{0,2}r(\s*|\s*-){0,2}f\s*
2. Contextual Use: You should ensure that the context in which these terms are used is also considered, as you don't want to block legitimate discussions around these commands.
Example: \b(s(u|U)(d|D)(o|O)\s+r\s+-r-{0,2}f)\b
3. Space and Separator Variations: People might use multiple spaces or different types of separators (like underscores, dashes) to bypass simple regex.
Example: \s*s(\\_?\s*u_?\s*d_?\s*o_?)\s*r(\\_?\s*)-?r(\\_?\s*)-?f\s*
4. Unicode Characters and Homoglyphs: Users might use similar-looking characters from different character sets to bypass simple text matching.
Example: \s*[sSsS](u|U|u|U)(d|D|d|D)(o|O|o|O)\s*r\s*-{0,2}r(\s*|\s*-){0,2}f\s*
5. Case Insensitivity: Ensure your regex is case insensitive to capture all uppercase and lowercase variations.
Example: (?i)\b(sudo\s+rm\s+-?r?-?f)\b
6. Word Boundaries: Using word boundaries (\b):
Example: \b(?:sudo|su|s u d o)\s+rm\s+(?:-r|-r\s+|-r\s+-\s*f|\s+-r\s+\s*f)\b
Example Combined Pattern: (?i)\b(?:sudo|s u d o)\s+rm\s+(?:-r\s*-f|-rf|-r\s*-?\s*f)\b
7. Testing: Thoroughly test your regex with various test cases including different contexts, spacing variations, and common legitimate uses of similar phrases.
8. Use a Test Suite: Create a suite of test messages and see how your regex performs on both blocking jokes and allowing legitimate content.
Example Test Cases:
-
sorry i'm going to research more 😭 @kzndotsh
-
1. Define Your Test Cases
Create a list of test cases that covers all the scenarios you need to consider. Split them into two categories: Positive Cases (which should be blocked) and Negative Cases (which should be allowed).
Positive Cases (should match):
Negative Cases (should not match):
2. Choose a Testing Environment
You can use various platforms and languages to run your tests. Python is a good choice because of its simplicity and powerful regex library.
3. Write the Test Code
Here’s an example of how you could set up and run these tests in Python:

import re

# Your regex pattern
pattern = re.compile(r"(?i)\b(?:sudo|s u d o)\s+rm\s+(?:-r\s*-f|-rf|-r\s*-?\s*f)\b")

# Define test cases
test_cases = {
    "positive": [
        "sudo rm -rf /",
        "S U D O r m -r -f",  # not matched yet: the pattern expects "rm" without a space
        "sudo rm -rf",
        "suDo rM - R F",      # not matched yet: the pattern expects "-r" without a space
        "sudo rm -r f"
    ],
    "negative": [
        "sudo rm -r file",
        "Let's discuss sudo and rm commands",
        "The command sudo rm -r file is dangerous",
        "sudorrmf",
        "sudo rm myfile"
    ]
}

# Run tests
def run_tests():
    success = True
    print("Running Positive Test Cases\n--------------------------")
    for test_case in test_cases["positive"]:
        if pattern.search(test_case):
            print(f"PASSED: '{test_case}' was correctly flagged.")
        else:
            success = False
            print(f"FAILED: '{test_case}' was not flagged as expected.")
    print("\nRunning Negative Test Cases\n--------------------------")
    for test_case in test_cases["negative"]:
        if pattern.search(test_case):
            success = False
            print(f"FAILED: '{test_case}' was incorrectly flagged.")
        else:
            print(f"PASSED: '{test_case}' was correctly allowed.")
    if success:
        print("\nAll tests passed successfully!")
    else:
        print("\nSome tests failed. Please review the failed cases.")

# Execute tests
run_tests()

4. Automate the Tests
If you want to continually test as you update your regex, consider integrating these tests into a CI/CD pipeline using a tool like GitHub Actions, Travis CI, or Jenkins.
Example using pytest for more automated testing:

import re
import pytest

# Your regex pattern
pattern = re.compile(r"(?i)\b(?:sudo|s u d o)\s+rm\s+(?:-r\s*-f|-rf|-r\s*-?\s*f)\b")

# Define test cases
positive_cases = [
    "sudo rm -rf /",
    "S U D O r m -r -f",  # not matched yet: the pattern expects "rm" without a space
    "sudo rm -rf",
    "suDo rM - R F",      # not matched yet: the pattern expects "-r" without a space
    "sudo rm -r f"
]
negative_cases = [
    "sudo rm -r file",
    "Let's discuss sudo and rm commands",
    "The command sudo rm -r file is dangerous",
    "sudorrmf",
    "sudo rm myfile"
]

# One test per positive case: the pattern must flag it
@pytest.mark.parametrize("test_case", positive_cases)
def test_positive_cases(test_case):
    assert pattern.search(test_case), f"Positive test case failed: {test_case}"

# One test per negative case: the pattern must leave it alone
@pytest.mark.parametrize("test_case", negative_cases)
def test_negative_cases(test_case):
    assert not pattern.search(test_case), f"Negative test case failed: {test_case}"

if __name__ == "__main__":
    pytest.main()

5. Review and Iterate
After running the tests, review any failed cases, adjust your regex pattern, and rerun the tests until you get the desired results.
-
im just brain dumping stuff rn no worries
-
we need AI to generate more test cases, @kzndotsh
-
This seems like a herculean task given the range of Unicode characters that exist. An example of a test case that is not covered: there is a Unicode character that can stand in for (long) spaces (⠀, U+2800).
>>> "rm\u2800-rf\u2800/42"
'rm⠀-rf⠀/42'
I think a pragmatic way of doing it is auto-modding the most commonly seen cases and handling the rest manually when spotted.
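The U+2800 case can be reproduced directly. A minimal sketch, assuming a simplified stand-in pattern (not the one the bot uses) and a hand-picked, deliberately incomplete table of blank look-alikes; the names PATTERN, BLANK_LOOKALIKES and normalize_blanks are hypothetical:

```python
import re

# Simplified stand-in pattern for illustration only.
PATTERN = re.compile(r"\brm\s+-rf\b", re.IGNORECASE)

# Hypothetical, incomplete table of characters that render as blank
# but are not matched by \s in Python's re module.
BLANK_LOOKALIKES = {
    0x2800: " ",   # BRAILLE PATTERN BLANK
    0x3164: " ",   # HANGUL FILLER
    0x200B: None,  # ZERO WIDTH SPACE (dropped entirely)
}


def normalize_blanks(message: str) -> str:
    """Map known blank look-alikes to a plain space, or drop them."""
    return message.translate(BLANK_LOOKALIKES)


message = "rm\u2800-rf\u2800/42"
print(bool(PATTERN.search(message)))                    # False
print(bool(PATTERN.search(normalize_blanks(message))))  # True
```

Any table like this only covers the characters someone has already seen, which is why the rest would still need manual moderation.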
-
My first instinct is a You can of course replace any instance of I think it's reasonable to only catch common cases as @abxh said and manually mod the rest.
-
Now, what scenarios could occur where it falsely flags something, e.g. someone giving instructions for support purposes?
-
I think a stricter criterion, where spaced-apart letters are not checked for, could be considered. The goal of automodding Viewed from the angle of a newbie, the most likely commands that would mislead a newbie would be:
Even the above could be ignored, and only the literal commands (in ASCII) that pose a risk could be checked for. If each letter is spaced apart, in my opinion, the command poses less of a risk for newbies. Note also:
-
the regex is to prevent trolls |
-
There's bound to be a scenario where a support person may provide a The most common targets of the
sudo rm -rf "$(pwd)"
This Bash command invokes " Another case would be the usage of the dot symbol "
sudo rm -rf .
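One way to keep support replies from being caught is to flag only a recursive, forced rm aimed at a broad target. This is a rough sketch and not a vetted rule: the target list (/, /*, ~, ., *, "$(pwd)") is an assumption that extends the two examples above, and RISKY_RM and is_risky are hypothetical names.

```python
import re

# Hypothetical stricter pattern: rm with both recursive and force flags,
# followed by a target that wipes a whole tree.
RISKY_RM = re.compile(
    r"""
    \brm\s+
    (?:-(?:rf|fr)|-r\s+-f|-f\s+-r)   # -rf, -fr, -r -f, -f -r
    \s+
    (?:--no-preserve-root\s+)?
    (?:
        "?\$\(pwd\)"?                # $(pwd), optionally quoted
      | /\*?                         # / or /*
      | ~/?                          # home directory
      | \.                           # current directory
      | \*                           # everything in the current directory
    )
    (?=\s|$)                         # the target must end here
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_risky(message: str) -> bool:
    """Return True if the message contains a broad recursive, forced rm."""
    return RISKY_RM.search(message) is not None


for sample in ['sudo rm -rf "$(pwd)"', "sudo rm -rf .", "rm -rf ./build"]:
    print(f"{sample!r}: {is_risky(sample)}")
```

A filter this narrow lets a reply such as rm -rf ./build through, at the cost of missing obfuscated spellings.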
-
I genuinely think just filtering out I do really like @abxh's idea with unidecode or similar libraries. Don't see why .lower() has to be used though, since Python regex has Patterns that I think we should filter out automatically: I can see
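The normalise-then-match idea can be sketched with the standard library alone. Two assumptions here: unicodedata.normalize("NFKC", ...) is swapped in for unidecode (it only folds compatibility forms such as fullwidth letters and does not map characters from other scripts), and the pattern is a simplified placeholder. Case is handled with re.IGNORECASE instead of .lower(); fold and is_flagged are hypothetical names.

```python
import re
import unicodedata

# Simplified placeholder pattern; re.IGNORECASE replaces a separate .lower() call.
PATTERN = re.compile(r"\brm\s+-(?:rf|fr)\b", re.IGNORECASE)


def fold(message: str) -> str:
    """Fold Unicode compatibility forms (e.g. fullwidth letters) to plain ones."""
    return unicodedata.normalize("NFKC", message)


def is_flagged(message: str) -> bool:
    """Normalise the message first, then run the pattern over the result."""
    return PATTERN.search(fold(message)) is not None


# Fullwidth "RM", a plain space, then fullwidth "-rf".
fullwidth = "\uff32\uff2d \uff0d\uff52\uff46 /"
print(PATTERN.search(fullwidth) is not None)  # raw text, no folding
print(is_flagged(fullwidth))                  # folded before matching
print(is_flagged("rm notes.txt"))
```

Swapping fold for a transliteration library would widen coverage, but each extra mapping also widens the room for false positives.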
-
closing as this is seemingly finished
-
to discuss a proper automod regex for blocking rm -rf jokes
let's move from the tux automod to discord native automod so messages are not sent at all