Harmful command detection #1167

2024-08-31T13:23:51Z

tuxgitbot[bot]
bot Aug 31, 2024

To discuss a proper automod regex for blocking rm -rf jokes.

Let's move from the Tux automod to Discord's native AutoMod so that offending messages are never sent at all.

kzndotsh · 2024-08-31T13:24:01Z

kzndotsh
Aug 31, 2024
Maintainer

https://support.discord.com/hc/en-us/articles/10069840290711-Filter-Messages-Using-Regular-Expressions-Regex

0 replies

kzndotsh · 2024-08-31T13:24:51Z

kzndotsh
Aug 31, 2024
Maintainer

We use the Rust flavor of regex and recommend writing your regex in Rust syntax to minimize errors. To edit and test your regex syntax in Rust, we recommend using Rustexp. Case-insensitive and unicode support flags are on by default for every regex pattern.

0 replies

kzndotsh · 2024-08-31T13:25:21Z

kzndotsh
Aug 31, 2024
Maintainer

https://treeben77.github.io/automod-regex-generator/

0 replies

kzndotsh · 2024-08-31T13:26:18Z

kzndotsh
Aug 31, 2024
Maintainer

Automod resources

0 replies

nomnomshark41 · 2024-08-31T13:26:28Z

nomnomshark41
Aug 31, 2024

Here is a prototype regex:

(?i)\brm\s*(-{1,2}(r|f|fr|rf|recursive|force)\s*){1,2}\b|\b(r\sm|m\sr)\s*(-{1,2}\w+\s*){1,2}\b
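A quick way to see what the prototype does and does not catch is to run it against a few messages. The sketch below uses Python's re module rather than Discord's Rust engine, and the sample messages are made up for illustration, so treat the result as a rough indication only.

```python
import re

# Prototype pattern quoted from the comment above, unchanged.
PROTOTYPE = re.compile(
    r"(?i)\brm\s*(-{1,2}(r|f|fr|rf|recursive|force)\s*){1,2}\b"
    r"|\b(r\sm|m\sr)\s*(-{1,2}\w+\s*){1,2}\b"
)

# Hypothetical sample messages, chosen only for illustration.
SAMPLES = [
    "rm -rf /",
    "r m -rf",
    "rm -r build/",
    "rm notes.txt",
]


def is_flagged(message: str) -> bool:
    """Return True if the prototype matches anywhere in the message."""
    return PROTOTYPE.search(message) is not None


for message in SAMPLES:
    print(f"{message!r}: {'flagged' if is_flagged(message) else 'allowed'}")
```

Note that the prototype does not look at the target at all, so a harmless "rm -r build/" is flagged just like "rm -rf /".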

0 replies

kzndotsh · 2024-08-31T13:27:03Z

kzndotsh
Aug 31, 2024
Maintainer

tux/tux/utils/functions.py

Line 7 in 43ddda8:

harmful_command_pattern = r"(?:sudo\s+|doas\s+|run0\s+)?rm\s+(-[frR]*|--force|--recursive|--no-preserve-root|\s+)*([/\∕~]\s*|\*|/bin|/boot|/etc|/lib|/proc|/root|/sbin|/sys|/tmp|/usr|/var|/var/log|/network.|/system)(\s+--no-preserve-root|\s+\*)*|:\(\)\{ :|:& \};:"  # noqa: RUF001

0 replies

nomnomshark41 · 2024-08-31T13:27:17Z

nomnomshark41
Aug 31, 2024

What do you think, @kzndotsh?

0 replies

nomnomshark41 · 2024-08-31T13:27:44Z

nomnomshark41
Aug 31, 2024

it need also

0 replies

kzndotsh · 2024-08-31T13:28:13Z

kzndotsh
Aug 31, 2024
Maintainer

What do you think, @kzndotsh?

Check whether it works in Rustexp, the Rust regex tester that Discord recommends.
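Before pasting a pattern into Rustexp, a rough pre-check can catch the most common portability problem: the Rust regex crate does not support look-around or backreferences. The helper below is only a textual heuristic (the function name and probe patterns are my own, not part of any library), so a clean result is no guarantee that the pattern compiles in Rust.

```python
import re

# Constructs the Rust `regex` crate does not support. These probes are a
# rough textual heuristic, not a real regex parser.
UNSUPPORTED = {
    "lookahead": re.compile(r"\(\?[=!]"),
    "lookbehind": re.compile(r"\(\?<[=!]"),
    "backreference": re.compile(r"(?<!\\)\\[1-9]"),
}


def rust_incompatibilities(pattern: str) -> list[str]:
    """Return the names of unsupported constructs found in `pattern`."""
    return [name for name, probe in UNSUPPORTED.items() if probe.search(pattern)]


prototype = (
    r"(?i)\brm\s*(-{1,2}(r|f|fr|rf|recursive|force)\s*){1,2}\b"
    r"|\b(r\sm|m\sr)\s*(-{1,2}\w+\s*){1,2}\b"
)
print(rust_incompatibilities(prototype))
print(rust_incompatibilities(r"rm(?=\s+-rf)"))
```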

0 replies

kzndotsh · 2024-08-31T13:33:27Z

kzndotsh
Aug 31, 2024
Maintainer

1. Variations and Misspellings:

People might try to bypass your filter by purposefully adding typos or slight variations. You should account for common misspellings and variations like:

sudo, Sudo, suDo, etc.
rm, r m, r-m, etc.
-rf, -r -f, -r--f, etc.

Example:

\s*s(u|U)(d|D)(o|O)\s*r-?m\s*-{0,2}r(\s*|\s*-){0,2}f\s*

2. Contextual Use:

You should ensure that the context in which these terms are used is also considered, as you don't want to block legitimate discussions around these commands.

Example:

\b(s(u|U)(d|D)(o|O)\s+rm\s+-r-{0,2}f)\b

3. Space and Separator Variations:

People might use multiple spaces or different types of separators (like underscores, dashes) to bypass simple regex.

Example:

\s*s[_\s]*u[_\s]*d[_\s]*o[_\s]*r[_\s]*m[_\s]*-?r[_\s]*-?f\s*

4. Unicode Characters and Homoglyphs:

Users might use similar-looking characters from different character sets to bypass simple text matching.

Example:

\s*[sSｓＳ](u|U|ｕ|Ｕ)(d|D|ｄ|Ｄ)(o|O|ｏ|Ｏ)\s*r\s*m\s*-{0,2}r(\s*|\s*-){0,2}f\s*

5. Case Insensitivity:

Ensure your regex is case insensitive to capture all uppercase and lowercase variations.

(?i)\b(sudo\s+rm\s+-?r?-?f)\b

6. Word Boundaries:

Use word boundaries (\b) to avoid partial matches.

\b(?:sudo|su|s u d o)\s+rm\s+(?:-r|-r\s+|-r\s+-\s*f|\s+-r\s+\s*f)\b

Example Combined Pattern:

(?i)\b(?:sudo|s u d o)\s+rm\s+(?:-r\s*-f|-rf|-r\s*-?\s*f)\b

7. Testing:

Thoroughly test your regex with various test cases including different contexts, spacing variations, and common legitimate uses of similar phrases.

8. Use a Test Suite:

Create a suite of test messages and see how your regex performs on both blocking jokes and allowing legitimate content.

Example Test Cases:

sudo rm -rf /
S U D O r m -r -f
"Please avoid using sudo rm -rf in commands!"

0 replies

nomnomshark41 · 2024-08-31T13:37:07Z

nomnomshark41
Aug 31, 2024

sorry i'm going to research more 😭 @kzndotsh

0 replies

kzndotsh · 2024-08-31T13:37:28Z

kzndotsh
Aug 31, 2024
Maintainer

1. Define Your Test Cases

Create a list of test cases that covers all the scenarios you need to consider. Split them into two categories: Positive Cases (which should be blocked) and Negative Cases (which should be allowed).

Positive Cases (should match):

sudo rm -rf /
S U D O r m -r -f
sudo rm -rf
suDo rM - R F
sudo rm -r f

Negative Cases (should not match):

sudo rm -r file
Let's discuss sudo and rm commands
The command sudo rm -r file is dangerous
sudorrmf
sudo rm myfile

2. Choose a Testing Environment

You can use various platforms and languages to run your tests. Python is a good choice because of its simplicity and powerful regex library.

3. Write the Test Code

Here’s an example of how you could set up and run these tests in Python:

import re

# Your regex pattern
pattern = re.compile(r"(?i)\b(?:sudo|s u d o)\s+rm\s+(?:-r\s*-f|-rf|-r\s*-?\s*f)\b")

# Define test cases
test_cases = {
    "positive": [
        "sudo rm -rf /",
        "S U D O r m -r -f",
        "sudo   rm    -rf",
        "suDo rM - R F",
        "sudo rm -r f"
    ],
    "negative": [
        "sudo rm -r file",
        "Let's discuss sudo and rm commands",
        "The command sudo rm -r file is dangerous",
        "sudorrmf",
        "sudo    rm    myfile"
    ]
}

# Run tests
def run_tests():
    success = True
    print("Running Positive Test Cases\n--------------------------")
    for test_case in test_cases["positive"]:
        if pattern.search(test_case):
            print(f"PASSED: '{test_case}' was correctly flagged.")
        else:
            success = False
            print(f"FAILED: '{test_case}' was not flagged as expected.")
    
    print("\nRunning Negative Test Cases\n--------------------------")
    for test_case in test_cases["negative"]:
        if pattern.search(test_case):
            success = False
            print(f"FAILED: '{test_case}' was incorrectly flagged.")
        else:
            print(f"PASSED: '{test_case}' was correctly allowed.")
    
    if success:
        print("\nAll tests passed successfully!")
    else:
        print("\nSome tests failed. Please review the failed cases.")

# Execute tests
run_tests()

4. Automate the Tests

If you want to continually test as you update your regex, consider integrating these tests into a CI/CD pipeline using a tool like GitHub Actions, Travis CI, or Jenkins.

Example using pytest for more automated testing:

You can use the pytest framework to make it more structured and scalable:

import re
import pytest

# Your regex pattern
pattern = re.compile(r"(?i)\b(?:sudo|s u d o)\s+rm\s+(?:-r\s*-f|-rf|-r\s*-?\s*f)\b")

# Define test cases
positive_cases = [
    "sudo rm -rf /",
    "S U D O r m -r -f",
    "sudo   rm    -rf",
    "suDo rM - R F",
    "sudo rm -r f"
]

negative_cases = [
    "sudo rm -r file",
    "Let's discuss sudo and rm commands",
    "The command sudo rm -r file is dangerous",
    "sudorrmf",
    "sudo    rm    myfile"
]

@pytest.mark.parametrize("test_case", positive_cases)
def test_positive_cases(test_case):
    assert pattern.search(test_case), f"Positive test case failed: {test_case}"

@pytest.mark.parametrize("test_case", negative_cases)
def test_negative_cases(test_case):
    assert not pattern.search(test_case), f"Negative test case failed: {test_case}"

if __name__ == "__main__":
    # Run only the tests in this file when it is executed directly
    raise SystemExit(pytest.main([__file__]))

5. Review and Iterate

After running the tests, review any failed cases, adjust your regex pattern, and rerun the tests until you get the desired results.
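As an illustration of this step: as far as I can tell, the combined pattern used in the test code above misses the two spaced-out positives ("S U D O r m -r -f" and "suDo rM - R F"), because it requires "rm" and "-r" to be written without internal spaces. The relaxed variant below is my own assumption, not something proposed in this thread; it allows optional whitespace between the letters and passes all ten listed cases.

```python
import re

# Relaxed variant (an assumption for illustration): optional whitespace is
# allowed between the letters of "sudo", "rm", and the flag characters.
RELAXED = re.compile(r"(?i)\bs\s*u\s*d\s*o\s+r\s*m\s+-\s*r\s*-?\s*f\b")

POSITIVE = [
    "sudo rm -rf /",
    "S U D O r m -r -f",
    "sudo   rm    -rf",
    "suDo rM - R F",
    "sudo rm -r f",
]

NEGATIVE = [
    "sudo rm -r file",
    "Let's discuss sudo and rm commands",
    "The command sudo rm -r file is dangerous",
    "sudorrmf",
    "sudo    rm    myfile",
]


def failures() -> list[str]:
    """Return every case the pattern gets wrong (missed or wrongly flagged)."""
    missed = [case for case in POSITIVE if not RELAXED.search(case)]
    wrongly_flagged = [case for case in NEGATIVE if RELAXED.search(case)]
    return missed + wrongly_flagged


print(failures())
```

Whether spaced-out letters should be matched at all is a separate policy question; the sketch only shows how one iteration of adjust-and-rerun might look.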

0 replies

kzndotsh · 2024-08-31T13:37:54Z

kzndotsh
Aug 31, 2024
Maintainer

sorry i'm going to research more 😭 @kzndotsh

I'm just brain-dumping stuff right now, no worries.

0 replies

kzndotsh · 2024-08-31T13:38:30Z

kzndotsh
Aug 31, 2024
Maintainer

0 replies

nomnomshark41 · 2024-08-31T13:39:12Z

nomnomshark41
Aug 31, 2024

We need AI to generate more test cases, @kzndotsh.

0 replies

abxh · 2024-08-31T13:46:22Z

abxh
Aug 31, 2024

This seems like a Herculean task given the range of Unicode characters that exist. An example of a test case not covered: the Braille pattern blank (⠀, U+2800) renders like a (long) space but is not a whitespace character, so \s does not match it.

>>> "rm\u2800-rf\u2800/42"
'rm⠀-rf⠀/42'

I think a pragmatic approach is to auto-moderate the most commonly seen variants and handle the rest manually when spotted.
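To make the U+2800 case concrete, here is a small Python sketch. It shows that \s does not match the Braille blank and how a pre-processing step could fold it into a regular space. The set of "space-like" characters is an assumption and deliberately incomplete, and the helper name is made up.

```python
import re

# Whitespace plus the Braille pattern blank. The extra character is an
# illustrative assumption; other blank look-alikes are not covered.
SPACE_LIKE = re.compile(r"[\s\u2800]+")


def normalize_spaces(message: str) -> str:
    """Collapse whitespace and known blank look-alikes into single spaces."""
    return SPACE_LIKE.sub(" ", message)


sample = "rm\u2800-rf\u2800/42"

# Without normalization, \s+ does not bridge the Braille blanks.
print(re.search(r"rm\s+-rf", sample))

# After normalization, the same pattern matches.
print(re.search(r"rm\s+-rf", normalize_spaces(sample)))
```

This kind of pre-processing is only possible in a bot that sees the message text; in Discord's native AutoMod the extra character would have to go into the pattern itself.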

0 replies

arutonee1 · 2024-08-31T13:49:11Z

arutonee1
Aug 31, 2024

You can of course replace any instance of / with [/...] where ... are variants of /, and similarly for \s. If you really needed to, you could also insert [...]* between every character, where ... are the zero-width Unicode characters, but I think that'd be overkill.

I think it's reasonable to only catch common cases, as @abxh said, and manually mod the rest.
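For reference, the substitution described above can be generated mechanically. The sketch below is one possible reading of it in Python; the helper name and the specific character sets (three slash look-alikes, five zero-width characters) are my own picks and are far from exhaustive.

```python
import re

# Look-alikes for "/": division slash, fraction slash, fullwidth solidus.
SLASHES = "/\u2215\u2044\uFF0F"
# Common zero-width characters that can be hidden between letters.
ZERO_WIDTH = "\u200B\u200C\u200D\u2060\uFEFF"


def tolerant(literal: str) -> str:
    """Build a regex for `literal` that tolerates the variants above."""
    gap = f"[{ZERO_WIDTH}]*"
    parts = []
    for ch in literal:
        if ch == "/":
            parts.append(f"[{re.escape(SLASHES)}]")
        elif ch == " ":
            parts.append(r"\s+")
        else:
            parts.append(re.escape(ch))
    # Allow any number of zero-width characters between adjacent parts.
    return gap.join(parts)


PATTERN = re.compile(tolerant("rm -rf /"), re.IGNORECASE)
print(PATTERN.pattern)
```

The generated pattern grows quickly with every tolerated variant, which is part of why this reads as overkill.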

0 replies

kzndotsh · 2024-08-31T13:54:28Z

kzndotsh
Aug 31, 2024
Maintainer

Now, in what scenarios could it falsely flag something, e.g. someone giving instructions for support purposes?

0 replies

abxh · 2024-08-31T14:20:17Z

abxh
Aug 31, 2024

I think a stricter criterion could be considered, one where spaced-apart letters are not checked for.

The goal of automodding rm -rf is to prevent newbies from typing out the command accidentally (or directly copy-pasting the command). I believe that is important to consider in a problem like this, where it's likely that not all cases can be covered.

Viewed from the angle of a newbie, the variations most likely to mislead would be:

Very slight variations in upper/lowercase (this can be mitigated with .lower() or regexes that account for that).
Variations with Unicode characters. I have found a Python library that could account for this: Unidecode.
...

Even the above could be ignored, and only the literal commands (in ASCII) that pose a risk could be checked for.

If each letter is spaced apart, in my opinion, the command poses less of a risk for newbies.

Note also:
The more complex the regex, the worse it is for performance and resource usage.
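A minimal sketch of the "normalize first, then check the literal command" idea. It uses the standard library's unicodedata (NFKC plus case folding) as a stand-in for Unidecode, which is a third-party package; NFKC only folds compatibility forms such as fullwidth letters and will not catch, say, Cyrillic look-alikes. The literal pattern and the function names are assumptions for illustration.

```python
import re
import unicodedata

# Literal ASCII check applied after normalization. The target set ("/" or
# "~") is an illustrative assumption.
LITERAL = re.compile(r"\brm\s+-(?:rf|fr)\s+[/~]")


def normalize(message: str) -> str:
    """Fold compatibility characters (e.g. fullwidth letters) and case."""
    return unicodedata.normalize("NFKC", message).casefold()


def is_harmful(message: str) -> bool:
    """Check the normalized message against the literal pattern."""
    return LITERAL.search(normalize(message)) is not None


print(is_harmful("sudo RM -rf /"))
print(is_harmful("\uFF52\uFF4D \uFF0D\uFF52\uFF46 \uFF0F"))  # fullwidth "rm -rf /"
print(is_harmful("rm -rf build"))
```

Keeping the regex itself simple and pushing the variation handling into normalization also addresses the performance concern, at the cost of needing a bot-side check rather than Discord's native AutoMod.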

0 replies

nomnomshark41 · 2024-08-31T20:38:02Z

nomnomshark41
Aug 31, 2024

I think a stricter criterion could be considered, one where spaced-apart letters are not checked for.

The goal of automodding rm -rf is to prevent newbies from typing out the command accidentally (or directly copy-pasting the command). I believe that is important to consider in a problem like this, where it's likely that not all cases can be covered.

Viewed from the angle of a newbie, the variations most likely to mislead would be:

Very slight variations in upper/lowercase (this can be mitigated with .lower() or regexes that account for that).

Variations with Unicode characters. I have found a Python library that could account for this: Unidecode.

...

Even the above could be ignored, and only the literal commands (in ASCII) that pose a risk could be checked for.

If each letter is spaced apart, in my opinion, the command poses less of a risk for newbies.

Note also: The more complex the regex, the worse it is for performance and resource usage.

The regex is meant to stop trolls.

0 replies

Jordanyay · 2024-09-01T03:10:28Z

Jordanyay
Sep 1, 2024

Now, in what scenarios could it falsely flag something, e.g. someone giving instructions for support purposes?

There's bound to be a scenario where a support person provides an rm -rf command, so the context of the command will be important to consider.

The most common targets of rm trolls are the root "/" and home "~" directories. These can serve as a starting point for telling malicious rm commands apart from harmless ones, but the approach isn't perfect. For example, a troll could easily bypass such a filter by using command substitution:

sudo rm -rf "$(pwd)"

This Bash command invokes pwd to print the full path of the current working directory, which in many cases (e.g. a freshly opened terminal) is the home directory, making it a substitute for "~".

Another case is using the dot "." to refer to the current directory:

sudo rm -rf .
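To illustrate the bypass, here is a short Python sketch with a deliberately naive filter that only looks for targets starting with "/" or "~". The pattern is my own simplification for demonstration, not the one used in Tux.

```python
import re

# Naive target filter (an illustrative assumption): flag rm -rf only when
# the target starts with "/" or "~".
ROOT_OR_HOME = re.compile(r"(?i)\brm\s+-(?:rf|fr)\s+[/~]")

# The two bypasses described above.
BYPASSES = ['sudo rm -rf "$(pwd)"', "sudo rm -rf ."]

for command in BYPASSES:
    flagged = ROOT_OR_HOME.search(command) is not None
    print(f"{command!r}: {'flagged' if flagged else 'not flagged'}")
```

Both commands slip through, while "sudo rm -rf /" would be caught.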

0 replies

arutonee1 · 2024-09-01T11:42:12Z

arutonee1
Sep 1, 2024

I genuinely think filtering out only /, common /... dirs, and ~ is reasonable, and letting everything else fall back to manual modding. Any more than that will have too many false positives imo.

I do really like @abxh's idea of using Unidecode or a similar library. I don't see why .lower() has to be used though, since Python's regex supports case-insensitive matching (re.IGNORECASE, or (?i) inline).

Patterns that I think we should filter out automatically:

/
/bin/?
/boot/?
/dev/?
/etc/?
/home/?
/lib/?
/opt/?
/proc/?
/root/?
/run/?
/sbin/?
/sys/?
/usr/?
/var/?
~/?
/*/?

I can see /home/... being used occasionally in install support, and ., although rare, has popped up for me a few times, so I haven't included those. Any other cases to add, or weird edge cases warranting a removal from this list?
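One way the list above could be turned into a single pattern, as a rough Python sketch. The flag handling (-rf, -fr, -r -f and so on) and the requirement that the target ends the argument are simplifying assumptions of mine. No look-around is used, so the same pattern should in principle carry over to the Rust flavor, though that is untested here.

```python
import re

# Top-level directories taken from the list above; "/", "~" and "/*" are
# handled separately in TARGET.
TOP_LEVEL_DIRS = [
    "bin", "boot", "dev", "etc", "home", "lib", "opt", "proc",
    "root", "run", "sbin", "sys", "usr", "var",
]

# A target is "/", "/<dir>", "/*" or "~", each with an optional trailing
# slash, and must be followed by whitespace or the end of the message.
TARGET = r"(?:/(?:" + "|".join(TOP_LEVEL_DIRS) + r"|\*)?/?|~/?)(?:\s|$)"

# The flag part is a simplifying assumption: one or more -r/-f groups.
HARMFUL = re.compile(r"(?i)\brm\s+(?:-[rf]+\s+)+" + TARGET)


def is_harmful(message: str) -> bool:
    """Return True if the message contains rm -r/-f on a listed target."""
    return HARMFUL.search(message) is not None


for command in ["sudo rm -rf /", "rm -rf /etc/", "rm -rf /home/user/.cache", "rm -rf ./build"]:
    print(f"{command!r}: {'flagged' if is_harmful(command) else 'allowed'}")
```

Subpaths such as /home/user/.cache and relative targets such as . are intentionally left to manual moderation, in line with the reasoning above.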

0 replies

electron271 · 2025-02-01T03:30:53Z

electron271
Feb 1, 2025
Maintainer

closing as this is seemingly finished
if it still needs discussion feel free to reopen

0 replies

Replies: 23 comments
