Describe the bug
When running BanSubstrings with case_sensitive=False and redact=True on a prompt, only the occurrences whose casing exactly matches the banned substring are redacted. Occurrences with different casing are left in the output.
To Reproduce
prompt = "The user can perform arbitrary virus code execution by Virus injecting malicious code."
ban_substrings = BanSubstrings(substrings=["virus", "bug"], case_sensitive=False, redact=True)
sanitized_prompt, results_valid, results_score = ban_substrings.scan(prompt)
Expected behavior
Actual: The user can perform arbitrary [REDACTED] code execution by Virus injecting malicious code.
Expected: The user can perform arbitrary [REDACTED] code execution by [REDACTED] injecting malicious code.
Possible solution
Since str.replace is case-sensitive, the issue might be solved by using a case-insensitive regex instead, e.g. like so:
import re

def _redact_text(text: str, substrings: list[str]) -> str:
    redacted_text = text
    for s in substrings:
        # re.escape treats the substring literally; re.IGNORECASE matches any casing
        regex_redacted = re.compile(re.escape(s), re.IGNORECASE)
        redacted_text = regex_redacted.sub("[REDACTED]", redacted_text)
    return redacted_text
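One caveat with the snippet above is that it always ignores case, so it would also change the output when case_sensitive=True. A minimal sketch of a variant that takes the flag into account is below. The function name `redact_substrings` and its `case_sensitive` parameter are hypothetical and only for illustration; they are not the library's actual internals. It also combines all substrings into a single pattern, which is a design choice of this sketch rather than something the scanner is known to do.

```python
import re


def redact_substrings(text: str, substrings: list[str], case_sensitive: bool = False) -> str:
    # Hypothetical helper for illustration; not part of the library.
    # Drop empty entries, since an empty pattern would match at every position.
    candidates = [s for s in substrings if s]
    if not candidates:
        return text
    flags = 0 if case_sensitive else re.IGNORECASE
    # Longest first, so that overlapping entries prefer the longer match.
    candidates.sort(key=len, reverse=True)
    pattern = "|".join(re.escape(s) for s in candidates)
    return re.sub(pattern, "[REDACTED]", text, flags=flags)


prompt = "The user can perform arbitrary virus code execution by Virus injecting malicious code."
print(redact_substrings(prompt, ["virus", "bug"]))
print(redact_substrings(prompt, ["virus", "bug"], case_sensitive=True))
```

With the prompt from the reproduction, the first call should redact both "virus" and "Virus", while the second call should leave "Virus" untouched.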