Description
If a full name contains an abbreviated middle name (e.g. 'L.') or a suffix such as 'jr.', Anonymize does not treat it as a name to replace.
When running the following piece of code:
from llm_guard.util import configure_logger
from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault
from loguru import logger
configure_logger('CRITICAL')
# Create vault
vault = Vault()
# Init scanner
anonymize_scanner = Anonymize(vault)
name_list = {'Sam Jackson','Samuel Leroy Jackson','Samuel L. Jackson','Robert Downey jr.'}
for name in name_list:
    anon_this = f"My name is {name}"
    sanitized_text, is_valid, risk_score = anonymize_scanner.scan(anon_this)
    if is_valid:
        logger.success(f"Text is clean: {sanitized_text}")
    else:
        logger.error(f"Sanitized text: {sanitized_text} ({name})")
        logger.info(f"Text is safe (risk estimation): {is_valid} ({risk_score})")
I get:
2024-10-30 13:46:19.002 | SUCCESS | __main__:<module>:20 - Text is clean: My name is Robert Downey jr.
Entity CUSTOM doesn't have the corresponding recognizer in language : en
2024-10-30 13:46:19.147 | SUCCESS | __main__:<module>:20 - Text is clean: My name is Samuel L. Jackson
Entity CUSTOM doesn't have the corresponding recognizer in language : en
2024-10-30 13:46:19.287 | ERROR | __main__:<module>:22 - Sanitized text: My name is [REDACTED_PERSON_1] (Sam Jackson)
2024-10-30 13:46:19.287 | INFO | __main__:<module>:23 - Text is safe (risk estimation): False (1.0)
Entity CUSTOM doesn't have the corresponding recognizer in language : en
2024-10-30 13:46:19.427 | ERROR | __main__:<module>:22 - Sanitized text: My name is [REDACTED_PERSON_2] (Samuel Leroy Jackson)
2024-10-30 13:46:19.427 | INFO | __main__:<module>:23 - Text is safe (risk estimation): False (1.0)
I would expect it to replace all of the names. Could you please take a look and let me know if I'm missing something? Thank you in advance.
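In case it helps with triage, here is a minimal sketch of a check that lists which names survive the scan verbatim. It is not part of llm_guard: `find_missed_names` and `stub_scan` are hypothetical helpers I made up for illustration. The stub only mimics the behaviour in the log above (the risk score it returns for clean text is a placeholder, since the log does not show one), so the snippet runs without loading any model.

```python
from typing import Callable, Iterable, List, Tuple

# Shape of the tuple returned by the scan call in the snippet above:
# (sanitized_text, is_valid, risk_score)
ScanResult = Tuple[str, bool, float]


def find_missed_names(scan: Callable[[str], ScanResult], names: Iterable[str]) -> List[str]:
    """Return the names that still appear verbatim in the scanner output."""
    missed = []
    for name in sorted(names):  # sorted for a deterministic report order
        sanitized_text, _is_valid, _risk_score = scan(f"My name is {name}")
        if name in sanitized_text:
            missed.append(name)
    return missed


def stub_scan(prompt: str) -> ScanResult:
    """Hypothetical stand-in that reproduces the behaviour seen in the log."""
    for name in ("Sam Jackson", "Samuel Leroy Jackson"):
        if name in prompt:
            return prompt.replace(name, "[REDACTED_PERSON_1]"), False, 1.0
    # Placeholder score: the log does not show the value for clean text.
    return prompt, True, 0.0


names = {"Sam Jackson", "Samuel Leroy Jackson", "Samuel L. Jackson", "Robert Downey jr."}
missed = find_missed_names(stub_scan, names)
print(missed)
```

With the real scanner, `anonymize_scanner.scan` from the snippet above would be passed in place of `stub_scan`.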