Shortened middle name prevents anonymization #200

Open
@akaliuta

If a full name contains an abbreviated middle name (e.g. 'L.') or a suffix such as 'jr.', Anonymize does not recognize it as a name to replace.

When running the following piece of code:

from llm_guard.util import configure_logger
from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault
from loguru import logger

configure_logger('CRITICAL')

# Create vault
vault = Vault()

# Init scanner
anonymize_scanner = Anonymize(vault)

name_list = {'Sam Jackson','Samuel Leroy Jackson','Samuel L. Jackson','Robert Downey jr.'}

for name in name_list:
    anon_this = f"My name is {name}" 
    sanitized_text, is_valid, risk_score = anonymize_scanner.scan(anon_this)
    if is_valid:
        logger.success(f"Text is clean: {sanitized_text}")
    else:
        logger.error(f"Sanitized text: {sanitized_text} ({name})")
        logger.info(f"Text is safe (risk estimation): {is_valid} ({risk_score})")

I get:

2024-10-30 13:46:19.002 | SUCCESS  | __main__:<module>:20 - Text is clean: My name is Robert Downey jr.
Entity CUSTOM doesn't have the corresponding recognizer in language : en
2024-10-30 13:46:19.147 | SUCCESS  | __main__:<module>:20 - Text is clean: My name is Samuel L. Jackson
Entity CUSTOM doesn't have the corresponding recognizer in language : en
2024-10-30 13:46:19.287 | ERROR    | __main__:<module>:22 - Sanitized text: My name is [REDACTED_PERSON_1] (Sam Jackson)
2024-10-30 13:46:19.287 | INFO     | __main__:<module>:23 - Text is safe (risk estimation): False (1.0)
Entity CUSTOM doesn't have the corresponding recognizer in language : en
2024-10-30 13:46:19.427 | ERROR    | __main__:<module>:22 - Sanitized text: My name is [REDACTED_PERSON_2] (Samuel Leroy Jackson)
2024-10-30 13:46:19.427 | INFO     | __main__:<module>:23 - Text is safe (risk estimation): False (1.0)

However, I would expect all four names to be replaced. Could you please take a look and let me know if I'm missing something? Thank you in advance.
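
As a possible stopgap for the two names that slip through, a regex fallback could be run on the text alongside the scanner. The sketch below is only an illustration and is not part of llm_guard: `FULL_NAME_RE` and `redact_names` are hypothetical names, the pattern is a rough heuristic (it will also match other runs of capitalized words), and the placeholder does not use the vault's numbering.

```python
import re

# Hypothetical heuristic, independent of llm_guard.
# A name token is a capitalized word that is not itself "Jr"/"Sr".
_NAME_TOKEN = r"(?!(?:Jr|Sr)\b)[A-Z][a-z]+"
# A middle initial such as "L."
_INITIAL = r"[A-Z]\."
# A generational suffix such as "jr.", "Sr" or "III".
_SUFFIX = r"(?:[Jj]r|[Ss]r)\.?|(?:III|II|IV)\b"

# First name, then one or more initials/name tokens, then an optional suffix.
FULL_NAME_RE = re.compile(
    rf"\b{_NAME_TOKEN}(?:\s+(?:{_INITIAL}|{_NAME_TOKEN}))+(?:,?\s+(?:{_SUFFIX}))?"
)


def redact_names(text: str, placeholder: str = "[REDACTED_PERSON]") -> str:
    """Replace every span matched by the heuristic with the placeholder."""
    return FULL_NAME_RE.sub(placeholder, text)


if __name__ == "__main__":
    for name in ["Sam Jackson", "Samuel Leroy Jackson",
                 "Samuel L. Jackson", "Robert Downey jr."]:
        print(redact_names(f"My name is {name}"))
```

With the four names from the report, each line prints `My name is [REDACTED_PERSON]`. This only covers the patterns described here (middle initials and jr./sr./roman-numeral suffixes), so it is a workaround rather than a fix for the underlying recognizer.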
