Shortened middle name prevents anonymization #200

If a full name contains an abbreviated middle name (e.g. 'L.') or a suffix like 'jr.', `Anonymize` doesn't recognize it as a name to replace.

When running the following piece of code:
```
from llm_guard.util import configure_logger
from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault
from loguru import logger

configure_logger('CRITICAL')

# Create vault
vault = Vault()

# Init scanner
anonymize_scanner = Anonymize(vault)

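# Names to test; this is a set, so iteration order (and the log order below) is arbitrary
name_list = {'Sam Jackson', 'Samuel Leroy Jackson', 'Samuel L. Jackson', 'Robert Downey jr.'}

for name in name_list:
    anon_this = f"My name is {name}"
    # scan() returns the sanitized text, a validity flag and a risk score
    sanitized_text, is_valid, risk_score = anonymize_scanner.scan(anon_this)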
    if is_valid:
        logger.success(f"Text is clean: {sanitized_text}")
    else:
        logger.error(f"Sanitized text: {sanitized_text} ({name})")
        logger.info(f"Text is safe (risk estimation): {is_valid} ({risk_score})")
```
I get:
```
2024-10-30 13:46:19.002 | SUCCESS  | __main__:<module>:20 - Text is clean: My name is Robert Downey jr.
Entity CUSTOM doesn't have the corresponding recognizer in language : en
2024-10-30 13:46:19.147 | SUCCESS  | __main__:<module>:20 - Text is clean: My name is Samuel L. Jackson
Entity CUSTOM doesn't have the corresponding recognizer in language : en
2024-10-30 13:46:19.287 | ERROR    | __main__:<module>:22 - Sanitized text: My name is [REDACTED_PERSON_1] (Sam Jackson)
2024-10-30 13:46:19.287 | INFO     | __main__:<module>:23 - Text is safe (risk estimation): False (1.0)
Entity CUSTOM doesn't have the corresponding recognizer in language : en
2024-10-30 13:46:19.427 | ERROR    | __main__:<module>:22 - Sanitized text: My name is [REDACTED_PERSON_2] (Samuel Leroy Jackson)
2024-10-30 13:46:19.427 | INFO     | __main__:<module>:23 - Text is safe (risk estimation): False (1.0)
```
I would expect all four names to be replaced, but only 'Sam Jackson' and 'Samuel Leroy Jackson' are redacted. Could you please take a look and let me know if I'm missing something? Thank you in advance.
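To make the expected behaviour checkable, here is a minimal sketch of a helper that lists which parts of the original name still appear in the scanner's output. `leaked_name_parts` is a hypothetical helper written for this report, not part of llm_guard; it only does plain string matching on the results and assumes a name part counts as "leaked" if it appears as a whole word, case-sensitively.

```python
import re


def leaked_name_parts(name: str, sanitized_text: str) -> list[str]:
    """Return the parts of `name` that still appear in `sanitized_text`.

    Hypothetical helper for checking scanner output; not part of llm_guard.
    Each whitespace-separated part (e.g. "L." or "jr.") is matched as a
    whole word, case-sensitively.
    """
    leaked = []
    for part in name.split():
        # re.escape keeps the "." in "L." / "jr." literal; the lookarounds
        # stop "Sam" from matching inside "Samuel".
        pattern = rf"(?<!\w){re.escape(part)}(?!\w)"
        if re.search(pattern, sanitized_text):
            leaked.append(part)
    return leaked


# Sanitized outputs copied from the log in this report
reported = {
    "Sam Jackson": "My name is [REDACTED_PERSON_1]",
    "Samuel Leroy Jackson": "My name is [REDACTED_PERSON_2]",
    "Samuel L. Jackson": "My name is Samuel L. Jackson",
    "Robert Downey jr.": "My name is Robert Downey jr.",
}
for name, text in reported.items():
    print(f"{name!r}: leaked parts = {leaked_name_parts(name, text)}")
```

Run against the reported outputs, the two redacted names return an empty list, while 'Samuel L. Jackson' and 'Robert Downey jr.' return every part of the name, which is the gap this issue describes.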
