Integrate Co-reference Detection for Pronoun Substitution in PII Masking #34

@hanneshapke

Background

The Yaak proxy currently detects and masks PII entities (names, emails, etc.) to protect user privacy when sending requests to external LLMs. However, it doesn't handle pronouns that refer to these masked entities, leading to gender/reference mismatches in responses.

The Problem

When a male name like "Tom Miller" is replaced with a female name like "Sarah Smith", associated pronouns ("he", "him", "his") remain unchanged, causing grammatical inconsistencies and confusion:

Current Behavior (Broken):

User Input: "Hi, his name is Tom Miller. Write a short biography about him."
↓ PII Detection: FIRSTNAME: Tom, SURNAME: Miller
Masked Request: "Hi, his name is Sarah Smith. Write a short biography about him."
                     ^^^                                                  ^^^
LLM Response: "Sarah Smith is a software engineer. She is a co-founder..."
                                                    ^^^
Unmasked Response: "Tom Miller is a software engineer. She is a co-founder..."
                                                       ^^^
                                        ❌ Gender mismatch!

Ideal Behavior (Fixed):

User Input: "Hi, his name is Tom Miller. Write a short biography about him."
↓ PII Detection + Co-reference: Tom Miller (cluster 1) ← "his", "him" also in cluster 1
Masked Request: "Hi, her name is Sarah Smith. Write a short biography about her."
                     ^^^                                                   ^^^
LLM Response: "Sarah Smith is a software engineer. She is a co-founder..."
                                                    ^^^
Unmasked Response: "Tom Miller is a software engineer. He is a co-founder..."
                                                       ^^
                                        ✅ Consistent pronouns!

What is Co-reference Detection?

Co-reference detection identifies and links mentions of the same entity across a text:

  • "Tom Miller", "he", "him", "his" all refer to the same person
  • "Sarah Smith", "she", "her" all refer to the same person

The Yaak model already performs co-reference detection and outputs cluster IDs for each token. We just need to use this information to handle pronoun substitution.


Current Model Capabilities

The multi-task PII detection model outputs co-reference predictions:

# From eval_model_detailed.py:298-300
coref_logits = outputs["coref_logits"][0]  # [seq_len, num_coref_labels]
coref_predictions = torch.argmax(coref_logits, dim=-1)  # [seq_len]
coref_pred_ids = [p.item() for p in coref_predictions]  # Cluster IDs per token

Example output:

Text:    "Tom Miller went to his car. He drove home."
Tokens:  Tom Miller went to his car . He drove home .
Cluster: 1   1      0    0  1   0   0 1  0     0    0
         └────────┘         └─┘       └┘
         Same entity (cluster 1)

The model correctly identifies that "Tom Miller", "his", and "He" all refer to the same entity (cluster 1).
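
The per-token cluster IDs still have to be grouped into mentions before they are useful for masking. A minimal sketch of that grouping, assuming tokens and cluster IDs arrive as parallel slices (the `Mention` type and `GroupMentions` function are hypothetical names, not existing code):

```go
package main

import (
	"fmt"
	"strings"
)

// Mention is a hypothetical helper type: a run of adjacent tokens that
// share one non-zero cluster ID.
type Mention struct {
	ClusterID int
	Text      string
}

// GroupMentions merges adjacent tokens that carry the same non-zero
// cluster ID. Cluster 0 is treated as "no cluster", as in the example above.
func GroupMentions(tokens []string, clusterIDs []int) []Mention {
	var mentions []Mention
	var current []string
	currentID := 0

	// flush stores the pending run (if any) as one mention.
	flush := func() {
		if currentID != 0 && len(current) > 0 {
			mentions = append(mentions, Mention{ClusterID: currentID, Text: strings.Join(current, " ")})
		}
		current = nil
	}

	for i, tok := range tokens {
		if i >= len(clusterIDs) {
			break // defensive: ignore tokens without a prediction
		}
		if clusterIDs[i] != currentID {
			flush()
			currentID = clusterIDs[i]
		}
		if currentID != 0 {
			current = append(current, tok)
		}
	}
	flush()
	return mentions
}

func main() {
	tokens := []string{"Tom", "Miller", "went", "to", "his", "car", ".", "He", "drove", "home", "."}
	ids := []int{1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0}
	for _, m := range GroupMentions(tokens, ids) {
		fmt.Printf("cluster %d: %q\n", m.ClusterID, m.Text)
	}
}
```

One limitation of this sketch: two separate mentions of the same cluster that sit directly next to each other would be merged into one, so a real implementation may also need the PII labels to find mention boundaries.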


Implementation Plan

Phase 1: Add Pronoun Mapping Module

File: src/backend/pii/pronoun_mapper.go (new)

Create a pronoun mapping service that handles gender-aware pronoun substitution:

package pii

import (
    "strings"
)

// PronounGender represents grammatical gender for pronouns
type PronounGender int

const (
    GenderUnknown PronounGender = iota
    GenderMale
    GenderFemale
    GenderNeutral
)

// PronounMapper handles pronoun substitution based on gender
type PronounMapper struct {
    pronounMap map[string]map[PronounGender]string
}

// NewPronounMapper creates a new pronoun mapper
func NewPronounMapper() *PronounMapper {
    return &PronounMapper{
        pronounMap: initPronounMap(),
    }
}

// initPronounMap initializes the pronoun mapping table
func initPronounMap() map[string]map[PronounGender]string {
    return map[string]map[PronounGender]string{
        // Subject pronouns
        "he": {
            GenderMale:    "he",
            GenderFemale:  "she",
            GenderNeutral: "they",
        },
        "she": {
            GenderMale:    "he",
            GenderFemale:  "she",
            GenderNeutral: "they",
        },
        
        // Object pronouns
        "him": {
            GenderMale:    "him",
            GenderFemale:  "her",
            GenderNeutral: "them",
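        },
        // NOTE: "her" is also the female possessive ("her car"), which should
        // map to "his"/"their". A flat map keyed by word cannot tell the two
        // uses apart, so this entry treats every "her" as an object pronoun.
        "her": {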
            GenderMale:    "him",
            GenderFemale:  "her",
            GenderNeutral: "them",
        },
        
        // Possessive pronouns
        "his": {
            GenderMale:    "his",
            GenderFemale:  "her",
            GenderNeutral: "their",
        },
        
        // Reflexive pronouns
        "himself": {
            GenderMale:    "himself",
            GenderFemale:  "herself",
            GenderNeutral: "themselves",
        },
        "herself": {
            GenderMale:    "himself",
            GenderFemale:  "herself",
            GenderNeutral: "themselves",
        },
    }
}

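// MapPronoun returns the toGender form of pronoun, preserving capitalization.
// fromGender is not consulted yet: each table entry already lists every target form.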
func (pm *PronounMapper) MapPronoun(pronoun string, fromGender, toGender PronounGender) string {
    lowerPronoun := strings.ToLower(pronoun)
    
    // Check if we have a mapping for this pronoun
    if genderMap, exists := pm.pronounMap[lowerPronoun]; exists {
        if mapped, ok := genderMap[toGender]; ok {
            // Preserve original capitalization
            if isCapitalized(pronoun) {
                return capitalize(mapped)
            }
            return mapped
        }
    }
    
    // If no mapping found, return original
    return pronoun
}

// DetectGenderFromName attempts to detect gender from a first name.
// It compares the first whitespace-separated token exactly against small
// sample lists; substring matching would misclassify names such as
// "Emmanuel" (contains "emma") or "Davida" (contains "david").
func (pm *PronounMapper) DetectGenderFromName(name string) PronounGender {
    // Common male names
    maleNames := []string{"tom", "john", "james", "michael", "david", "robert"}
    // Common female names
    femaleNames := []string{"sarah", "emma", "lisa", "jennifer", "mary", "patricia"}

    fields := strings.Fields(strings.ToLower(name))
    if len(fields) == 0 {
        return GenderUnknown
    }
    firstName := fields[0]

    for _, male := range maleNames {
        if firstName == male {
            return GenderMale
        }
    }

    for _, female := range femaleNames {
        if firstName == female {
            return GenderFemale
        }
    }

    return GenderUnknown
}

// Helper functions
func isCapitalized(s string) bool {
    if len(s) == 0 {
        return false
    }
    return s[0] >= 'A' && s[0] <= 'Z'
}

func capitalize(s string) string {
    if len(s) == 0 {
        return s
    }
    return strings.ToUpper(string(s[0])) + s[1:]
}

Phase 2: Extend Detector Output with Co-reference Information

File: src/backend/pii/detectors/types.go

Add co-reference cluster information to detector output:

// Entity represents a detected PII entity
type Entity struct {
    Text      string
    Label     string
    StartPos  int
    EndPos    int
    ClusterID int    // NEW: Co-reference cluster ID (0 = no cluster)
}

// DetectorOutput represents the result of PII detection
type DetectorOutput struct {
    Entities          []Entity
    CorefClusters     map[int][]EntityMention  // NEW: Cluster ID → mentions
    InferenceTimeMs   float64
}

// EntityMention represents a single mention in a co-reference cluster
type EntityMention struct {
    Text     string
    StartPos int
    EndPos   int
    IsEntity bool    // true if this is a PII entity, false if pronoun
}

Phase 3: Update Model Detector to Extract Co-references

File: src/backend/pii/detectors/model_detector.go

Modify the model detector to extract co-reference information from model output:

// In the Detect method, after getting PII predictions:

// Extract co-reference clusters
corefClusters := make(map[int][]EntityMention)

for i, token := range tokens {
    clusterID := corefPredictions[i]
    
    if clusterID > 0 {  // Skip cluster 0 (no cluster)
        mention := EntityMention{
            Text:     token,
            StartPos: tokenOffsets[i].Start,
            EndPos:   tokenOffsets[i].End,
            IsEntity: isPIIEntity(tokens[i], piiPredictions[i]),
        }
        
        corefClusters[clusterID] = append(corefClusters[clusterID], mention)
    }
}

// Set cluster IDs on entities
for i := range entities {
    entities[i].ClusterID = findClusterForEntity(entities[i], corefClusters)
}

return DetectorOutput{
    Entities:      entities,
    CorefClusters: corefClusters,
}, nil
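
`findClusterForEntity` is not defined anywhere yet. One possible shape, sketched under the assumption that `StartPos`/`EndPos` are half-open character offsets into the same text for both entities and mentions, is to pick the cluster whose mentions overlap the entity span the most:

```go
package main

import "fmt"

// Entity and EntityMention mirror the Phase 2 structs, trimmed to the
// fields this sketch needs.
type Entity struct {
	Text     string
	StartPos int
	EndPos   int
}

type EntityMention struct {
	Text     string
	StartPos int
	EndPos   int
}

// overlap returns how many positions two half-open spans share.
func overlap(aStart, aEnd, bStart, bEnd int) int {
	start, end := aStart, aEnd
	if bStart > start {
		start = bStart
	}
	if bEnd < end {
		end = bEnd
	}
	if end <= start {
		return 0
	}
	return end - start
}

// findClusterForEntity returns the ID of the cluster whose mentions overlap
// the entity span the most, or 0 if no mention overlaps it.
func findClusterForEntity(e Entity, clusters map[int][]EntityMention) int {
	bestID, bestOverlap := 0, 0
	for id, mentions := range clusters {
		total := 0
		for _, m := range mentions {
			total += overlap(e.StartPos, e.EndPos, m.StartPos, m.EndPos)
		}
		// Break ties on the lower ID so the result does not depend on map order.
		if total > bestOverlap || (total == bestOverlap && total > 0 && id < bestID) {
			bestID, bestOverlap = id, total
		}
	}
	return bestID
}

func main() {
	// Offsets refer to: "Tom met Sarah. He thanked her."
	clusters := map[int][]EntityMention{
		1: {{Text: "Tom", StartPos: 0, EndPos: 3}, {Text: "He", StartPos: 15, EndPos: 17}},
		2: {{Text: "Sarah", StartPos: 8, EndPos: 13}, {Text: "her", StartPos: 26, EndPos: 29}},
	}
	sarah := Entity{Text: "Sarah", StartPos: 8, EndPos: 13}
	fmt.Println(findClusterForEntity(sarah, clusters)) // prints 2
}
```

Overlap voting tolerates entities that span several tokens ("Tom Miller") even when the model assigns a cluster to only some of them.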

Phase 4: Update Masking Service with Pronoun Substitution

File: src/backend/pii/masking_service.go

Extend the masking service to handle pronoun substitution:

type MaskingService struct {
    detector       detectors.Detector
    generator      *GeneratorService
    pronounMapper  *PronounMapper  // NEW
}

func NewMaskingService(detector detectors.Detector, generator *GeneratorService) *MaskingService {
    return &MaskingService{
        detector:      detector,
        generator:     generator,
        pronounMapper: NewPronounMapper(),  // NEW
    }
}

func (s *MaskingService) MaskText(text string, logPrefix string) MaskedResult {
    piiFound, err := s.detector.Detect(context.Background(), detectors.DetectorInput{Text: text})
    // ... existing PII detection code ...
    
    // NEW: Handle pronoun substitution
    genderMappings := make(map[int]struct{
        OriginalGender PronounGender
        MaskedGender   PronounGender
    })
    
    // Determine gender change for each cluster
    for clusterID, mentions := range piiFound.CorefClusters {
        originalGender := s.detectClusterGender(mentions, entities)
        maskedGender := s.detectMaskedGender(mentions, entities, maskedToOriginal)
        
        genderMappings[clusterID] = struct{
            OriginalGender PronounGender
            MaskedGender   PronounGender
        }{
            OriginalGender: originalGender,
            MaskedGender:   maskedGender,
        }
    }
    
    // Replace pronouns in clusters that have gender changes
    for clusterID, genderMap := range genderMappings {
        if genderMap.OriginalGender != genderMap.MaskedGender {
            maskedText = s.replaceClusterPronouns(
                maskedText,
                piiFound.CorefClusters[clusterID],
                genderMap.OriginalGender,
                genderMap.MaskedGender,
            )
        }
    }
    
    return MaskedResult{
        MaskedText:       maskedText,
        MaskedToOriginal: maskedToOriginal,
        Entities:         entities,
        GenderMappings:   genderMappings,  // Store for restoration
    }
}

func (s *MaskingService) RestorePII(text string, result MaskedResult) string {
    // Restore PII entities
    restoredText := text
    for maskedText, originalText := range result.MaskedToOriginal {
        restoredText = strings.ReplaceAll(restoredText, maskedText, originalText)
    }
    
    // NEW: Reverse pronoun substitutions
    for clusterID, genderMap := range result.GenderMappings {
        if genderMap.OriginalGender != genderMap.MaskedGender {
            // Reverse: masked → original gender
            restoredText = s.reverseClusterPronouns(
                restoredText,
                clusterID,
                genderMap.MaskedGender,
                genderMap.OriginalGender,
            )
        }
    }
    
    return restoredText
}
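
One detail the code above leaves open: cluster offsets refer to the original text, but once "Tom Miller" has been replaced by "Sarah Smith" every later offset in `maskedText` has shifted. A minimal sketch of one way around this, assuming name and pronoun replacements are collected as non-overlapping byte spans against the original text and applied in a single pass from right to left (`Replacement` and `ApplyReplacements` are hypothetical names):

```go
package main

import (
	"fmt"
	"sort"
)

// Replacement describes one substitution against the ORIGINAL text.
type Replacement struct {
	Start, End int    // half-open byte range in the original text
	New        string // text to put in its place
}

// ApplyReplacements applies non-overlapping replacements from right to left,
// so earlier (left-hand) offsets stay valid while the text changes length.
func ApplyReplacements(text string, reps []Replacement) string {
	sorted := append([]Replacement(nil), reps...) // do not reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Start > sorted[j].Start })

	for _, r := range sorted {
		if r.Start < 0 || r.End > len(text) || r.Start > r.End {
			continue // skip spans that do not fit the text
		}
		text = text[:r.Start] + r.New + text[r.End:]
	}
	return text
}

func main() {
	original := "Tom Miller went to his car. He drove home."
	reps := []Replacement{
		{Start: 0, End: 10, New: "Sarah Smith"}, // "Tom Miller"
		{Start: 19, End: 22, New: "her"},        // "his"
		{Start: 28, End: 30, New: "She"},        // "He"
	}
	fmt.Println(ApplyReplacements(original, reps))
	// prints: Sarah Smith went to her car. She drove home.
}
```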

Phase 5: Add Configuration and Testing

File: src/backend/config/config.go

Add configuration option to enable/disable pronoun substitution:

type Config struct {
    // ... existing fields ...
    EnablePronounSubstitution bool `json:"enable_pronoun_substitution"`
}

File: src/backend/pii/detectors/model_detector_test.go

Add comprehensive tests:

func TestCorefPronounSubstitution(t *testing.T) {
    tests := []struct {
        name             string
        input            string
        expectedMasked   string
        expectedRestored string
    }{
        {
            name:             "male to female name change",
            input:            "Tom Miller went to his car. He drove home.",
            expectedMasked:   "Sarah Smith went to her car. She drove home.",
            expectedRestored: "Tom Miller went to his car. He drove home.",
        },
        {
            name:             "female to male name change",
            input:            "Sarah went to her office. She worked late.",
            expectedMasked:   "John went to his office. He worked late.",
            expectedRestored: "Sarah went to her office. She worked late.",
        },
        {
            name:             "reflexive pronouns",
            input:            "John introduced himself to the team.",
            expectedMasked:   "Mary introduced herself to the team.",
            expectedRestored: "John introduced himself to the team.",
        },
    }
    
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            // Test implementation
        })
    }
}

Integration Points

1. Model Inference

The model already outputs co-reference predictions. We need to:

  • Extract coref_logits from model output (already done in Python)
  • Pass cluster information to Go backend via detector interface
  • Map tokens to cluster IDs

2. PII Masking

When masking PII:

  1. Detect PII entities and their cluster IDs
  2. Identify pronouns in the same cluster
  3. Determine gender change (male→female, female→male, etc.)
  4. Replace pronouns with appropriate forms

3. PII Restoration

When restoring original PII:

  1. Restore masked entities (existing functionality)
  2. Reverse pronoun substitutions using stored gender mappings
  3. Ensure pronouns match original text
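
The LLM response is new text, so the cluster offsets from the request cannot be reused for step 2. A deliberately naive sketch of the reversal, assuming the response talks about exactly one person whose masked name is male and whose original name is female (the "female to male name change" test case), is a whole-word replacement; `RestorePronouns` is a hypothetical name, and responses with several people would need co-reference detection on the response itself:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// maleToFemale covers the direction that needs no context. "his" as a
// standalone possessive ("the car is his") would need "hers"; this sketch
// ignores that case.
var maleToFemale = map[string]string{
	"he":      "she",
	"him":     "her",
	"his":     "her",
	"himself": "herself",
}

// malePronoun matches the table keys as whole words, ignoring case.
var malePronoun = regexp.MustCompile(`(?i)\b(he|him|his|himself)\b`)

// RestorePronouns swaps male pronouns for their female forms and keeps a
// leading capital letter where the response had one.
func RestorePronouns(response string) string {
	return malePronoun.ReplaceAllStringFunc(response, func(m string) string {
		repl := maleToFemale[strings.ToLower(m)]
		if m[0] >= 'A' && m[0] <= 'Z' {
			return strings.ToUpper(repl[:1]) + repl[1:]
		}
		return repl
	})
}

func main() {
	response := "John went to his office. He stayed late."
	fmt.Println(RestorePronouns(response))
	// prints: John went to her office. She stayed late.
}
```

The opposite direction (female masked name, male original) cannot be handled by a table alone, because "her" maps to either "him" or "his" depending on context.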

Example Scenarios

Scenario 1: Biography Request

Input:     "His name is Tom Miller. Write about him."
Masked:    "Her name is Sarah Smith. Write about her."
LLM Out:   "Sarah Smith is an engineer. She graduated..."
Restored:  "Tom Miller is an engineer. He graduated..."

Scenario 2: Multiple Entities

Input:     "Tom met Sarah. He thanked her for the help."
Masked:    "Lisa met John. She thanked him for the help."
LLM Out:   "Lisa met John. She thanked him warmly."
Restored:  "Tom met Sarah. He thanked her warmly."

Scenario 3: Reflexive Pronouns

Input:     "Tom introduced himself to the CEO."
Masked:    "Emma introduced herself to the CEO."
LLM Out:   "Emma introduced herself professionally."
Restored:  "Tom introduced himself professionally."

Success Criteria

  • PronounMapper module created with gender mapping tables
  • Pronoun mapping supports:
    • Subject pronouns (he/she/they)
    • Object pronouns (him/her/them)
    • Possessive pronouns (his/her/their)
    • Reflexive pronouns (himself/herself/themselves)
  • Detector output includes co-reference cluster information
  • Model detector extracts and passes cluster IDs
  • Masking service handles pronoun substitution
  • Restoration service reverses pronoun changes
  • Configuration option to enable/disable feature
  • Comprehensive test coverage (10+ test cases)
  • Gender detection works for common names
  • Capitalization preserved in pronoun substitution
  • Integration with existing proxy flow
  • Documentation updated

Technical Challenges

1. Gender Detection

  • Challenge: Determining gender from masked names
  • Solution: Use name-based heuristics + fallback to neutral pronouns

2. Pronoun Ambiguity

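  • Challenge: "her" is both an object pronoun ("thanked her" → "thanked him") and a possessive ("her car" → "his car"), so a word-level lookup cannot pick the right male form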
  • Solution: Context-aware mapping based on surrounding words
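
A rough sketch of what "context-aware" could mean in the simplest case: look only at the token that follows "her". The function-word list and the `ClassifyHer` name are assumptions for illustration, and the heuristic has known gaps (for example "thanked her warmly" is misread as possessive), so a part-of-speech tagger would likely be more reliable:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// HerRole is the guessed grammatical role of one "her" token.
type HerRole int

const (
	HerObject     HerRole = iota // "thanked her"  → "him"
	HerPossessive                // "her car"      → "his"
)

// functionWords is a small, incomplete sample of words that usually do not
// start a noun phrase owned by "her".
var functionWords = map[string]bool{
	"for": true, "to": true, "and": true, "or": true, "but": true,
	"in": true, "on": true, "at": true, "with": true, "about": true,
	"the": true, "a": true, "an": true, "that": true, "because": true,
}

// ClassifyHer guesses the role of the "her" at tokens[i] from the next token:
// end of text, punctuation, or a function word suggests an object pronoun;
// anything else is assumed to be the possessed noun phrase.
func ClassifyHer(tokens []string, i int) HerRole {
	if i+1 >= len(tokens) {
		return HerObject
	}
	next := strings.ToLower(tokens[i+1])
	runes := []rune(next)
	if len(runes) == 0 || !unicode.IsLetter(runes[0]) {
		return HerObject
	}
	if functionWords[next] {
		return HerObject
	}
	return HerPossessive
}

func main() {
	a := []string{"Sarah", "went", "to", "her", "office", "."}
	b := []string{"He", "thanked", "her", "for", "the", "help", "."}
	fmt.Println(ClassifyHer(a, 3) == HerPossessive) // prints true
	fmt.Println(ClassifyHer(b, 2) == HerObject)     // prints true
}
```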

3. Multiple Entities

  • Challenge: Handling multiple entities with different genders
  • Solution: Track each cluster separately with independent gender mappings

4. Cross-sentence References

  • Challenge: Pronouns may refer to entities in previous sentences
  • Solution: Use co-reference clusters that span entire text

Future Enhancements

  1. Advanced Gender Detection: Use external name-gender databases
  2. Neutral Pronoun Support: Better handling of they/them pronouns
  3. Language Support: Extend to other languages beyond English
  4. LLM-based Detection: Use LLM to determine appropriate pronouns
  5. User Preferences: Allow users to specify gender preferences

Notes

This feature significantly improves the quality of PII-protected LLM interactions by maintaining grammatical consistency. The co-reference detection model is already trained and functional - we just need to leverage its output in the masking/restoration pipeline.

Complexity: Medium
Impact: High (better user experience, more natural responses)
Dependencies: Requires model co-reference output (already available)

Labels: enhancement
