Description
Feature Description
Automatically detect and mask Personally Identifiable Information (PII) in log entries before storage, ensuring GDPR compliance and reducing data breach risk. This feature provides configurable patterns for common PII types (emails, credit cards, phone numbers, IPs) with multiple masking strategies (mask, hash, redact).
Problem/Use Case
Current problem:
- Developers accidentally log sensitive data (emails, credit cards, passwords)
- Once logged, PII is stored permanently and visible to anyone with log access
- GDPR violations can result in substantial fines (up to €20 million or 4% of worldwide annual turnover, whichever is higher)
- Security audits flag PII in logs as high-risk
- Manual PII removal is time-consuming and error-prone
- Compliance teams demand "no PII in logs" policies
Real-world scenarios:
Scenario 1: Accidental credit card logging
// Developer logs entire request for debugging
logger.info('Payment request:', JSON.stringify(req.body));
// → Logs credit card number, CVV, everything
Without PII masking:
{"card_number": "4532-1234-5678-9010", "cvv": "123"}
With PII masking:
{"card_number": "****-****-****-9010", "cvv": "***"}
Scenario 2: Email addresses in errors
Error: Invalid email format for user@example.com
→ Email visible to all log viewers
With PII masking:
Error: Invalid email format for u***@e***.com
Scenario 3: GDPR "right to be forgotten"
User requests data deletion
→ Must scrub their email from ALL logs (nightmare!)
With PII masking:
→ Email was never stored, already masked
Proposed Solution
Implement configurable PII detection and masking at log ingestion:
Phase 1: Common PII patterns
pii_masking:
  enabled: true
  patterns:
    - type: email
      regex: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
      action: mask # user@example.com → u***@e***.com
    - type: credit_card
      regex: "\\b(?:\\d{4}[- ]?){3}\\d{4}\\b"
      action: mask # 4532-1234-5678-9010 → ****-****-****-9010
      luhn_check: true # Only match valid card numbers
    - type: phone_number
      regex: "\\b(?:\\+?1[-.]?)?\\(?([0-9]{3})\\)?[-.]?([0-9]{3})[-.]?([0-9]{4})\\b"
      action: mask # +1-555-123-4567 → +1-555-***-****
    - type: ssn
      regex: "\\b(?!000|666|9\\d{2})([0-8]\\d{2}|7([0-6]\\d))[-]?(?!00)\\d{2}[-]?(?!0000)\\d{4}\\b"
      action: redact # 123-45-6789 → [REDACTED_SSN]
    - type: ip_address
      regex: "\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b"
      action: hash # 192.168.1.100 → ip_a3f5b8c9
      enabled: false # Optional, disabled by default
    - type: api_key
      regex: "\\b[A-Za-z0-9_-]{32,}\\b"
      context: "(?:api[_-]?key|token|secret|password)" # Only match near these words
      action: redact # sk_live_abc123... → [REDACTED_API_KEY]
Masking strategies:
- mask: Partial masking, keep some characters for debugging
  user@example.com → u***@e***.com
  4532-1234-5678-9010 → ****-****-****-9010
- redact: Complete removal
  user@example.com → [REDACTED_EMAIL]
  123-45-6789 → [REDACTED_SSN]
- hash: One-way hash (can correlate, can't reverse)
  user@example.com → email_a3f5b8c9d1e2f3a4
  192.168.1.100 → ip_b4c6d8e0f2a1b3c5
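The three strategies can be sketched as one small dispatch function. This is a minimal illustration under stated assumptions, not the project's implementation: the names `applyAction`, `maskEmail`, `maskCard`, `hashValue`, and `passesLuhn` are hypothetical, the 13-digit lower bound in the Luhn check is an assumption, and the hash uses an HMAC keyed with the salt rather than a plain salted SHA-256.

```typescript
import { createHmac } from 'node:crypto';

type Action = 'mask' | 'redact' | 'hash';

// Luhn checksum: double every second digit from the right, sum, test mod 10.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/\D/g, '');
  if (digits.length < 13) return false; // assumed minimum card length
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// mask: keep the first character of the local part and of the domain, plus the TLD.
function maskEmail(email: string): string {
  const at = email.lastIndexOf('@');
  const local = email.slice(0, at);
  const domain = email.slice(at + 1);
  const tld = domain.slice(domain.lastIndexOf('.'));
  return `${local[0]}***@${domain[0]}***${tld}`;
}

// mask: star out every digit except the last four, leaving separators in place.
function maskCard(card: string): string {
  return card.replace(/\d(?=(?:\D*\d){4})/g, '*');
}

// hash: keyed, truncated digest so equal inputs correlate without storing the value.
function hashValue(type: string, value: string, salt: string): string {
  const digest = createHmac('sha256', salt).update(value).digest('hex');
  return `${type}_${digest.slice(0, 16)}`;
}

function applyAction(type: string, value: string, action: Action, salt: string): string {
  switch (action) {
    case 'redact':
      return `[REDACTED_${type.toUpperCase()}]`;
    case 'hash':
      return hashValue(type, value, salt);
    case 'mask':
      if (type === 'email') return maskEmail(value);
      // Card-shaped strings that fail the Luhn check are left untouched.
      if (type === 'credit_card') return passesLuhn(value) ? maskCard(value) : value;
      return value.replace(/.(?=.{3})/g, '*'); // generic: keep the last 3 characters
  }
}

console.log(applyAction('email', 'user@example.com', 'mask', 'demo-salt'));
// u***@e***.com
console.log(applyAction('credit_card', '4111-1111-1111-1111', 'mask', 'demo-salt'));
// ****-****-****-1111
console.log(applyAction('ssn', '123-45-6789', 'redact', 'demo-salt'));
// [REDACTED_SSN]
```

The sample number 4532-1234-5678-9010 used in this proposal does not satisfy the Luhn checksum, so with luhn_check enabled it would pass through unmasked in free text. The sketch uses the common test number 4111-1111-1111-1111 instead.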
Phase 2: Structured field masking
// For structured logs (JSON)
{
  "user": {
    "email": "user@example.com", // ← Masked
    "id": "user_123",
    "ip": "192.168.1.100" // ← Optionally masked
  },
  "payment": {
    "card_number": "4532-1234-5678-9010", // ← Masked
    "amount": 49.99
  }
}
// After masking:
{
  "user": {
    "email": "[REDACTED_EMAIL]",
    "id": "user_123",
    "ip": "192.168.1.100"
  },
  "payment": {
    "card_number": "****-****-****-9010",
    "amount": 49.99
  }
}
Phase 3: Custom patterns
# User-defined patterns for company-specific PII
custom_patterns:
  - name: "Internal Employee ID"
    regex: "EMP-[0-9]{6}"
    action: hash
  - name: "Customer Reference"
    regex: "CUS-[A-Z0-9]{8}"
    action: mask
Alternatives Considered
- Client-side masking (in SDKs)
  - ✗ Can't enforce (devs forget to use it)
  - ✗ Doesn't help with syslog, OTLP, or raw logs
  - ✓ Could complement server-side masking
- Post-processing masking (after storage)
  - ✗ PII already stored (compliance violation)
  - ✗ Can't fully delete (backups, replicas)
  - ✗ Doesn't prevent data breaches
- No masking, rely on access controls
  - ✗ Doesn't solve accidental logging
  - ✗ Doesn't help with data breaches
  - ✗ Doesn't satisfy GDPR requirements
- Manual review before logging
  - ✗ Impossible at scale
  - ✗ Human error inevitable
  - ✗ Slows down development
Chosen approach: Automatic detection and masking at ingestion (before storage)
Implementation Details (Optional)
Technical implementation:
1. Ingestion pipeline integration
// Add PII masking middleware to ingestion pipeline
async function ingestLog(entry: LogEntry): Promise<void> {
  // 1. Parse log entry
  const parsed = parseLogEntry(entry);
  // 2. Apply PII masking
  const masked = await maskPII(parsed);
  // 3. Store masked log
  await storageEngine.insert(masked);
}
2. PII detection engine
import * as crypto from 'node:crypto';

interface PIIPattern {
  type: string;
  regex: RegExp;
  action: 'mask' | 'redact' | 'hash';
  luhnCheck?: boolean;
  contextRegex?: RegExp;
}

class PIIMasker {
  private patterns: PIIPattern[];

  // compilePatterns is not shown; it is assumed to compile each configured
  // regex with the 'g' flag so that replace() masks every occurrence.
  constructor(config: PIIMaskingConfig) {
    this.patterns = this.compilePatterns(config.patterns);
  }

  maskMessage(message: string): string {
    let masked = message;
    for (const pattern of this.patterns) {
      // Apply context filter if specified. It is tested against the whole
      // message and must not use the 'g' flag, because test() is stateful
      // on global regexes.
      if (pattern.contextRegex && !pattern.contextRegex.test(masked)) {
        continue;
      }
      masked = masked.replace(pattern.regex, (match) => {
        // Luhn check for credit cards
        if (pattern.luhnCheck && !this.passesLuhnCheck(match)) {
          return match; // Not a valid card number, skip
        }
        return this.applyMasking(match, pattern);
      });
    }
    return masked;
  }

  private applyMasking(value: string, pattern: PIIPattern): string {
    switch (pattern.action) {
      case 'mask':
        return this.partialMask(value, pattern.type);
      case 'redact':
        return `[REDACTED_${pattern.type.toUpperCase()}]`;
      case 'hash':
        return `${pattern.type}_${this.hash(value)}`;
    }
  }

  private partialMask(value: string, type: string): string {
    if (type === 'email') {
      // Keep the first character of the local part and domain, plus the TLD
      const [local, domain] = value.split('@');
      const tld = domain.slice(domain.lastIndexOf('.'));
      return `${local[0]}***@${domain[0]}***${tld}`;
    }
    if (type === 'credit_card') {
      // Show only the last 4 digits, preserving separators
      return value.replace(/\d(?=(?:\D*\d){4})/g, '*');
    }
    // Generic masking: keep the last 3 characters
    return value.replace(/.(?=.{3})/g, '*');
  }

  private hash(value: string): string {
    const salt = process.env.PII_HASH_SALT;
    if (!salt) {
      // Without this guard the literal string "undefined" would be used as the salt
      throw new Error('PII_HASH_SALT must be set');
    }
    return crypto.createHash('sha256')
      .update(value + salt)
      .digest('hex')
      .substring(0, 16);
  }

  private passesLuhnCheck(cardNumber: string): boolean {
    const digits = cardNumber.replace(/\D/g, '');
    let sum = 0;
    // Luhn algorithm: walk right to left, doubling every second digit
    for (let i = 0; i < digits.length; i++) {
      let d = Number(digits[digits.length - 1 - i]);
      if (i % 2 === 1 && (d *= 2) > 9) d -= 9;
      sum += d;
    }
    return digits.length > 0 && sum % 10 === 0;
  }
}
3. Structured data handling
function maskStructuredLog(data: unknown, masker: PIIMasker): unknown {
  if (typeof data === 'string') {
    return masker.maskMessage(data);
  }
  if (Array.isArray(data)) {
    return data.map(item => maskStructuredLog(item, masker));
  }
  if (typeof data === 'object' && data !== null) {
    const masked: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(data)) {
      // Check if field name suggests PII
      const isPIIField = /email|password|ssn|card|cvv|phone|secret|token/i.test(key);
      if (isPIIField && typeof value === 'string') {
        // Try pattern-based masking first; if no pattern matched,
        // redact the whole value because the field name marks it as sensitive
        const result = masker.maskMessage(value);
        masked[key] = result === value ? '[REDACTED]' : result;
      } else {
        masked[key] = maskStructuredLog(value, masker);
      }
    }
    return masked;
  }
  return data;
}
4. Configuration UI
import { useState } from 'react';

// PII Masking settings page
function PIIMaskingSettings() {
  const [patterns, setPatterns] = useState<PIIPattern[]>([]);
  const [testLog, setTestLog] = useState('');
  const [maskedPreview, setMaskedPreview] = useState('');

  function addPattern(pattern: PIIPattern) {
    setPatterns([...patterns, pattern]);
  }

  // Assumes PatternList reports the index of the pattern to remove
  function removePattern(index: number) {
    setPatterns(patterns.filter((_, i) => i !== index));
  }

  function testMasking() {
    const masked = new PIIMasker({ patterns }).maskMessage(testLog);
    setMaskedPreview(masked);
  }

  return (
    <div>
      <h2>PII Masking Configuration</h2>
      <PatternList
        patterns={patterns}
        onAdd={addPattern}
        onRemove={removePattern}
      />
      <TestPanel>
        <label>Test Input:</label>
        <textarea
          value={testLog}
          onChange={(e) => setTestLog(e.target.value)}
          placeholder="Paste a log entry to test masking..."
        />
        <button onClick={testMasking}>Test Masking</button>
        <label>Masked Output:</label>
        <pre>{maskedPreview}</pre>
      </TestPanel>
    </div>
  );
}
5. Performance optimization
// Cache compiled regexes
const regexCache = new Map<string, RegExp>();

// Batch processing for high throughput
async function maskBatch(entries: LogEntry[]): Promise<LogEntry[]> {
  return Promise.all(entries.map(entry => maskEntry(entry)));
}

// Skip masking for non-sensitive sources
if (source.skipPIIMasking) {
  return entry; // Trust internal logs, skip expensive regex
}
Database schema:
-- Track masking metadata
CREATE TABLE pii_masking_stats (
  date DATE NOT NULL,
  pattern_type VARCHAR(50),
  occurrences INTEGER,
  source_id UUID REFERENCES sources(id),
  PRIMARY KEY (date, pattern_type, source_id)
);
-- For compliance auditing
CREATE TABLE pii_masking_audit (
  id UUID PRIMARY KEY,
  timestamp TIMESTAMPTZ DEFAULT NOW(),
  pattern_type VARCHAR(50),
  action VARCHAR(20), -- 'mask', 'redact', 'hash'
  source_id UUID,
  log_id UUID
);
Priority
- Critical - Blocking my usage of LogTide
- High - Would significantly improve my workflow
- Medium - Nice to have
- Low - Minor enhancement
Rationale: Essential for GDPR compliance and enterprise adoption, but not blocking for most current users. Higher priority for EU market and regulated industries.
Target Users
- DevOps Engineers (enforce compliance)
- Developers (prevent accidental PII logging)
- Security/SIEM Users (data protection)
- System Administrators
- All Users
Primary benefit: Organizations that handle user data and need GDPR/compliance guarantees.
Secondary benefit: Reduces data breach risk for everyone.
Additional Context
Why this is critical for growth:
1. GDPR compliance requirement
GDPR Article 5(1)(c): Data minimisation
→ "Personal data shall be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed"
→ Storing PII in logs can violate this unless there is a specific, documented purpose
GDPR fines:
→ Up to €20 million or 4% of worldwide annual turnover (whichever is higher)
→ Real example: British Airways fined £20M by the UK ICO (2020) for a data breach
2. Market differentiation
Competitors:
• Datadog: Client-side masking only (can be bypassed)
• Elastic: Manual configuration (complex, error-prone)
• Splunk: Has PII detection but enterprise-tier only
• Grafana Loki: No built-in PII masking
Logtide advantage:
✓ Built-in, automatic detection
✓ Configurable patterns
✓ Works with any log source
✓ Free tier includes PII masking (not paywalled)
3. Enterprise sales enabler
Common enterprise question: "How do you handle PII?"
Without this feature:
❌ "You'll need to configure client-side masking in your SDKs"
→ Enterprise: "That's not acceptable" (lost deal)
With this feature:
✓ "Logtide automatically detects and masks PII at ingestion"
✓ "GDPR-compliant out of the box"
✓ "No code changes required"
→ Enterprise: "Perfect, when can we start?" (closed deal)
Real-world impact examples:
Example 1: Startup avoids GDPR violation
Scenario: Developer accidentally logs user emails in error messages
Without masking: GDPR violation, potential €50k fine
With masking: Emails automatically redacted, no violation
Example 2: Security breach damage limitation
Scenario: Attacker gains access to log database
Without masking: Full credit card numbers, emails, SSNs exposed
With masking: Only masked/hashed data visible (of far less use to an attacker)
Marketing angles:
"GDPR-compliant by default. Logtide automatically protects sensitive data in your logs."
"Stop worrying about PII in logs. Logtide masks emails, credit cards, and phone numbers before storage."
Documentation needs:
- PII masking configuration guide
- GDPR compliance whitepaper
- Best practices for sensitive data
- Custom pattern examples
- Performance impact notes
Blog post opportunity:
"The Hidden GDPR Risk in Your Logs (And How to Fix It)"
- Explain common PII logging mistakes
- Show GDPR requirements
- Demo Logtide's automatic masking
- Position as privacy-first
Future enhancements:
- ML-based PII detection (detect new patterns automatically)
- Industry-specific patterns (healthcare, finance)
- Compliance reporting ("X emails masked this month")
- Integration with DLP tools
- Regional pattern variants (EU vs US phone numbers)
Contribution
- I would like to work on implementing this feature