pii_scanner v3
pii_scanner is a command-line utility written in Go that scans a specified directory tree for files containing personally identifiable information (PII) and other sensitive data. It supports several file formats (text, SQL, JSON, and MySQL dumps) and uses regular expressions to detect data patterns such as:
- Email addresses
- Phone numbers (international and domestic)
- Birthdates
- Social Security Numbers (SSNs)
- Credit card numbers (Visa, MasterCard, AMEX)
- Sensitive labels in JSON and SQL files (e.g., nationalID, SSN)
- TLS private keys
Additionally, it can detect obfuscation tags in files.
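
As a rough illustration of regex-based detection, the sketch below matches a few of the categories listed above. The patterns and the names patterns and findMatches are simplified assumptions made for this example; they are not the expressions shipped with pii_scanner.

```go
package main

import (
	"fmt"
	"regexp"
)

// patterns maps a label to a deliberately simplified, illustrative regex.
// These are assumptions for this sketch, not the patterns used by pii_scanner.
var patterns = map[string]*regexp.Regexp{
	"email address": regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`),
	"ssn":           regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`),
	"visa card":     regexp.MustCompile(`\b4\d{3}(?:[ -]?\d{4}){3}\b`),
}

// findMatches returns every match of each pattern in text, keyed by label.
// Labels with no matches are omitted from the result.
func findMatches(text string) map[string][]string {
	found := make(map[string][]string)
	for label, re := range patterns {
		if m := re.FindAllString(text, -1); len(m) > 0 {
			found[label] = m
		}
	}
	return found
}

func main() {
	sample := "Contact user@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
	for label, matches := range findMatches(sample) {
		fmt.Printf("%s: %v\n", label, matches)
	}
}
```

Real-world patterns usually need extra validation (for example, a Luhn check for card numbers) to keep false positives down.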
- File Format Support: Scans .sql, .json, .jsonl, .txt, and MySQL dump files.
- Regular Expressions: Uses predefined regex patterns to detect sensitive information.
- Concurrency: Files are scanned concurrently for improved performance.
- File Type Identification: Automatically identifies file types (e.g., text, JSON, SQL, MySQL dump) and applies appropriate scanning rules.
- Obfuscation Tag Detection: Identifies obfuscated files containing a specific UUID.
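
The file type identification and concurrency features above could be combined along the following lines. This is a minimal sketch: classify works from the file extension only, which is an assumption (the actual tool may inspect file contents, for example to recognize MySQL dumps), and scanAll and its scan callback are hypothetical names.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"sync"
)

// classify guesses a file type from its extension. This is an assumption;
// the real tool may look at file contents instead.
func classify(path string) string {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".sql":
		return "sql"
	case ".json", ".jsonl":
		return "json"
	case ".txt":
		return "text"
	default:
		return "unknown"
	}
}

// scanAll runs scan on every path using a fixed number of worker goroutines
// and returns the per-file results keyed by path.
func scanAll(paths []string, workers int, scan func(path, kind string) int) map[string]int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(map[string]int)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				n := scan(p, classify(p))
				mu.Lock() // the results map is shared between workers
				results[p] = n
				mu.Unlock()
			}
		}()
	}
	for _, p := range paths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	paths := []string{"a.sql", "b.json", "c.txt"}
	// Stand-in scan function: a real one would read the file and apply regexes.
	res := scanAll(paths, 2, func(path, kind string) int { return len(kind) })
	fmt.Println(res)
}
```

A bounded worker pool keeps the number of simultaneously open files predictable, which matters when scanning large directory trees.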
- The program accepts a root directory (-filesystem) as input.
- It recursively walks through all files in the specified directory.
- Files are categorized by type (e.g., text, JSON, SQL) and scanned accordingly using predefined regular expressions.
- If sensitive data is found, the program outputs a sample of matches from each file.
- The tool can detect obfuscated files based on a predefined UUID.
- Requires Go version 1.18 or higher
- Clone the repository or download the Go file:
  git clone https://github.com/your-username/pii_scanner.git
  cd pii_scanner
- Build the executable:
  go build -o pii_scanner .
- Run the scanner with the required -filesystem flag:
  ./pii_scanner -filesystem=/path/to/scan
-filesystem: Specifies the root directory to scan. Required.

Example:
  ./pii_scanner -filesystem=/home/user/files
When scanning a directory, the output will show samples of sensitive data found in files:
pii_scanner v.3 maintained by kenneth.webster@imperva.com
In file /path/to/file.json, found 3 instances of email address. Sample matches:
Match 1: user@example.com
Match 2: admin@domain.com
Match 3: contact@website.com
In MySQL dump file /path/to/file.sql, found instances of sensitive_sql_column. Sample matches:
Match 1: SSN: '123-45-6789'