ABScanner is a Python tool that recursively scans files and directories for two kinds of sensitive data: username/password pairs and credit card numbers. It uses regex patterns with false-positive filtering and supports multiple file formats, including text files, Microsoft Office documents, PDFs, and configuration files.
Avoid scanning code files where possible. ABScanner automatically skips virtual environments and node_modules, but source code containing keywords such as "password", "auth", or "username" will still be flagged.
If omitting code files is not an option, update config/patterns.json and review src/patterns.py to tune the patterns and filtering.
- Multi-format Support: Scans text files, .docx, .xlsx, .pptx, .pdf, and many other file formats
- Focused Detection: Specifically targets username/password pairs and credit card numbers
- Advanced False Positive Filtering: Eliminates system constants, hex values, and programming constructs
- Smart Directory Filtering: Automatically skips node_modules, env, .git, and build directories
- Flexible Output: Supports text, JSON, and CSV output formats
- Colorized Console Output: Easy-to-read results with color-coded findings
- Configurable Patterns: Customizable regex patterns via JSON configuration
- Performance Optimized: Handles large directory structures efficiently
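The directory filtering mentioned in the feature list can be illustrated with a short sketch. This is not ABScanner's actual code: `iter_files` and `SKIP_DIRS` are hypothetical names, and the skip list is assumed from the directories named above.

```python
import os

# Assumed skip list, based on the directories named in this README.
SKIP_DIRS = {"node_modules", "env", ".git", "build"}


def iter_files(root, skip_dirs=SKIP_DIRS):
    """Yield file paths under root without descending into skipped directories.

    Hypothetical helper for illustration only.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend into them,
        # so large trees such as node_modules are never traversed at all.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Pruning during the walk, rather than filtering paths afterwards, is what keeps scans of large trees cheap.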
ABScanner is specifically designed to detect the most critical credential exposures:
- Credentials: Username/password pairs in various formats
  - username="value" / password="value" assignments
  - Connection string credentials
  - Configuration file credentials
  - Environment variable patterns
- Credit Cards: Valid credit card numbers
  - Visa, MasterCard, American Express, Discover
  - Luhn algorithm validation
  - Multiple format support (with/without spaces/dashes)
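For readers unfamiliar with the Luhn check listed above, here is a minimal sketch of how a candidate number could be validated after stripping spaces and dashes. The function name `luhn_valid` and the 13 to 19 digit length bound are assumptions for illustration, not taken from ABScanner's source.

```python
import re


def luhn_valid(candidate: str) -> bool:
    """Return True if candidate passes the Luhn checksum.

    Hypothetical helper: spaces and dashes are removed first, so
    "4111 1111 1111 1111" and "4111-1111-1111-1111" are treated alike.
    """
    digits = re.sub(r"[ -]", "", candidate)
    # Assumed length bound; anything else is rejected before the checksum.
    if not re.fullmatch(r"[0-9]{13,19}", digits):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

A regex match alone only says "this looks like a card number"; the checksum is what discards most random digit runs.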
ABScanner includes sophisticated filtering to eliminate common false positives:
- System constants and hex values
- Programming language constructs
- Variable assignments vs. actual credentials
- Test data and placeholder values
- Code comments and documentation
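The kinds of checks listed above might look something like the following sketch. Everything here is an assumption made for illustration: the name `looks_like_false_positive`, the placeholder list, and the two regexes are hypothetical, and the real rules live in src/patterns.py.

```python
import re

# Assumed placeholder values; the real list may differ.
PLACEHOLDERS = {"password", "changeme", "example", "your_password",
                "xxx", "none", "null", "true", "false"}

# Hex literals such as 0xDEADBEEF (system constants, not secrets).
HEX_LITERAL = re.compile(r"0x[0-9a-fA-F]+")

# Values that reference code rather than hold a secret:
# ${VAR}, dotted names like os.environ, or anything with call/index brackets.
CODE_REFERENCE = re.compile(
    r"^\$\{?\w+\}?$|^[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+$|[()\[\]]"
)


def looks_like_false_positive(value: str) -> bool:
    """Heuristically decide whether a captured value is not a real credential."""
    v = value.strip().strip("'\"")
    if not v or v.lower() in PLACEHOLDERS:
        return True
    if HEX_LITERAL.fullmatch(v):
        return True
    return bool(CODE_REFERENCE.search(v))
```

Heuristics like these trade recall for precision: a real password that happens to contain brackets would be dropped, so the rules are worth reviewing against your own data.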
Windows:
- Download the Windows package from releases
- Extract and run install.bat as Administrator
- Restart command prompt
- Use abscanner from anywhere
Linux (Ubuntu/Debian):
# Download and install .deb package
sudo dpkg -i abscanner_1.0.0_all.deb
Linux (RedHat/CentOS/Fedora):
# Install .rpm package
sudo rpm -i abscanner-1.0.0-1.noarch.rpm
Linux (Generic):
# Extract and install
tar -xzf abscanner-linux-generic.tar.gz
cd abscanner-linux
sudo ./install.sh
From source:
- Clone or download the ABScanner project
- Navigate to the project directory:
  cd ABScanner
- Install dependencies:
  pip install -r requirements.txt
With pip:
pip install abscanner
# Basic scan
abscanner /path/to/scan
# Save results to file
abscanner /path/to/scan --output results.txt
# JSON output format
abscanner /path/to/scan --output results.json --format json
# Get help
abscanner --help
# Basic scan from a source checkout
python main.py <directory_path>
# Save results to file
python main.py <directory_path> --output results.txt
# JSON output format
python main.py <directory_path> --output results.json --format json
# CSV output format
python main.py <directory_path> --output results.csv --format csv
# Disable colored output
python main.py <directory_path> --no-colors
# Verbose output
python main.py <directory_path> --verbose
Example output:
Starting ABScanner...
Scanning directory: /path/to/scan
Started at: 2025-08-12 14:22:34
------------------------------------------------------------
Scanning directory: /path/to/scan
------------------------------------------------------------
Scan completed! Found 676 credential findings after filtering.
SUMMARY:
Total findings: 676
Files affected: 10
Findings by category:
- credentials: 676
- credit_cards: 0
FOCUSED RESULTS:
Real credentials found: 1 unique username/password pair
Files with credentials: 7 Python scripts
Risk level: HIGH (Hardcoded credentials detected)
The regex patterns used for detecting sensitive information can be customized in the config/patterns.json file. The configuration focuses on credentials and credit cards:
{
"patterns": {
"credentials": [
"(?i)(username|user|login)\\s*[:=]\\s*['\"]?([^\\s'\"]+)",
"(?i)(password|pass|pwd)\\s*[:=]\\s*['\"]?([^\\s'\"]+)"
],
"credit_cards": [
"\\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})\\b"
]
}
}
ABScanner includes specialized extraction tools:
- extract_real_credentials.py: Ultra-focused credential extraction with advanced filtering
- summary_report.py: Generates clean summary reports of findings
- FINAL_RESULTS.txt: Human-readable summary of scan results
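To make the config/patterns.json layout shown earlier more concrete, the sketch below loads a file of that shape and applies each regex line by line. It assumes only the `{"patterns": {category: [regex, ...]}}` structure from the example; the function names `load_patterns`, `compile_patterns`, and `scan_text` are hypothetical and are not ABScanner's API.

```python
import json
import re


def compile_patterns(config):
    """Compile every regex in a {"patterns": {category: [regex, ...]}} mapping."""
    return {
        category: [re.compile(p) for p in regexes]
        for category, regexes in config["patterns"].items()
    }


def load_patterns(path="config/patterns.json"):
    """Read the JSON pattern file and return compiled regexes per category."""
    with open(path, encoding="utf-8") as fh:
        return compile_patterns(json.load(fh))


def scan_text(text, compiled):
    """Return (line_number, category, matched_text) for every regex hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for category, regexes in compiled.items():
            for rx in regexes:
                for match in rx.finditer(line):
                    findings.append((lineno, category, match.group(0)))
    return findings
```

Adding a category is then just a matter of adding another key under "patterns"; raw matches would still need false-positive filtering before being reported.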
ABScanner/
├── main.py                 # Main CLI entry point
├── README.md               # This documentation
├── requirements.txt        # Python dependencies
├── setup.py                # Package setup configuration
├── FINAL_RESULTS.txt       # Example scan results
├── config/
│   └── patterns.json       # Regex pattern definitions
├── src/
│   ├── __init__.py
│   ├── scanner.py          # Core scanning engine
│   ├── patterns.py         # Pattern management & filtering
│   ├── utils.py            # File processing utilities
│   └── reporter.py         # Report generation
├── tests/
│   ├── __init__.py
│   ├── test_scanner.py     # Scanner tests
│   ├── test_patterns.py    # Pattern tests
│   └── test_data/          # Test files
└── output/                 # Default output directory
To create distributable packages:
Windows:
# Install PyInstaller
pip install pyinstaller
# Build Windows executable
build.bat windows
# Or build everything
build.bat all
Linux:
# Build Linux packages (.deb, .rpm, .tar.gz)
make linux
# Or build everything
make all
See PACKAGING.md for detailed build instructions.
Run the test suite:
pytest
- Text Files: .txt, .py, .js, .html, .css, .json, .xml, .yaml, .yml, .md, .rst, .csv, .log, .cfg, .ini, .conf
- Scripts: .sh, .bat, .ps1, .sql, .php, .jsp, .asp
- Programming: .java, .cpp, .c, .h, .cs, .vb, .rb, .go, .swift, .kt, .scala, .pl, .r, .m
- Documents: .docx, .xlsx, .pptx, .pdf
- Archives: Limited text extraction from certain formats
- python-docx: For Microsoft Word document processing
- openpyxl: For Excel file processing
- python-pptx: For PowerPoint file processing
- PyPDF2: For PDF text extraction
- colorama: For colored console output
- Standard library: json, os, re, argparse, etc.
- False Positive Filtering: ABScanner includes advanced filtering to minimize false positives from system constants, programming constructs, and test data.
- Performance: Large directories are handled efficiently with smart directory filtering (skips node_modules, env, .git, etc.).
- Privacy: Scan results contain actual sensitive information - handle with appropriate security measures.
- Focused Scope: ABScanner specifically targets username/password pairs and credit cards for maximum security impact.
- Permissions: Ensure you have appropriate authorization before scanning any systems or directories.
Contributions are welcome! Areas for improvement:
- Additional file format support
- More sophisticated pattern recognition
- Performance optimizations
- Integration with security tools
- Custom reporting formats
Exit codes:
- 0: No credentials detected
- 1: Credentials found (security issue detected)
- 130: Scan interrupted by user (Ctrl+C)
- Other: Error occurred during scanning
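These exit codes make it possible to gate a script or CI job on the scan result. The sketch below is one way that could look; `describe_exit_code` and `run_scan` are hypothetical helpers, and the only thing assumed about the CLI is the `abscanner <path>` invocation shown earlier.

```python
import subprocess


def describe_exit_code(code: int) -> str:
    """Map an ABScanner exit code to a short status string (hypothetical helper)."""
    if code == 0:
        return "clean: no credentials detected"
    if code == 1:
        return "findings: credentials found"
    if code == 130:
        return "interrupted: scan stopped by user (Ctrl+C)"
    return f"error: scan failed with exit code {code}"


def run_scan(path: str) -> str:
    """Run `abscanner <path>` and describe the outcome.

    Assumes abscanner is installed and on PATH.
    """
    result = subprocess.run(["abscanner", path])
    return describe_exit_code(result.returncode)
```

In a CI job, treating exit code 1 as a failed check and any other non-zero code as an infrastructure error keeps the two cases distinguishable.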
This project is licensed under the MIT License. Use responsibly and ensure compliance with your organization's security policies.