A security tool that scans Git repository commit history for exposed secrets, API keys, credentials, and other sensitive data using multi-layer detection (regex patterns + entropy analysis + LLM verification).
Multi-Layer Detection
- Layer 1: Regex pattern matching (100+ secret types)
- Layer 2: Shannon entropy analysis for cryptographic randomness
- Layer 3: LLM-powered context-aware verification to reduce false positives
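
As a rough illustration of Layer 1, the sketch below runs a small table of regular expressions over a single line of text. The pattern table, the names `PATTERNS` and `regex_scan`, and the three patterns are assumptions made for this example; the scanner's actual pattern set is not shown in this README.

```python
import re

# Hypothetical pattern table for illustration only; the tool's real
# 100+ patterns are not reproduced here.
PATTERNS = {
    "AWS Access Key ID": re.compile(r"\b(AKIA[0-9A-Z]{16})\b"),
    "GitHub Personal Access Token": re.compile(r"\b(ghp_[A-Za-z0-9]{36})\b"),
    "Private Key Header": re.compile(
        r"(-----BEGIN (?:RSA |EC |OPENSSH |PGP )?PRIVATE KEY(?: BLOCK)?-----)"
    ),
}


def regex_scan(line):
    """Return a list of (secret_type, matched_text) pairs found in one line."""
    findings = []
    for secret_type, pattern in PATTERNS.items():
        for match in pattern.finditer(line):
            findings.append((secret_type, match.group(1)))
    return findings
```

Calling `regex_scan` on each added diff line yields candidate findings that later layers can confirm or discard.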
Coverage
- AWS, GitHub, Google Cloud credentials
- Database connection strings (PostgreSQL, MongoDB, MySQL, Redis)
- API keys (Stripe, Slack, Twilio, SendGrid, and 80+ more services)
- Private keys (RSA, SSH, PGP)
- OAuth tokens, JWT, Bearer tokens
False Positive Filtering
- Detects test data, examples, and placeholder values
- Entropy calculation filters low-randomness strings
- LLM provides contextual analysis of each finding
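
One plausible way to detect test data and placeholder values is a keyword check on the matched value plus a check on the file path. The sketch below is a minimal version of that idea. The indicator list, the path fragments, and the name `looks_like_placeholder` are assumptions for illustration, not the tool's documented heuristics.

```python
import re

# Hypothetical indicators of placeholder values; the scanner's actual
# heuristics are not documented in this README.
PLACEHOLDER_RE = re.compile(
    r"example|sample|dummy|placeholder|changeme"
    r"|your[_-]?(?:api[_-]?)?(?:key|token|password)"
    r"|x{4,}|<[^>]+>",
    re.IGNORECASE,
)

# Hypothetical path fragments that suggest test fixtures or documentation.
TEST_PATH_PARTS = ("test/", "tests/", "fixtures/", "example")


def looks_like_placeholder(value, file_path=""):
    """Return True if the value or its file path suggests test or sample data."""
    if PLACEHOLDER_RE.search(value):
        return True
    lowered = file_path.lower()
    return any(part in lowered for part in TEST_PATH_PARTS)
```

A check like this is cheap, so it can run before the LLM call and mark a finding as a likely false positive without discarding it.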
Git Integration
- Supports both local and remote repositories
- Analyzes commit diffs line-by-line
- Tracks findings with commit metadata
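
To make "line-by-line" concrete, the sketch below extracts the added lines from a unified diff (for example, the output of `git show` or `git log -p`) together with their line numbers in the new file. The function name `added_lines` is hypothetical, and the line-numbering convention used here may differ from the one the scanner reports.

```python
import re

# Matches a unified diff hunk header such as "@@ -10,2 +12,3 @@" and
# captures the starting line number in the new file.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(diff_text):
    """Yield (new_file_line_number, text) for each line added in a unified diff."""
    new_line_no = None  # None until the first hunk header of a file is seen
    for raw in diff_text.splitlines():
        if raw.startswith("diff --git"):
            new_line_no = None  # a new file starts; skip its header lines
            continue
        hunk = HUNK_RE.match(raw)
        if hunk:
            new_line_no = int(hunk.group(1))
            continue
        if new_line_no is None or raw.startswith("\\"):
            continue  # file headers, or "\ No newline at end of file"
        if raw.startswith("+"):
            yield new_line_no, raw[1:]
            new_line_no += 1
        elif raw.startswith("-"):
            continue  # removed lines do not exist in the new file
        else:
            new_line_no += 1  # context line
```

Only added lines are yielded because a secret enters history in the commit that adds it; each result can then be paired with that commit's hash, author, and date.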
Reporting
- Detailed JSON output with confidence scores
- Includes LLM reasoning for each finding
- Terminal summary for quick review

Prerequisites
- Python 3.7+
- Git installed on your system
- Hugging Face account (for LLM API)

Installation
- Clone the repository
git clone https://github.com/ahossu/git-secret-scanner.git
cd git-secret-scanner
- Install dependencies
pip install -r requirements.txt
- Get a Hugging Face API token
  - Visit: https://huggingface.co/settings/tokens
  - Create a new token with read permissions
- You'll need an endpoint URL for an LLM model (e.g., using Hugging Face Inference Endpoints)

Usage
python scan.py --repo <REPO_URL_OR_PATH> \
  --base-url <HF_ENDPOINT_URL> \
  --hf-token <YOUR_HF_TOKEN> \
  --n <NUM_COMMITS> \
  --out <OUTPUT_FILE>

Scan a remote repository (last 20 commits)
python scan.py --repo https://github.com/user/repo.git \
  --base-url "https://your-endpoint.huggingface.cloud/v1/" \
  --hf-token "hf_yourtoken123" \
  --n 20 \
  --out report.json

Scan a local repository
python scan.py --repo /path/to/local/repo \
  --base-url "https://your-endpoint.huggingface.cloud/v1/" \
  --hf-token "hf_yourtoken123" \
  --n 50 \
  --out findings.json

Test with GitGuardian's sample secrets repository
python scan.py --repo https://github.com/GitGuardian/sample_secrets.git \
  --base-url "https://your-endpoint.huggingface.cloud/v1/" \
  --hf-token "hf_yourtoken123" \
  --n 50 \
  --out report.json

The tool generates a JSON report with:
{
"scan_metadata": {
"timestamp": "2025-10-23T16:45:25.047481",
"total_findings": 6,
"high_confidence_findings": 6
},
"findings": [
{
"type": "Database Password",
"matched_text": "pass=\"sup3rstr0ngpass1ForGG\"",
"line_number": 6,
"entropy": 3.699513850319966,
"likely_false_positive": false,
"detection_method": "regex",
"llm_analysis": {
"is_real": true,
"confidence": 0.95,
"reasoning": "The password 'sup3rstr0ngpass1ForGG' has a low entropy of 3.70, which indicates a weak password. Additionally, the presence of a password in a configuration file, especially one related to a database connection, is a strong indicator that this is a production secret."
},
"commit_hash": "d95287b420366311433f4610b94a2c0844f4dce3",
"commit_message": "chore: add postgres connection information",
"commit_author": "Henri Hubert",
"commit_date": "2021-01-12 17:45:52+01:00",
"file_path": "postgres_model.js",
"diff_line": "+var pg_pass=\"sup3rstr0ngpass1ForGG\";",
"context_snippet": "@@ -0,0 +1,7 @@\n+\n+var pg_port=1212;\n+var pg_host=\"gitguardians.com:9082/BLUDB\";\n+var pg_user=\"root\";\n+var pg_pass=\"sup3rstr0ngpass1ForGG\";\n+\n+var mongo_uri = \"mongodb+srv://testuser:hub24aoeu@gg-is-awesome-gg273.mongodb.net/test?retryWrites=true&w=majority\";\n"
}
]
}

Cloud Provider Keys
- AWS Access Keys, Secret Keys, Session Tokens
- Google API Keys, OAuth Tokens, Cloud API Keys
- Azure Storage Keys, SAS Tokens, AD Secrets
- DigitalOcean, Linode, Vultr API Keys
API Tokens & Keys
- GitHub (Personal Access Tokens, OAuth, App Tokens)
- Slack (Tokens, Webhooks)
- Stripe (API Keys, Publishable Keys)
- Twilio, SendGrid, Mailgun, Mailchimp
- And 80+ more services...
Database Credentials
- PostgreSQL, MySQL, MongoDB, Redis
- Cassandra, Elasticsearch, SQL Server
- Connection strings with embedded passwords
Cryptographic Keys
- RSA, EC, PGP Private Keys
- SSH Private/Public Keys
- SSL Certificates, PKCS8 Keys
- OpenVPN Static Keys

How It Works
1. REGEX SCAN
↓ Matches 100+ patterns
2. ENTROPY ANALYSIS
↓ Calculates Shannon entropy (H > 4.0 = likely secret)
3. FALSE POSITIVE FILTER
↓ Checks for test data indicators
4. LLM VERIFICATION
↓ Context-aware analysis
5. FINAL REPORT
↓ High-confidence findings only
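
Step 2 of the pipeline relies on Shannon entropy, H = -Σ p_i · log2(p_i), where p_i is the relative frequency of each distinct character in the string. The sketch below computes it in bits per character and applies the 4.0 threshold from the diagram. The names `shannon_entropy`, `is_high_entropy`, and `ENTROPY_THRESHOLD` are illustrative and not taken from the tool's source.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 4.0  # threshold quoted in the pipeline above


def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_high_entropy(text, threshold=ENTROPY_THRESHOLD):
    """Return True if the string's entropy exceeds the threshold."""
    return shannon_entropy(text) > threshold
```

Because the entropy of a string of length n can never exceed log2(n), a value needs more than 16 characters before it can pass a 4.0 threshold. Short passwords therefore depend on the regex and LLM layers rather than on entropy alone.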

Example Output
[*] Initializing LLM client...
[✓] LLM client ready
[*] Scanning repository: https://github.com/GitGuardian/sample_secrets.git (last 50 commits)
[*] Cloning repository...
[✓] Repository cloned to temporary directory
[✓] Found 7 commits
[*] Scanning: abc12345 - Initial commit
[✓] Found 12 potential secrets
[*] Scanning: def67890 - Add configuration files
[✓] Found 8 potential secrets
...
[✓] Scan complete: 6/46 high-confidence findings
[✓] Report generated: report.json
Scan Summary:
Repository: https://github.com/GitGuardian/sample_secrets.git
Commits analyzed: 50
Secrets identified: 6
Report: report.json
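
The generated report can be post-processed with a few lines of Python. The sketch below assumes the JSON structure shown in the sample report (`findings`, `llm_analysis.is_real`, `llm_analysis.confidence`). The helper names and the 0.9 cutoff are arbitrary choices made for this example.

```python
import json


def load_report(path):
    """Load a scan report written with --out."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def high_confidence_findings(report, min_confidence=0.9):
    """Return findings the LLM judged real with at least min_confidence."""
    selected = []
    for finding in report.get("findings", []):
        analysis = finding.get("llm_analysis") or {}
        if analysis.get("is_real") and analysis.get("confidence", 0.0) >= min_confidence:
            selected.append(finding)
    return selected


def summarize(finding):
    """Format one finding as 'hash  path:line  type' for terminal output."""
    return "{}  {}:{}  {}".format(
        finding["commit_hash"][:8],
        finding["file_path"],
        finding["line_number"],
        finding["type"],
    )
```

For example, `for f in high_confidence_findings(load_report("report.json")): print(summarize(f))` prints one line per finding that passes the cutoff.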

License
MIT License - See LICENSE file for details

Author
Alexandru Hossu