Git Secret Scanner

A security tool that scans Git repository commit history for exposed secrets, API keys, credentials, and other sensitive data using multi-layer detection (regex patterns + entropy analysis + LLM verification).

Features

Multi-Layer Detection

- Layer 1: Regex pattern matching (100+ secret types)
- Layer 2: Shannon entropy analysis for cryptographic randomness
- Layer 3: LLM-powered context-aware verification to reduce false positives

Coverage

- AWS, GitHub, Google Cloud credentials
- Database connection strings (PostgreSQL, MongoDB, MySQL, Redis)
- API keys (Stripe, Slack, Twilio, SendGrid, and 100+ more)
- Private keys (RSA, SSH, PGP)
- OAuth tokens, JWT, Bearer tokens

False Positive Filtering

- Detects test data, examples, and placeholder values
- Entropy calculation filters low-randomness strings
- LLM provides contextual analysis of each finding

Git Integration

- Supports both local and remote repositories
- Analyzes commit diffs line-by-line
- Tracks findings with commit metadata

Reporting

- Detailed JSON output with confidence scores
- Includes LLM reasoning for each finding
- Terminal summary for quick review
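
To make the regex layer and the line-by-line diff analysis more concrete, here is a minimal sketch of how added lines in a unified diff could be matched against a few patterns. The three patterns and the `scan_diff` helper are illustrative assumptions, not the scanner's actual pattern set or code, and `line_number` here simply counts lines of the diff text.

```python
import re

# Illustrative patterns only (assumption): the scanner's real pattern set
# is not shown in this README.
PATTERNS = {
    "AWS Access Key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub Personal Access Token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "Private Key Header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH |PGP )?PRIVATE KEY(?: BLOCK)?-----"
    ),
}


def scan_diff(diff_text):
    """Return one finding per pattern match on lines added by a unified diff."""
    findings = []
    for line_number, line in enumerate(diff_text.splitlines(), start=1):
        # Added lines start with "+"; "+++" is the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for secret_type, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append({
                    "type": secret_type,
                    "matched_text": match.group(0),
                    "line_number": line_number,
                    "diff_line": line,
                })
    return findings
```

Only added lines are inspected in this sketch, so a secret that appears on a removed or context line is ignored.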

Installation

Prerequisites

Python 3.7+
Git installed on your system
Hugging Face account (for LLM API)

Setup

Clone the repository

git clone https://github.com/ahossu/git-secret-scanner.git
cd git-secret-scanner

Install dependencies

pip install -r requirements.txt

Get a Hugging Face API token
- Visit: https://huggingface.co/settings/tokens
- Create a new token with read permissions
- You also need the endpoint URL of a hosted LLM (e.g., a Hugging Face Inference Endpoint) to pass as --base-url

Usage

Basic Command Structure

python scan.py --repo <REPO_URL_OR_PATH> \
               --base-url <HF_ENDPOINT_URL> \
               --hf-token <YOUR_HF_TOKEN> \
               --n <NUM_COMMITS> \
               --out <OUTPUT_FILE>
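
Typing the token directly on the command line can leave it in shell history. As a hedged alternative, the sketch below assembles the same arguments in Python and reads the token from an environment variable. The variable name `HF_TOKEN` and the helper `build_scan_command` are assumptions made for illustration; only the flags shown above are taken from this README.

```python
import os
import sys


def build_scan_command(repo, base_url, num_commits, out_file, env=None):
    """Assemble the scan.py argument list, taking the token from HF_TOKEN."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")  # assumed variable name, not read by scan.py itself
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    return [
        sys.executable, "scan.py",
        "--repo", repo,
        "--base-url", base_url,
        "--hf-token", token,
        "--n", str(num_commits),
        "--out", out_file,
    ]
```

The returned list can be handed to `subprocess.run(command, check=True)`. The token is still passed to the process as an argument, since `--hf-token` is the only token option documented here.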

Examples

Scan a remote repository (last 20 commits)

python scan.py --repo https://github.com/user/repo.git \
               --base-url "https://your-endpoint.huggingface.cloud/v1/" \
               --hf-token "hf_yourtoken123" \
               --n 20 \
               --out report.json

Scan a local repository

python scan.py --repo /path/to/local/repo \
               --base-url "https://your-endpoint.huggingface.cloud/v1/" \
               --hf-token "hf_yourtoken123" \
               --n 50 \
               --out findings.json

Test with GitGuardian's sample secrets repository

python scan.py --repo https://github.com/GitGuardian/sample_secrets.git \
               --base-url "https://your-endpoint.huggingface.cloud/v1/" \
               --hf-token "hf_yourtoken123" \
               --n 50 \
               --out report.json

Output Format

The tool generates a JSON report with the following structure:

{
  "scan_metadata": {
    "timestamp": "2025-10-23T16:45:25.047481",
    "total_findings": 6,
    "high_confidence_findings": 6
  },
  "findings": [
    {
      "type": "Database Password",
      "matched_text": "pass=\"sup3rstr0ngpass1ForGG\"",
      "line_number": 6,
      "entropy": 3.699513850319966,
      "likely_false_positive": false,
      "detection_method": "regex",
      "llm_analysis": {
        "is_real": true,
        "confidence": 0.95,
        "reasoning": "The password 'sup3rstr0ngpass1ForGG' has a low entropy of 3.70, which indicates a weak password. Additionally, the presence of a password in a configuration file, especially one related to a database connection, is a strong indicator that this is a production secret."
      },
      "commit_hash": "d95287b420366311433f4610b94a2c0844f4dce3",
      "commit_message": "chore: add postgres connection information",
      "commit_author": "Henri Hubert",
      "commit_date": "2021-01-12 17:45:52+01:00",
      "file_path": "postgres_model.js",
      "diff_line": "+var pg_pass=\"sup3rstr0ngpass1ForGG\";",
      "context_snippet": "@@ -0,0 +1,7 @@\n+\n+var pg_port=1212;\n+var pg_host=\"gitguardians.com:9082/BLUDB\";\n+var pg_user=\"root\";\n+var pg_pass=\"sup3rstr0ngpass1ForGG\";\n+\n+var mongo_uri = \"mongodb+srv://testuser:hub24aoeu@gg-is-awesome-gg273.mongodb.net/test?retryWrites=true&w=majority\";\n"
    }
  ]
}
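
Because the report is plain JSON, it can be post-processed with a few lines of Python. The sketch below loads a report and keeps the findings that the LLM marked as real above a confidence threshold. The helper names and the default threshold of 0.9 are assumptions; the README does not state which cutoff the tool uses for "high confidence".

```python
import json


def load_report(path):
    """Read a report file written via --out."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def high_confidence_findings(report, threshold=0.9):
    """Return findings the LLM judged real with at least `threshold` confidence."""
    selected = []
    for finding in report.get("findings", []):
        analysis = finding.get("llm_analysis") or {}
        if analysis.get("is_real") and analysis.get("confidence", 0.0) >= threshold:
            selected.append(finding)
    return selected


def summarize(finding):
    """Format one finding as '<short hash> <file>:<line> <type>'."""
    return "{} {}:{} {}".format(
        finding["commit_hash"][:8],
        finding["file_path"],
        finding["line_number"],
        finding["type"],
    )
```

For example, `[summarize(f) for f in high_confidence_findings(load_report("report.json"))]` gives a compact list that is easy to paste into a ticket.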

Supported Secret Types

Cloud Provider Keys

AWS Access Keys, Secret Keys, Session Tokens
Google API Keys, OAuth Tokens, Cloud API Keys
Azure Storage Keys, SAS Tokens, AD Secrets
DigitalOcean, Linode, Vultr API Keys

API Tokens & Keys

GitHub (Personal Access Tokens, OAuth, App Tokens)
Slack (Tokens, Webhooks)
Stripe (API Keys, Publishable Keys)
Twilio, SendGrid, Mailgun, Mailchimp
And 80+ more services...

Database Credentials

PostgreSQL, MySQL, MongoDB, Redis
Cassandra, Elasticsearch, SQL Server
Connection strings with embedded passwords
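
As an illustration of what "embedded password" means for connection strings, the sketch below uses the standard library's URL parser to pull the password out of a database URL. The scheme list and the `embedded_password` helper are assumptions for this example; the scanner itself may rely on regex patterns instead.

```python
from urllib.parse import urlsplit

# Assumed set of URL schemes treated as database connection strings.
DB_SCHEMES = {
    "postgres", "postgresql", "mysql",
    "mongodb", "mongodb+srv", "redis", "rediss",
}


def embedded_password(candidate):
    """Return the password embedded in a database URL, or None if absent."""
    try:
        parts = urlsplit(candidate)
        password = parts.password
    except ValueError:
        # Malformed URLs (e.g. unbalanced IPv6 brackets) are not findings.
        return None
    if parts.scheme not in DB_SCHEMES:
        return None
    return password or None
```

Applied to the `mongo_uri` value in the sample report above, this returns `hub24aoeu`.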

Cryptographic Keys

RSA, EC, PGP Private Keys
SSH Private/Public Keys
SSL Certificates, PKCS8 Keys
OpenVPN Static Keys

How It Works

Detection Pipeline

1. REGEX SCAN
   ↓ Matches 100+ patterns
   
2. ENTROPY ANALYSIS
   ↓ Calculates Shannon entropy (H > 4.0 = likely secret)
   
3. FALSE POSITIVE FILTER
   ↓ Checks for test data indicators
   
4. LLM VERIFICATION
   ↓ Context-aware analysis
   
5. FINAL REPORT
   ↓ High-confidence findings only
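
Steps 2 and 3 can be sketched as follows. The entropy function is the standard per-character Shannon entropy in bits; applied to the sample `matched_text` in the Output Format section it gives about 3.70, matching the reported value. The 4.0 threshold comes from the pipeline above, while the placeholder hint list and the `triage` helper are hypothetical and only stand in for whatever test-data indicators the tool actually checks.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 4.0  # "H > 4.0 = likely secret" from the pipeline above

# Hypothetical test-data indicators; the tool's real list is not documented here.
PLACEHOLDER_HINTS = ("example", "test", "dummy", "placeholder", "changeme", "xxxx")


def shannon_entropy(text):
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_placeholder(text):
    """True if the text contains an obvious test or placeholder marker."""
    lowered = text.lower()
    return any(hint in lowered for hint in PLACEHOLDER_HINTS)


def triage(candidate):
    """Combine the entropy check and the placeholder check for one candidate."""
    entropy = shannon_entropy(candidate)
    return {
        "entropy": entropy,
        "high_entropy": entropy > ENTROPY_THRESHOLD,
        "likely_false_positive": looks_like_placeholder(candidate),
    }
```

A substring check like this is deliberately crude: it would also flag a real secret that happens to contain "test", which is one reason a later verification step is useful.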

Example Terminal Output

[*] Initializing LLM client...
[✓] LLM client ready
[*] Scanning repository: https://github.com/GitGuardian/sample_secrets.git (last 50 commits)
[*] Cloning repository...
[✓] Repository cloned to temporary directory
[✓] Found 7 commits
[*] Scanning: abc12345 - Initial commit
[✓] Found 12 potential secrets
[*] Scanning: def67890 - Add configuration files
[✓] Found 8 potential secrets
...
[✓] Scan complete: 6/46 high-confidence findings
[✓] Report generated: report.json

Scan Summary:
Repository: https://github.com/GitGuardian/sample_secrets.git
Commits analyzed: 50
Secrets identified: 6
Report: report.json

License

MIT License - See LICENSE file for details

Author

Alexandru Hossu
