
Multi-layer Git secret scanner using regex + entropy analysis + LLM verification to detect exposed API keys, credentials, and sensitive data in commit history. Built with Python for security auditing and DevSecOps workflows

ahossu/git-secret-scanner


Git Secret Scanner

A security tool that scans Git repository commit history for exposed secrets, API keys, credentials, and other sensitive data using multi-layer detection (regex patterns + entropy analysis + LLM verification).


Features

  • Multi-Layer Detection

    • Layer 1: Regex pattern matching (100+ secret types)
    • Layer 2: Shannon entropy analysis for cryptographic randomness
    • Layer 3: LLM-powered context-aware verification to reduce false positives
  • Coverage

    • AWS, GitHub, Google Cloud credentials
    • Database connection strings (PostgreSQL, MongoDB, MySQL, Redis)
    • API keys (Stripe, Slack, Twilio, SendGrid, and 100+ more)
    • Private keys (RSA, SSH, PGP)
    • OAuth tokens, JWT, Bearer tokens
  • False Positive Filtering

    • Detects test data, examples, and placeholder values
    • Entropy calculation filters low-randomness strings
    • LLM provides contextual analysis of each finding
  • Git Integration

    • Supports both local and remote repositories
    • Analyzes commit diffs line-by-line
    • Tracks findings with commit metadata
  • Reporting

    • Detailed JSON output with confidence scores
    • Includes LLM reasoning for each finding
    • Terminal summary for quick review
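The "analyzes commit diffs line-by-line" feature can be pictured with a short sketch. This is not the scanner's own code: the function name `added_lines` and the choice to number lines by their position in the new file are assumptions made for illustration, and the tool's reported `line_number` may be counted differently.

```python
import re
from typing import Iterator, Tuple

# Unified-diff hunk header, e.g. "@@ -0,0 +1,7 @@"; captures the new-file start line.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(diff_text: str) -> Iterator[Tuple[int, str]]:
    """Yield (new_file_line_number, text) for every line a diff adds.

    Hypothetical helper: only added lines are yielded because those are
    the lines that introduce a secret into the history.
    """
    new_lineno = None  # None until the first hunk header of a file is seen
    for raw in diff_text.splitlines():
        if raw.startswith("diff --git"):
            new_lineno = None  # a new file section starts; skip its headers
            continue
        header = HUNK_HEADER.match(raw)
        if header:
            new_lineno = int(header.group(1))
            continue
        if new_lineno is None:
            continue  # file headers such as "--- a/x" and "+++ b/x"
        if raw.startswith("+"):
            yield new_lineno, raw[1:]
            new_lineno += 1
        elif raw.startswith("-") or raw.startswith("\\"):
            continue  # removed lines and "\ No newline at end of file"
        else:
            new_lineno += 1  # context line present in the new file


sample = '@@ -0,0 +1,3 @@\n+\n+var pg_port=1212;\n+var pg_user="root";\n'
for number, text in added_lines(sample):
    print(number, repr(text))
```

Each yielded line can then be passed to the detection layers together with the commit metadata it came from.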

Installation

Prerequisites

  • Python 3.7+
  • Git installed on your system
  • Hugging Face account (for LLM API)

Setup

  1. Clone the repository
git clone https://github.com/ahossu/git-secret-scanner.git
cd git-secret-scanner
  2. Install dependencies
pip install -r requirements.txt
  3. Get a Hugging Face API token

Usage

Basic Command Structure

python scan.py --repo <REPO_URL_OR_PATH> \
               --base-url <HF_ENDPOINT_URL> \
               --hf-token <YOUR_HF_TOKEN> \
               --n <NUM_COMMITS> \
               --out <OUTPUT_FILE>
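For readers who want to see how these flags map to values, here is a minimal `argparse` sketch. Only the flag names come from the command above. Which flags are required, the defaults for `--n` and `--out`, and the help strings are assumptions, so `scan.py` itself may behave differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(
        prog="scan.py",
        description="Scan Git commit history for exposed secrets.",
    )
    parser.add_argument("--repo", required=True,
                        help="Remote repository URL or local path")
    parser.add_argument("--base-url", required=True,
                        help="Hugging Face endpoint URL used for LLM verification")
    parser.add_argument("--hf-token", required=True,
                        help="Hugging Face API token")
    # Assumed defaults, chosen only for this illustration.
    parser.add_argument("--n", type=int, default=20,
                        help="Number of most recent commits to scan")
    parser.add_argument("--out", default="report.json",
                        help="Path of the JSON report to write")
    return parser


args = build_parser().parse_args([
    "--repo", "/path/to/local/repo",
    "--base-url", "https://your-endpoint.huggingface.cloud/v1/",
    "--hf-token", "hf_yourtoken123",
    "--n", "50",
])
print(args.repo, args.n, args.out)
```

Note that `argparse` exposes `--base-url` and `--hf-token` as `args.base_url` and `args.hf_token`.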

Examples

Scan a remote repository (last 20 commits)

python scan.py --repo https://github.com/user/repo.git \
               --base-url "https://your-endpoint.huggingface.cloud/v1/" \
               --hf-token "hf_yourtoken123" \
               --n 20 \
               --out report.json

Scan a local repository

python scan.py --repo /path/to/local/repo \
               --base-url "https://your-endpoint.huggingface.cloud/v1/" \
               --hf-token "hf_yourtoken123" \
               --n 50 \
               --out findings.json

Test with GitGuardian's sample secrets repository

python scan.py --repo https://github.com/GitGuardian/sample_secrets.git \
               --base-url "https://your-endpoint.huggingface.cloud/v1/" \
               --hf-token "hf_yourtoken123" \
               --n 50 \
               --out report.json

Output Format

The tool generates a JSON report with scan metadata and a list of findings, structured as follows:

{
  "scan_metadata": {
    "timestamp": "2025-10-23T16:45:25.047481",
    "total_findings": 6,
    "high_confidence_findings": 6
  },
  "findings": [
    {
      "type": "Database Password",
      "matched_text": "pass=\"sup3rstr0ngpass1ForGG\"",
      "line_number": 6,
      "entropy": 3.699513850319966,
      "likely_false_positive": false,
      "detection_method": "regex",
      "llm_analysis": {
        "is_real": true,
        "confidence": 0.95,
        "reasoning": "The password 'sup3rstr0ngpass1ForGG' has a low entropy of 3.70, which indicates a weak password. Additionally, the presence of a password in a configuration file, especially one related to a database connection, is a strong indicator that this is a production secret."
      },
      "commit_hash": "d95287b420366311433f4610b94a2c0844f4dce3",
      "commit_message": "chore: add postgres connection information",
      "commit_author": "Henri Hubert",
      "commit_date": "2021-01-12 17:45:52+01:00",
      "file_path": "postgres_model.js",
      "diff_line": "+var pg_pass=\"sup3rstr0ngpass1ForGG\";",
      "context_snippet": "@@ -0,0 +1,7 @@\n+\n+var pg_port=1212;\n+var pg_host=\"gitguardians.com:9082/BLUDB\";\n+var pg_user=\"root\";\n+var pg_pass=\"sup3rstr0ngpass1ForGG\";\n+\n+var mongo_uri = \"mongodb+srv://testuser:hub24aoeu@gg-is-awesome-gg273.mongodb.net/test?retryWrites=true&w=majority\";\n"
    }
  ]
}
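A report in this shape is easy to post-process. The sketch below filters findings by the LLM verdict and prints one line per finding. The field names come from the sample report above, while the helper names and the 0.9 threshold are assumptions chosen for illustration.

```python
from typing import Dict, List


def confirmed_findings(report: Dict, min_confidence: float = 0.9) -> List[Dict]:
    """Keep findings the LLM marked as real with enough confidence."""
    confirmed = []
    for finding in report.get("findings", []):
        analysis = finding.get("llm_analysis", {})
        if analysis.get("is_real") and analysis.get("confidence", 0.0) >= min_confidence:
            confirmed.append(finding)
    return confirmed


def summarize(finding: Dict) -> str:
    """One-line summary: secret type, file, and short commit hash."""
    return "{} in {} (commit {})".format(
        finding["type"], finding["file_path"], finding["commit_hash"][:8]
    )


# Trimmed copy of the sample report; in practice load the file instead:
#   import json
#   with open("report.json") as handle:
#       report = json.load(handle)
report = {
    "scan_metadata": {"total_findings": 6, "high_confidence_findings": 6},
    "findings": [
        {
            "type": "Database Password",
            "llm_analysis": {"is_real": True, "confidence": 0.95},
            "commit_hash": "d95287b420366311433f4610b94a2c0844f4dce3",
            "file_path": "postgres_model.js",
        }
    ],
}

for finding in confirmed_findings(report):
    print(summarize(finding))
```

Raising `min_confidence` trades recall for fewer items to triage by hand.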

Supported Secret Types

Cloud Provider Keys
  • AWS Access Keys, Secret Keys, Session Tokens
  • Google API Keys, OAuth Tokens, Cloud API Keys
  • Azure Storage Keys, SAS Tokens, AD Secrets
  • DigitalOcean, Linode, Vultr API Keys
API Tokens & Keys
  • GitHub (Personal Access Tokens, OAuth, App Tokens)
  • Slack (Tokens, Webhooks)
  • Stripe (API Keys, Publishable Keys)
  • Twilio, SendGrid, Mailgun, Mailchimp
  • And 80+ more services...
Database Credentials
  • PostgreSQL, MySQL, MongoDB, Redis
  • Cassandra, Elasticsearch, SQL Server
  • Connection strings with embedded passwords
Cryptographic Keys
  • RSA, EC, PGP Private Keys
  • SSH Private/Public Keys
  • SSL Certificates, PKCS8 Keys
  • OpenVPN Static Keys

How It Works

Detection Pipeline

1. REGEX SCAN
   ↓ Matches 100+ patterns
   
2. ENTROPY ANALYSIS
   ↓ Calculates Shannon entropy (H > 4.0 = likely secret)
   
3. FALSE POSITIVE FILTER
   ↓ Checks for test data indicators
   
4. LLM VERIFICATION
   ↓ Context-aware analysis
   
5. FINAL REPORT
   ↓ High-confidence findings only

Example Terminal Output

[*] Initializing LLM client...
[✓] LLM client ready
[*] Scanning repository: https://github.com/GitGuardian/sample_secrets.git (last 50 commits)
[*] Cloning repository...
[✓] Repository cloned to temporary directory
[✓] Found 7 commits
[*] Scanning: abc12345 - Initial commit
[✓] Found 12 potential secrets
[*] Scanning: def67890 - Add configuration files
[✓] Found 8 potential secrets
...
[✓] Scan complete: 6/46 high-confidence findings
[✓] Report generated: report.json

Scan Summary:
Repository: https://github.com/GitGuardian/sample_secrets.git
Commits analyzed: 50
Secrets identified: 6
Report: report.json

License

MIT License - See LICENSE file for details


Author

Alexandru Hossu
