| 1 | +--- |
| 2 | +version: master |
| 3 | +has_magic_breadcrumbs: true |
| 4 | +show_category_breadcrumb: true |
| 5 | +show_title_breadcrumb: true |
| 6 | +category: 'Developers Guide' |
| 7 | +title: 'Security Token Scanner' |
| 8 | +source_url: 'https://github.com/metabase/metabase/blob/master/docs/developers-guide/security-token-scanner.md' |
| 9 | +layout: new-docs |
| 10 | +--- |
| 11 | + |
| 12 | +# Security Token Scanner |
| 13 | + |
| 14 | +The security token scanner automatically detects potentially leaked API keys, secrets, and other sensitive tokens in the Metabase codebase. It runs as a git pre-commit hook via `lint-staged` so that leaked tokens are caught before they are committed. |
| 15 | + |
| 16 | +## What it scans for |
| 17 | + |
| 18 | +The scanner looks for patterns that match common token formats: |
| 19 | + |
| 20 | +- **Airgap Tokens**: JWE tokens starting with `airgap_` (400+ characters) |
| 21 | +- **Hash/Dev Tokens**: 64-character hex strings or `mb_dev_` prefixed tokens |
| 22 | +- **OpenAI API Keys**: Keys starting with `sk-` (43-51 characters total) |
| 23 | +- **JWT Tokens**: Standard JWT format with header.payload.signature |
| 24 | +- **JWE Tokens**: Encrypted JWT tokens (400+ characters) |
| 25 | +- **GitHub Tokens**: Personal access tokens starting with `gh[pousr]_` |
| 26 | +- **Slack Bot Tokens**: Bot tokens starting with `xoxb-` |
| 27 | +- **AWS Access Keys**: Access key IDs starting with `AKIA` |
| 28 | + |
| 29 | +## Running the scanner |
| 30 | + |
| 31 | +The scanner runs automatically via `lint-staged` on staged files during git commits. You can also run it directly with `./bin/mage`. |
| 32 | + |
| 33 | +### Basic usage |
| 34 | + |
| 35 | +```bash |
| 36 | +# Scan specific files |
| 37 | +./bin/mage token-scan file1.txt file2.txt |
| 38 | + |
| 39 | +# Scan all files in the project |
| 40 | +./bin/mage token-scan -a |
| 41 | + |
| 42 | +# Run with verbose output |
| 43 | +./bin/mage token-scan -v file1.txt file2.txt |
| 44 | + |
| 45 | +# Scan without showing line details |
| 46 | +./bin/mage token-scan --no-lines file1.txt file2.txt |
| 47 | +``` |
| 48 | + |
| 49 | +### Example output |
| 50 | + |
| 51 | +``` |
| 52 | +Scanning 143 files |
| 53 | +Using thread pool size: 16 |
| 54 | +/Users/dev/metabase/src/metabase/api/auth.clj |
| 55 | + Line# 42 [OpenAI API Key]: const apiKey = "sk-1234567890abcdef1234567890abcdef123456789012"; |
| 56 | + |
| 57 | +Scan completed in: 89ms |
| 58 | +Files scanned: 143 |
| 59 | +Files with matches: 1 |
| 60 | +Total matches: 1 |
| 61 | +``` |
| 62 | + |
| 63 | +## Whitelisting legitimate tokens |
| 64 | + |
| 65 | +Sometimes you need to include token-like strings in source code for testing or examples. The scanner uses a whitelist file to avoid flagging known safe tokens. |
| 66 | + |
| 67 | +The whitelist is located at `mage/resources/token_scanner_whitelist.txt` and contains strings that should not be flagged as secrets: |
| 68 | + |
| 69 | +``` |
| 70 | +# Common test/example tokens that appear in documentation |
| 71 | +sk-1234567890abcdef1234567890abcdef123456789012 |
| 72 | +eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c |
| 73 | + |
| 74 | +# Hash values from tests and examples |
| 75 | +430bb02a37bb2471176e54ca323d0940c4e0ee210c3ab04262cb6576fe4ded6d |
| 76 | +sha256:9ff56186de4dd0b9bb2a37c977c3a4c9358647cde60a16f11f4c05bded1fe77a |
| 77 | + |
| 78 | +# Slack bot tokens from examples |
| 79 | +xoxb-781236542736-2364535789652-GkwFDQoHqzXDVsC6GzqYUypD |
| 80 | +``` |
| 81 | + |
| 82 | +To whitelist a token, add the exact string to this file. Each whitelist entry is matched as an exact substring against the full line that contains the flagged token. |
| 83 | + |
| 84 | +**Important**: The whitelist uses simple substring matching, not regex patterns. Add the exact token string that should be ignored. |
| 85 | + |
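The substring rule above can be approximated in the shell with `grep -F`, which treats each entry as a fixed string rather than a regex. This is only a sketch of the matching behavior, not the scanner's implementation: the temporary file and the `is_whitelisted` helper are hypothetical, and the file holds a single entry so the sketch does not have to guess how the scanner treats comment or blank lines.

```bash
# Hypothetical stand-in for mage/resources/token_scanner_whitelist.txt,
# holding one entry taken from the example whitelist above.
whitelist=$(mktemp)
printf '%s\n' 'sk-1234567890abcdef1234567890abcdef123456789012' > "$whitelist"

# is_whitelisted LINE: succeeds if any whitelist entry occurs verbatim in LINE.
# -F disables regex interpretation, -f reads one entry per line from the file.
is_whitelisted() {
  printf '%s\n' "$1" | grep -q -F -f "$whitelist"
}

flagged='const apiKey = "sk-1234567890abcdef1234567890abcdef123456789012";'
if is_whitelisted "$flagged"; then echo "ignored"; else echo "reported"; fi
```

Because matching is literal, an entry such as `sk-.*` would only match a line that contains those exact five characters.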
| 86 | +## Adding new token patterns |
| 87 | + |
| 88 | +To add a new token pattern, edit `mage/mage/token_scan.clj` and add an entry to the `token-patterns` map: |
| 89 | + |
| 90 | +```clojure |
| 91 | +(def ^:private token-patterns |
| 92 | + {"Existing Pattern" #"existing-regex" |
| 93 | + "Your New Token Type" #"13{2}7"}) |
| 94 | +``` |
| 95 | + |
| 96 | +### Pattern guidelines |
| 97 | + |
| 98 | +- **Be specific**: Patterns should match the actual token format, not environment variable assignments |
| 99 | +- **Include length constraints**: Use `{min,max}` quantifiers to avoid false positives |
| 100 | +- **Add comments**: Explain the token format and expected length |
| 101 | +- **Test thoroughly**: Run the scanner on the codebase to check for false positives |
| 102 | +  - Run it on everything with: `./bin/mage token-scan -a` |
| 103 | + |
| 104 | +Example of a good pattern: |
| 105 | +```clojure |
| 106 | +"Stripe API Key" #"sk_live_[A-Za-z0-9]{24}" ;; Stripe live keys: sk_live_ + 24 chars |
| 107 | +``` |
| 108 | + |
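Before adding a pattern to `token-patterns`, it can help to try it against a few sample strings. The sketch below does this with `grep -E`. It assumes the pattern uses only syntax shared by POSIX extended regex and Clojure's Java regex (as the Stripe example does), and the sample strings are made up for illustration.

```bash
# Candidate pattern from the example above, written as a POSIX ERE.
pattern='sk_live_[A-Za-z0-9]{24}'

# Made-up samples: one with exactly 24 characters after the prefix, one too short.
should_match='key = "sk_live_abcdefghijklmnopqrstuvwx"'
too_short='key = "sk_live_abc123"'

# -o prints only the matched text, which shows exactly what the pattern captures.
printf '%s\n' "$should_match" | grep -Eo "$pattern"

# -q suppresses output; a non-zero exit status means the pattern did not match.
printf '%s\n' "$too_short" | grep -Eq "$pattern" || echo "no match"
```

A quick check like this does not replace running `./bin/mage token-scan -a`, which is still the way to find false positives in the real codebase.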
| 109 | +## Modifying file filtering |
| 110 | + |
| 111 | +The scanner excludes certain files to avoid false positives from generated content. To modify the filtering, edit the `exclude-path-str?` function in `mage/mage/token_scan.clj`: |
| 112 | + |
| 113 | +```clojure |
| 114 | +(defn- exclude-path-str? |
| 115 | + "Check if a file should be excluded from scanning" |
| 116 | + [path-str] |
| 117 | + (or |
| 118 | + ;; Existing exclusions |
| 119 | + (str/includes? path-str "/.git/") |
| 120 | + (str/includes? path-str "/node_modules/") |
| 121 | + |
| 122 | + ;; Add new exclusions |
| 123 | + (str/includes? path-str "/my-generated-dir/") |
| 124 | + (str/ends-with? path-str ".generated.js"))) |
| 125 | +``` |
| 126 | + |
| 127 | +### Common exclusions |
| 128 | + |
| 129 | +The scanner currently excludes: |
| 130 | +- **Build directories**: `target/`, `node_modules/`, `.git/` |
| 131 | +- **Generated files**: `*.bundle.js`, `*.min.js`, `*.map` |
| 132 | +- **Binary files**: `*.jar`, `*.class`, `*.so`, `*.dll` |
| 133 | +- **Media files**: `*.png`, `*.jpg`, `*.svg` |
| 134 | +- **Test data**: `/stories-data/`, `/test-data/`, `/fixtures/` |
| 135 | +- **Checksum files**: `SHA256.sum`, `*.sha256`, `*.md5` |
| 136 | + |
| 137 | +## Git Hook Integration |
| 138 | + |
| 139 | +The scanner runs automatically as a git pre-commit hook. If it finds tokens or unused ignore comments, the commit is blocked: |
| 140 | + |
| 141 | +- **Token detected**: Review the file to ensure it's not a real secret |
| 142 | + |
| 143 | + |
| 144 | +The scanner only scans files that are staged for commit, making it fast and focused on new changes. |
| 145 | + |
| 146 | +## Troubleshooting |
| 147 | + |
| 148 | +### False positives |
| 149 | + |
| 150 | +If the scanner flags legitimate code: |
| 151 | + |
| 152 | +1. **Add to whitelist** if it's a test token or example (edit `token_scanner_whitelist.txt`) |
| 153 | +2. **Refine the pattern** if it's too broad (edit `token-patterns`) |
| 154 | +3. **Exclude the file type** if it's generated content (edit `exclude-path-str?`) |
| 155 | + |
| 156 | +### Performance issues |
| 157 | + |
| 158 | +The scanner uses parallel processing and should complete in under 5 seconds for most commits. If it's slow: |
| 159 | + |
| 160 | +1. Check if too many files are being scanned (`-v` flag shows file list) |
| 161 | +2. Consider excluding large generated directories |
| 162 | +3. Review your patterns for broad wildcards (like `.*`), which can be slow |
| 163 | + |
| 164 | +### Bypassing the hook |
| 165 | + |
| 166 | +If you need to bypass the scanner for a specific commit (not recommended): |
| 167 | + |
| 168 | +```bash |
| 169 | +git commit --no-verify -m "commit message" |
| 170 | +``` |
| 171 | + |
| 172 | +Use this sparingly and only when absolutely necessary. |
| 173 | + |
| 174 | +### Getting help |
| 175 | + |
| 176 | +For issues with the scanner: |
| 177 | + |
| 178 | +1. Check the git hook output for detailed error messages |
| 179 | +2. Run the scanner locally to debug: `./bin/mage token-scan -v file1.txt file2.txt` |
| 180 | +3. Ask in the #security or #dev channels for help with patterns or exclusions |