[auto-build] git-hook-token-scanner -> master #282

Open: wants to merge 1 commit into `master`

11,399 changes: 5,099 additions & 6,300 deletions _docs/master/api.html

Large diffs are not rendered by default.

94 changes: 43 additions & 51 deletions _docs/master/configuring-metabase/config-template.md
@@ -19,7 +19,6 @@ clojure -M:doc:ee config-template

The template lists example `database`, `user`, and `settings` sections for the [config file](./config-file).


```yaml
# A config file template for Metabase.
# You'll need to update (or remove) the `users` and `databases` sections.
@@ -34,49 +33,48 @@ The template lists example `database`, `user`, and `settings` sections for the [
version: 1
config:
users:
- first_name: First
last_name: Person
password: metabot1
email: first@example.com
- first_name: Normal
last_name: Person
password: metabot1
email: normal@example.com
- first_name: Admin
last_name: Person
password: metabot1
is_superuser: true
email: admin@example.com
- first_name: First
last_name: Person
password: metabot1
email: first@example.com
- first_name: Normal
last_name: Person
password: metabot1
email: normal@example.com
- first_name: Admin
last_name: Person
password: metabot1
is_superuser: true
email: admin@example.com
databases:
- name: Sample PostgreSQL
engine: postgres
details:
host: postgres-data
port: 5432
user: metabase
password: metasample123
dbname: sample
- name: Sample MySQL
engine: mysql
details:
host: mysql-data
port: 3306
user: metabase
password: metasample123
dbname: sample
- name: Sample PostgreSQL
engine: postgres
details:
host: postgres-data
port: 5432
user: metabase
password: metasample123
dbname: sample
- name: Sample MySQL
engine: mysql
details:
host: mysql-data
port: 3306
user: metabase
password: metasample123
dbname: sample
api-keys:
- name: Admin API key
group: admin
creator: first@example.com
key: mb_firsttestapikey123
- name: All Users API key
group: all-users
creator: first@example.com
key: mb_secondtestapikey456
- name: Admin API key
group: admin
creator: first@example.com
key: mb_firsttestapikey123
- name: All Users API key
group: all-users
creator: first@example.com
key: mb_secondtestapikey456
settings:
admin-email: null
aggregated-query-row-limit: null
ai-service-base-url: http://localhost:8000
allowed-iframe-hosts: |-
youtube.com,
youtu.be,
@@ -110,6 +108,7 @@ config:
application-name: Metabase
attachment-row-limit: null
attachment-table-row-limit: 20
backfill-entity-ids-repeat-ms: 3000
bcc-enabled: true
breakout-bin-width: 10.0
breakout-bins-num: 8
@@ -120,7 +119,6 @@ config:
custom-geojson-enabled: true
custom-homepage: false
custom-homepage-dashboard: null
dashboards-save-last-used-parameters: true
db-connection-timeout-ms: 10000
db-query-timeout-minutes: 20
default-maps-enabled: true
@@ -156,7 +154,7 @@ config:
gsheets: null
health-check-logging-enabled: true
help-link: metabase
help-link-custom-destination: https://www.metabase.com/help/premium
help-link-custom-destination: https://www.metabase.com/help-premium
humanization-strategy: simple
jdbc-data-warehouse-max-connection-pool-size: 15
jwt-attribute-email: email
@@ -169,7 +167,7 @@ config:
jwt-identity-provider-uri: null
jwt-shared-secret: null
jwt-user-provisioning-enabled: true
landing-page: ''
landing-page: ""
landing-page-illustration: default
landing-page-illustration-custom: null
ldap-attribute-email: mail
@@ -190,7 +188,6 @@ config:
ldap-user-base: null
ldap-user-filter: (&(objectClass=inetOrgPerson)(|(uid={login})(mail={login})))
ldap-user-provisioning-enabled: true
license-token-missing-banner-dismissal-timestamp: []
loading-message: doing-science
login-page-illustration: default
login-page-illustration-custom: null
@@ -201,10 +198,8 @@ config:
no-data-illustration-custom: null
no-object-illustration: default
no-object-illustration-custom: null
non-table-chart-generated: false
not-behind-proxy: false
notification-link-base-url: null
notification-system-event-thread-pool-size: 5
notification-thread-pool-size: 3
persisted-model-refresh-cron-schedule: 0 0 0/6 * * ? *
persisted-models-enabled: false
@@ -222,7 +217,7 @@ config:
saml-application-name: Metabase
saml-attribute-email: null
saml-attribute-firstname: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname
saml-attribute-group: null
saml-attribute-group: member_of
saml-attribute-lastname: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname
saml-enabled: false
saml-group-mappings: {}
@@ -231,15 +226,13 @@ config:
saml-identity-provider-issuer: null
saml-identity-provider-slo-uri: null
saml-identity-provider-uri: null
saml-keystore-alias: null
saml-keystore-alias: metabase
saml-keystore-password: changeit
saml-keystore-path: null
saml-slo-enabled: false
saml-user-provisioning-enabled: true
scim-enabled: null
sdk-encryption-validation-key: null
search-engine: appdb
search-language: null
search-engine: in-place
search-typeahead-enabled: true
send-new-sso-user-admin-email: null
session-cookie-samesite: lax
@@ -268,6 +261,5 @@ config:
unaggregated-query-row-limit: null
update-channel: latest
uploads-settings: null
use-tenants: false
user-visibility: all
```
6 changes: 3 additions & 3 deletions _docs/master/configuring-metabase/environment-variables.md
@@ -78,12 +78,12 @@ Maximum number of rows to return for aggregated queries via the API.

Must be less than 1048575. See also MB_UNAGGREGATED_QUERY_ROW_LIMIT.

### `MB_AI_SERVICE_BASE_URL`
### `MB_AI_PROXY_BASE_URL`

- Type: string
- Default: `http://localhost:8000`

URL for the AI Service.
URL for the AI Proxy service.

### `MB_ALLOWED_IFRAME_HOSTS`

@@ -264,7 +264,7 @@ Row limit in file attachments excluding the header.
Maximum number of rows to render in an alert or subscription image.

Range: 1-100. To limit the total number of rows included in the file attachment
for an email dashboard subscription, use MB_ATTACHMENT_ROW_LIMIT.
for an email dashboard subscription, use MB_UNAGGREGATED_QUERY_ROW_LIMIT.

### `MB_BCC_ENABLED`

180 changes: 180 additions & 0 deletions _docs/master/developers-guide/security-token-scanner.md
@@ -0,0 +1,180 @@
---
version: master
has_magic_breadcrumbs: true
show_category_breadcrumb: true
show_title_breadcrumb: true
category: 'Developers Guide'
title: 'Security Token Scanner'
source_url: 'https://github.com/metabase/metabase/blob/master/docs/developers-guide/security-token-scanner.md'
layout: new-docs
---

# Security Token Scanner

The security token scanner is a tool that automatically detects potentially leaked API keys, secrets, and other sensitive tokens in the Metabase codebase. It runs as a Git pre-commit hook via `lint-staged` to keep leaked tokens out of commits.

## What it scans for

The scanner looks for patterns that match common token formats:

- **Airgap Tokens**: JWE tokens starting with `airgap_` (400+ characters)
- **Hash/Dev Tokens**: 64-character hex strings or `mb_dev_` prefixed tokens
- **OpenAI API Keys**: Keys starting with `sk-` (43-51 characters total)
- **JWT Tokens**: Standard JWT format (`header.payload.signature`)
- **JWE Tokens**: Encrypted JWT tokens (400+ characters)
- **GitHub Tokens**: Personal access tokens starting with `gh[pousr]_`
- **Slack Bot Tokens**: Bot tokens starting with `xoxb-`
- **AWS Access Keys**: Access key IDs starting with `AKIA`
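
To make the list above more concrete, here is a minimal sketch of how a pattern table like this could be applied to a single line of text. The regular expressions are illustrative assumptions (for example, that an AWS access key ID is `AKIA` plus 16 uppercase letters or digits, and that a GitHub token has a 36-character suffix). The scanner's actual patterns are the `token-patterns` map in `mage/mage/token_scan.clj`, and `find-tokens` is a hypothetical helper.

```clojure
(ns token-scan-sketch)

;; Illustrative patterns only. These are NOT the real regexes from
;; mage/mage/token_scan.clj; they are assumptions for this sketch.
(def token-patterns
  {"AWS Access Key"  #"AKIA[0-9A-Z]{16}"
   "GitHub Token"    #"gh[pousr]_[A-Za-z0-9]{36}"
   "Slack Bot Token" #"xoxb-[0-9]{10,13}-[0-9]{10,13}-[A-Za-z0-9]{24}"
   "OpenAI API Key"  #"sk-[A-Za-z0-9]{40,48}"})   ; 43-51 chars including "sk-"

(defn find-tokens
  "Hypothetical helper: returns [pattern-name matched-text] pairs for every
  pattern match found on a single line of text."
  [line]
  (for [[pattern-name regex] token-patterns
        match (re-seq regex line)]   ; re-seq yields every non-overlapping match
    [pattern-name match]))
```

Calling `(find-tokens "aws_key = \"AKIAIOSFODNN7EXAMPLE\"")` returns the pattern name paired with the matched text. `AKIAIOSFODNN7EXAMPLE` is the placeholder key ID used in AWS documentation.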

## Running the scanner

The scanner runs automatically via `lint-staged` on staged files during git commits. You can also run it directly with `./bin/mage`.

### Basic usage

```bash
# Scan specific files
./bin/mage -token-scan file1.txt file2.txt

# Scan all files in the project
./bin/mage -token-scan -a

# Run with verbose output
./bin/mage -token-scan -v file1.txt file2.txt

# Scan without showing line details
./bin/mage -token-scan --no-lines file1.txt file2.txt
```

### Example output

```
Scanning 143 files
Using thread pool size: 16
/Users/dev/metabase/src/metabase/api/auth.clj
Line# 42 [OpenAI API Key]: const apiKey = "sk-1234567890abcdef1234567890abcdef123456789012";

Scan completed in: 89ms
Files scanned: 143
Files with matches: 1
Total matches: 1
```
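
Each match in the example output carries a line number, a pattern name, and the offending line. The sketch below produces approximately that format from a file's contents. `scan-content` and `format-match` are hypothetical helpers, and the single AWS pattern stands in for the real pattern table.

```clojure
(ns token-scan-report-sketch
  (:require [clojure.string :as str]))

;; A single stand-in pattern; the real table in token_scan.clj is larger.
(def token-patterns
  {"AWS Access Key" #"AKIA[0-9A-Z]{16}"})

(defn scan-content
  "Hypothetical helper: scans file content line by line and returns one map
  per (line, pattern) match, using 1-based line numbers."
  [content]
  (for [[idx line]           (map-indexed vector (str/split-lines content))
        [pattern-name regex] token-patterns
        :when                (re-find regex line)]
    {:line-number (inc idx) :pattern pattern-name :line (str/trim line)}))

(defn format-match
  "Renders one match roughly like the `Line# ...` entries in the example output."
  [{:keys [line-number pattern line]}]
  (format "Line# %d [%s]: %s" line-number pattern line))
```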

## Whitelisting legitimate tokens

Sometimes you need to include token-like strings in source code for testing or examples. The scanner uses a whitelist file to avoid flagging known safe tokens.

The whitelist is located at `mage/resources/token_scanner_whitelist.txt` and contains strings that should not be flagged as secrets:

```
# Common test/example tokens that appear in documentation
sk-1234567890abcdef1234567890abcdef123456789012
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

# Hash values from tests and examples
430bb02a37bb2471176e54ca323d0940c4e0ee210c3ab04262cb6576fe4ded6d
sha256:9ff56186de4dd0b9bb2a37c977c3a4c9358647cde60a16f11f4c05bded1fe77a

# Slack bot tokens from examples
xoxb-781236542736-2364535789652-GkwFDQoHqzXDVsC6GzqYUypD
```

To whitelist a token, add the exact string to this file on its own line. Each entry is matched as a literal substring against the full source line that contains the flagged token; if that line contains the entry, the match is not reported.

**Important**: The whitelist uses plain substring matching, not regex patterns, so add the exact token string that should be ignored.
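
A minimal sketch of the whitelist check described above, assuming that blank lines and lines starting with `#` are comments (as in the sample file) and that entries are trimmed. `parse-whitelist` and `whitelisted?` are hypothetical names; the real logic lives in `mage/mage/token_scan.clj` and may differ.

```clojure
(ns token-whitelist-sketch
  (:require [clojure.string :as str]))

(defn parse-whitelist
  "Turns whitelist file text into a set of entries. Treating blank lines and
  lines starting with # as non-entries is an assumption based on the sample file."
  [text]
  (->> (str/split-lines text)
       (map str/trim)
       (remove #(or (str/blank? %) (str/starts-with? % "#")))
       set))

(defn whitelisted?
  "True when the full source line contains any whitelist entry as a literal
  substring. No regex semantics are involved."
  [whitelist line]
  (boolean (some #(str/includes? line %) whitelist)))
```

To try it against the real file from the repository root, pass `(slurp "mage/resources/token_scanner_whitelist.txt")` to `parse-whitelist`. Because `str/includes?` does a literal comparison, characters such as `.` or `*` in an entry have no special meaning.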

## Adding new token patterns

To add a new token pattern, edit `mage/mage/token_scan.clj` and add an entry to the `token-patterns` map:

```clojure
(def ^:private token-patterns
{"Existing Pattern" #"existing-regex"
"Your New Token Type" #"13{2}7"})
```

### Pattern guidelines

- **Be specific**: Patterns should match the actual token format, not environment variable assignments
- **Include length constraints**: Use `{min,max}` quantifiers to avoid false positives
- **Add comments**: Explain the token format and expected length
- **Test thoroughly**: Run the scanner on the whole codebase (`./bin/mage -token-scan -a`) to check for false positives

Example of a good pattern:
```clojure
"Stripe API Key" #"sk_live_[A-Za-z0-9]{24}" ;; Stripe live keys: sk_live_ + 24 chars
```
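
Before running the full scan, it can help to check a new pattern against a few hand-written samples at the REPL. The sketch below does this for the Stripe example. The sample strings and the `check-pattern` helper are hypothetical. Positive samples use `re-matches` (the whole sample must match, which checks the exact length), while negative samples use `re-find` (a match anywhere counts), mirroring how a token can appear anywhere in a scanned line.

```clojure
(ns token-pattern-check-sketch)

;; The example pattern from this section.
(def stripe-live-key #"sk_live_[A-Za-z0-9]{24}")

(def should-match
  ["sk_live_abcdefghijklmnopqrstuvwx"])      ; prefix + exactly 24 chars

(def should-not-match
  ["sk_live_tooShort"                         ; fewer than 24 chars after the prefix
   "sk_test_abcdefghijklmnopqrstuvwx"         ; test-mode prefix
   "STRIPE_KEY=${STRIPE_KEY}"])               ; env var reference, not a token

(defn check-pattern
  "Returns the samples the pattern classifies incorrectly."
  [regex positives negatives]
  {:missed          (remove #(re-matches regex %) positives)
   :false-positives (filter #(re-find regex %) negatives)})
```

If both lists in the result are empty, follow up with `./bin/mage -token-scan -a` to catch false positives in real code.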

## Modifying file filtering

The scanner excludes certain files to avoid false positives from generated content. To modify the filtering, edit the `exclude-path-str?` function in `mage/mage/token_scan.clj`:

```clojure
(defn- exclude-path-str?
"Check if a file should be excluded from scanning"
[path-str]
(or
;; Existing exclusions
(str/includes? path-str "/.git/")
(str/includes? path-str "/node_modules/")

;; Add new exclusions
(str/includes? path-str "/my-generated-dir/")
(str/ends-with? path-str ".generated.js")))
```

### Common exclusions

The scanner currently excludes:
- **Build directories**: `target/`, `node_modules/`, `.git/`
- **Generated files**: `*.bundle.js`, `*.min.js`, `*.map`
- **Binary files**: `*.jar`, `*.class`, `*.so`, `*.dll`
- **Media files**: `*.png`, `*.jpg`, `*.svg`
- **Test data**: `/stories-data/`, `/test-data/`, `/fixtures/`
- **Checksum files**: `SHA256.sum`, `*.sha256`, `*.md5`
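
The exclusions listed above could also be expressed as data, which keeps the predicate short as the list grows. This is a hypothetical sketch, not the current `exclude-path-str?`: the leading and trailing slashes on directory names, the suffix list, and the `excluded-path?` name are assumptions derived from the list above.

```clojure
(ns token-exclusion-sketch
  (:require [clojure.string :as str]))

;; Hypothetical data-driven form of the exclusions listed above.
(def excluded-dirs
  ["/target/" "/node_modules/" "/.git/"
   "/stories-data/" "/test-data/" "/fixtures/"])

(def excluded-suffixes
  [".bundle.js" ".min.js" ".map"     ; generated files
   ".jar" ".class" ".so" ".dll"      ; binaries
   ".png" ".jpg" ".svg"              ; media
   ".sha256" ".md5"])                ; checksums

(def excluded-file-names #{"SHA256.sum"})

(defn excluded-path?
  "True when path-str is under an excluded directory, ends with an excluded
  suffix, or has an excluded file name."
  [path-str]
  (let [file-name (last (str/split path-str #"/"))]
    (boolean
     (or (some #(str/includes? path-str %) excluded-dirs)
         (some #(str/ends-with? path-str %) excluded-suffixes)
         (contains? excluded-file-names file-name)))))
```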

## Git hook integration

The scanner runs automatically as a Git pre-commit hook. If it finds tokens or unused ignore comments, it blocks the commit:

- **Token detected**: Review the flagged file to make sure it doesn't contain a real secret

The scanner only scans files that are staged for commit, making it fast and focused on new changes.
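
`lint-staged` collects the staged files itself and passes them to the configured command. To reproduce a hook run by hand, a sketch like the following lists staged files that still have content (added, copied, modified, or renamed) using `git diff --cached --name-only --diff-filter=ACMR`. The function names are hypothetical, and the exact filter `lint-staged` applies may differ.

```clojure
(ns staged-files-sketch
  (:require [clojure.java.shell :as shell]
            [clojure.string :as str]))

(defn parse-name-only
  "Splits `git diff --name-only` output into a vector of paths."
  [output]
  (->> (str/split-lines output)
       (map str/trim)
       (remove str/blank?)
       vec))

(defn staged-files
  "Returns staged paths that still have content to scan: added, copied,
  modified, or renamed (ACMR). Must run inside a Git work tree."
  []
  (let [{:keys [exit out err]} (shell/sh "git" "diff" "--cached" "--name-only"
                                          "--diff-filter=ACMR")]
    (if (zero? exit)
      (parse-name-only out)
      (throw (ex-info "git diff --cached failed" {:exit exit :err err})))))
```

The resulting paths are relative to the repository root and can be passed to `./bin/mage -token-scan`.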

## Troubleshooting

### False positives

If the scanner flags legitimate code:

1. **Add to whitelist** if it's a test token or example (edit `token_scanner_whitelist.txt`)
2. **Refine the pattern** if it's too broad (edit `token-patterns`)
3. **Exclude the file type** if it's generated content (edit `exclude-path-str?`)

### Performance issues

The scanner uses parallel processing and should complete in under 5 seconds for most commits. If it's slow:

1. Check whether too many files are being scanned (the `-v` flag shows the file list)
2. Consider excluding large generated directories
3. Avoid patterns with broad wildcards (like `.*`), which can be slow
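
The `Using thread pool size` line in the example output suggests the files are scanned on a fixed-size thread pool. The sketch below shows one way to do that with `java.util.concurrent`. Sizing the pool to the number of available processors, the single stand-in pattern, and the function names are assumptions, not the scanner's actual implementation.

```clojure
(ns parallel-scan-sketch
  (:import (java.util.concurrent Callable ExecutorService Executors Future)))

;; Stand-in pattern table; the real one is `token-patterns` in token_scan.clj.
(def token-patterns
  {"AWS Access Key" #"AKIA[0-9A-Z]{16}"})

(defn scan-text
  "Counts pattern matches in one file's text."
  [text]
  (reduce + 0 (for [[_ regex] token-patterns]
                (count (re-seq regex text)))))

(defn scan-in-parallel
  "Scans a map of path -> text on a fixed-size thread pool (one thread per
  available processor, an assumption) and returns path -> match count for
  files with at least one match."
  [files]
  (let [^ExecutorService pool (Executors/newFixedThreadPool
                               (.availableProcessors (Runtime/getRuntime)))]
    (try
      ;; Submit every file first so the pool can work on them concurrently.
      ;; The ^Callable hint selects submit(Callable), so .get returns the value.
      (let [futures (doall
                     (for [[path text] files]
                       (.submit pool ^Callable (fn [] [path (scan-text text)]))))]
        ;; Then block on each result and keep only files with matches.
        (into {}
              (comp (map #(.get ^Future %))
                    (filter (fn [[_ n]] (pos? n))))
              futures))
      (finally
        (.shutdown pool)))))
```

Submitting all files before calling `.get` lets the pool work through them concurrently, and the `finally` block shuts the pool down even if a task throws.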

### Bypassing the hook

If you need to bypass the scanner for a specific commit (not recommended):

```bash
git commit --no-verify -m "commit message"
```

Use this sparingly and only when absolutely necessary.

### Getting help

For issues with the scanner:

1. Check the git hook output for detailed error messages
2. Run the scanner locally to debug: `./bin/mage -token-scan -v file1.txt file2.txt`
3. Ask in the #security or #dev channels for help with patterns or exclusions
2 changes: 1 addition & 1 deletion _docs/master/people-and-groups/authenticating-with-saml.md
@@ -205,7 +205,7 @@ To _require_ people to log in with SSO, disable password authentication from **A

## New account notification emails

When people log in to Metabase for the first time via SSO, Metabase will automatically create an account for them, which will trigger an email notification to Metabase administrators. If you don't want these notifications to be sent, go to **Admin settings > Authentication > User provisioning**, and toggle off **"Notify admins of new users provisioned from SSO"**.
When people log in to Metabase for the first time via SSO, Metabase will automatically create an account for them, which will trigger an email notification to Metabase administrators. If you don't want these notifications to be sent, you can toggle them off at the bottom of the Authentication page.

## Example code using SAML
