This dbt package unifies your Segment event data into a consistent schema, handling both standard Segment tables (tracks, pages, identifies) and custom event tables.
- Discovers all your Segment event tables automatically
- Unifies them into a single, consistent `allevents` schema
- Properly handles context properties and user properties
- Works with both BigQuery and Snowflake
- Supports custom event tables automatically
- dbt installed (version 1.0.0 or higher)
- Access to your Segment data in either:
- BigQuery
- Snowflake
- The following Segment tables should exist in your warehouse:
- tracks
- pages (optional)
- identifies (optional)
- Any custom event tables
- **Authentication Methods**
BigQuery supports two authentication methods:
a) OAuth (Recommended for Local Development)
```bash
# Install Google Cloud SDK
brew install google-cloud-sdk   # macOS
# OR visit https://cloud.google.com/sdk/docs/install for other operating systems

# Log in to Google Cloud
gcloud auth application-default login

# Set your project
gcloud config set project your-project-id
```
b) Service Account (Recommended for Production)
- Go to Google Cloud Console > IAM & Admin > Service Accounts
- Create a new service account or select an existing one
- Grant the required roles: `BigQuery Data Viewer`, `BigQuery User`
- Create and download JSON key
- Save the JSON key file securely
- Update your profile:
```yaml
# ~/.dbt/profiles.yml
your_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: your-project-id
      dataset: your_segment_dataset
      location: US
      threads: 4
      keyfile: /path/to/your/keyfile.json  # Full path to your keyfile
```
- **Create a Snowflake User and Role**
```sql
-- Run in Snowflake as ACCOUNTADMIN

-- Create role
CREATE ROLE IF NOT EXISTS TRANSFORMER_ROLE;
GRANT USAGE ON WAREHOUSE your_warehouse TO ROLE TRANSFORMER_ROLE;
GRANT USAGE ON DATABASE your_database TO ROLE TRANSFORMER_ROLE;
GRANT USAGE ON SCHEMA your_database.your_schema TO ROLE TRANSFORMER_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA your_database.your_schema TO ROLE TRANSFORMER_ROLE;

-- Create user
CREATE USER IF NOT EXISTS dbt_user
  PASSWORD = 'your-secure-password'
  DEFAULT_ROLE = TRANSFORMER_ROLE
  DEFAULT_WAREHOUSE = your_warehouse;

-- Grant role to user
GRANT ROLE TRANSFORMER_ROLE TO USER dbt_user;
```
- **Configure Credentials**
Choose one of these methods:
a) Direct in profiles.yml (Development Only)
```yaml
# ~/.dbt/profiles.yml
segment_product_analytics:   # name the profile `segment_product_analytics`
  target: dev
  outputs:
    dev:
      type: snowflake
      account: your-account
      user: dbt_user
      password: your-secure-password
      role: TRANSFORMER_ROLE
      database: your_database
      warehouse: your_warehouse
      schema: your_schema
      threads: 4
```
b) Using Environment Variables (Recommended)
```yaml
# ~/.dbt/profiles.yml
segment_product_analytics:   # name the profile `segment_product_analytics`
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: "{{ env_var('SNOWFLAKE_ROLE') }}"
      database: "{{ env_var('SNOWFLAKE_DATABASE') }}"
      warehouse: "{{ env_var('SNOWFLAKE_WAREHOUSE') }}"
      schema: "{{ env_var('SNOWFLAKE_SCHEMA') }}"
      threads: 4
```
Then set in your shell:
```bash
export SNOWFLAKE_ACCOUNT=your-account
export SNOWFLAKE_USER=dbt_user
export SNOWFLAKE_PASSWORD=your-secure-password
export SNOWFLAKE_ROLE=TRANSFORMER_ROLE
export SNOWFLAKE_DATABASE=your_database
export SNOWFLAKE_WAREHOUSE=your_warehouse
export SNOWFLAKE_SCHEMA=your_schema
```
After setting up credentials, verify your connection:
```bash
# Test your connection
dbt debug

# If successful, you should see something like:
# Connection test: OK
```
- **Set Up the Environment**
```bash
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate   # On Windows, use `.venv\Scripts\activate`

# Install dbt
pip install dbt-core

# Install the warehouse-specific adapter
pip install dbt-bigquery    # for BigQuery
# OR
pip install dbt-snowflake   # for Snowflake
```
- **Install the Package**
Add to your `packages.yml`:

```yaml
packages:
  - git: "https://github.com/your-org/segment-product-analytics.git"
    revision: main
```
- **Configure Your dbt Profile**
In `~/.dbt/profiles.yml`:

For BigQuery:

```yaml
your_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth        # or service-account
      project: your-project-id
      dataset: your_segment_dataset
      location: US         # adjust as needed
      threads: 4
```
For Snowflake:
```yaml
your_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: your-account
      user: your-user
      password: your-password
      role: your-role
      database: your-database
      warehouse: your-warehouse
      schema: your-schema
      threads: 4
```
- **Configure the Package**
In your `dbt_project.yml`:

```yaml
vars:
  segment_product_analytics:
    segment_schema: "your_segment_schema"  # Schema where your Segment tables live
```
- **Run the Transformation**
```bash
# Install dbt dependencies
dbt deps

# Run the transformation
dbt run --select segment_product_analytics.allevents
```
The package creates a table called `allevents` with the following schema:
```sql
device_id       STRING,     -- Anonymous ID from Segment
user_id         STRING,     -- User ID if identified
event_name      STRING,     -- Name of the event
event_id        STRING,     -- Unique event identifier
server_ts       TIMESTAMP,  -- Server-side timestamp
device_ts       TIMESTAMP,  -- Client-side timestamp
properties      JSON,       -- All context properties
user_properties JSON        -- User properties (from identifies)
```
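Once built, `allevents` can be queried like any other table. As a quick sanity check, a daily event-volume query might look like this (a sketch in BigQuery syntax; the dataset name `your_segment_dataset` is a placeholder, and Snowflake needs the equivalent date functions):

```sql
-- Daily event and user counts over the last 7 days (BigQuery syntax)
SELECT
  DATE(server_ts)          AS event_date,
  event_name,
  COUNT(DISTINCT event_id) AS events,
  COUNT(DISTINCT user_id)  AS users
FROM your_segment_dataset.allevents
WHERE server_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY event_date, event_name
ORDER BY event_date, events DESC;
```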
The package automatically discovers and processes any custom event tables in your Segment schema. For example, if you have a custom event table called `button_clicked`, the package will (see the SQL sketch after this list):
- Automatically discover the table
- Extract all CONTEXT_* fields into the properties JSON
- Set the event_name to either:
  - The EVENT column value, if it exists
  - The table name, if there is no EVENT column
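Conceptually, each discovered table is normalized into the `allevents` shape before being unioned together. The SQL below is only an illustrative sketch (BigQuery syntax, with a couple of hypothetical `context_*` columns); the SQL the package actually generates depends on the columns present in your table:

```sql
-- Illustrative normalization of a custom event table such as button_clicked
SELECT
  anonymous_id                                   AS device_id,
  user_id                                        AS user_id,
  COALESCE(event, 'button_clicked')              AS event_name,  -- EVENT column if present, else the table name
  id                                             AS event_id,
  received_at                                    AS server_ts,
  original_timestamp                             AS device_ts,
  TO_JSON(STRUCT(context_ip, context_page_url))  AS properties   -- all CONTEXT_* fields collected into JSON
FROM your_segment_schema.button_clicked
```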
If you need to customize event names, you can override the default naming in your `dbt_project.yml`:
```yaml
vars:
  segment_product_analytics:
    event_name_overrides:
      button_clicked: "User Button Click"
      page_loaded: "Page View"
```
To add schema tests for the unified table, create a file `models/staging/segment_product_analytics.yml`:
```yaml
version: 2

models:
  - name: allevents
    columns:
      - name: event_id
        tests:
          - unique
          - not_null
      - name: server_ts
        tests:
          - not_null
```
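With the file in place, run the checks with `dbt test --select allevents`.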
- **Missing Tables**
  - Ensure your Segment schema is correctly specified (a quick check query is shown below)
  - Check table permissions
  - Verify that table names match the expected casing
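One way to confirm which tables the package can actually see is to list the contents of the configured schema. The query below assumes BigQuery's `INFORMATION_SCHEMA` layout; on Snowflake, query `your_database.INFORMATION_SCHEMA.TABLES` and filter on `TABLE_SCHEMA` instead:

```sql
-- List the tables visible in the configured Segment dataset (BigQuery)
SELECT table_name
FROM your_segment_dataset.INFORMATION_SCHEMA.TABLES
ORDER BY table_name;
```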
- **JSON Errors**
  - BigQuery: Ensure proper JSON formatting
  - Snowflake: Check OBJECT_CONSTRUCT usage (see the comparison below)
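For reference, the two warehouses assemble the JSON payload differently. The snippets below are illustrative only, using a couple of hypothetical `context_*` columns:

```sql
-- Snowflake: build a VARIANT object from individual columns
SELECT OBJECT_CONSTRUCT('ip', context_ip, 'page_url', context_page_url) AS properties
FROM your_schema.tracks;

-- BigQuery: the rough equivalent uses TO_JSON over a STRUCT
SELECT TO_JSON(STRUCT(context_ip AS ip, context_page_url AS page_url)) AS properties
FROM your_segment_dataset.tracks;
```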
- **Performance Issues**
  - Consider partitioning by server_ts (an example configuration is shown below)
  - Add appropriate clustering keys
  - Optimize JSON queries
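As one option, you can wrap `allevents` in a model of your own and apply warehouse-specific table settings there. The sketch below shows a hypothetical BigQuery wrapper model that partitions by day on `server_ts` and clusters on `event_name`; Snowflake users would rely on clustering keys instead:

```sql
-- models/marts/allevents_partitioned.sql (hypothetical wrapper model, BigQuery)
{{
  config(
    materialized='table',
    partition_by={'field': 'server_ts', 'data_type': 'timestamp', 'granularity': 'day'},
    cluster_by=['event_name']
  )
}}

SELECT *
FROM {{ ref('allevents') }}
```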
To contribute:
- Clone the repository
- Install dependencies: `dbt deps`
- Make changes
- Add tests
- Submit a PR
For support:

- Open an issue
- Check existing documentation
- Contact maintainers
License: MIT