Segment Event Transformation dbt Package

This dbt package unifies your Segment event data into a consistent schema, handling both standard Segment tables (tracks, pages, identifies) and custom event tables.

🎯 What This Package Does

  1. Discovers all your Segment event tables automatically
  2. Unifies them into a single allevents table with a consistent schema
  3. Properly handles context properties and user properties
  4. Works with both BigQuery and Snowflake
  5. Supports custom event tables automatically

πŸ“‹ Prerequisites

  1. dbt installed (version 1.0.0 or higher)
  2. Access to your Segment data in either:
    • BigQuery
    • Snowflake
  3. The following Segment tables should exist in your warehouse:
    • tracks
    • pages (optional)
    • identifies (optional)
    • Any custom event tables

πŸ” Setting Up Warehouse Credentials

BigQuery Setup

  1. Authentication Methods

    BigQuery supports two authentication methods:

    a) OAuth (Recommended for Local Development)

    # Install Google Cloud SDK
    brew install google-cloud-sdk  # macOS
    # OR visit https://cloud.google.com/sdk/docs/install for other OS
    
    # Login to Google Cloud
    gcloud auth application-default login
    
    # Set your project
    gcloud config set project your-project-id

    b) Service Account (Recommended for Production)

    • Go to Google Cloud Console > IAM & Admin > Service Accounts
    • Create a new service account or select an existing one
    • Grant required roles: BigQuery Data Viewer, BigQuery User
    • Create and download JSON key
    • Save the JSON key file securely
    • Update your profile:
    # ~/.dbt/profiles.yml
    your_project:
      target: dev
      outputs:
        dev:
          type: bigquery
          method: service-account
          project: your-project-id
          dataset: your_segment_dataset
          location: US
          threads: 4
          keyfile: /path/to/your/keyfile.json  # Full path to your keyfile

Snowflake Setup

  1. Create a Snowflake User and Role

    -- Run in Snowflake as ACCOUNTADMIN
    -- Create role
    CREATE ROLE IF NOT EXISTS TRANSFORMER_ROLE;
    GRANT USAGE ON WAREHOUSE your_warehouse TO ROLE TRANSFORMER_ROLE;
    GRANT USAGE ON DATABASE your_database TO ROLE TRANSFORMER_ROLE;
    GRANT USAGE ON SCHEMA your_database.your_schema TO ROLE TRANSFORMER_ROLE;
    GRANT SELECT ON ALL TABLES IN SCHEMA your_database.your_schema TO ROLE TRANSFORMER_ROLE;
    
    -- Create user
    CREATE USER IF NOT EXISTS dbt_user
      PASSWORD = 'your-secure-password'
      DEFAULT_ROLE = TRANSFORMER_ROLE
      DEFAULT_WAREHOUSE = your_warehouse;
    
    -- Grant role to user
    GRANT ROLE TRANSFORMER_ROLE TO USER dbt_user;
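    
    -- Optional (a hedged suggestion; adjust to your security policy):
    -- Segment creates a new warehouse table for each custom event, so
    -- granting SELECT on future tables keeps newly synced event tables
    -- readable without re-running grants.
    GRANT SELECT ON FUTURE TABLES IN SCHEMA your_database.your_schema TO ROLE TRANSFORMER_ROLE;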
  2. Configure Credentials

    Choose one of these methods:

    a) Direct in profiles.yml (Development Only)

    # ~/.dbt/profiles.yml
    segment_product_analytics:  # must match the `profile` name in your dbt_project.yml
      target: dev
      outputs:
        dev:
          type: snowflake
          account: your-account
          user: dbt_user
          password: your-secure-password
          role: TRANSFORMER_ROLE
          database: your_database
          warehouse: your_warehouse
          schema: your_schema
          threads: 4

    b) Using Environment Variables (Recommended)

    # ~/.dbt/profiles.yml
    segment_product_analytics:  # must match the `profile` name in your dbt_project.yml
      target: dev
      outputs:
        dev:
          type: snowflake
          account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
          user: "{{ env_var('SNOWFLAKE_USER') }}"
          password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
          role: "{{ env_var('SNOWFLAKE_ROLE') }}"
          database: "{{ env_var('SNOWFLAKE_DATABASE') }}"
          warehouse: "{{ env_var('SNOWFLAKE_WAREHOUSE') }}"
          schema: "{{ env_var('SNOWFLAKE_SCHEMA') }}"
          threads: 4

    Then set in your shell:

    export SNOWFLAKE_ACCOUNT=your-account
    export SNOWFLAKE_USER=dbt_user
    export SNOWFLAKE_PASSWORD=your-secure-password
    export SNOWFLAKE_ROLE=TRANSFORMER_ROLE
    export SNOWFLAKE_DATABASE=your_database
    export SNOWFLAKE_WAREHOUSE=your_warehouse
    export SNOWFLAKE_SCHEMA=your_schema

Verifying Your Setup

After setting up credentials, verify your connection:

# Test your connection
dbt debug

# If successful, you should see something like:
# Connection test: OK

Quick Start

  1. Setup Environment

    # Create and activate virtual environment
    python -m venv .venv
    source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`
    
    # Install dbt
    pip install dbt-core
    # Install warehouse-specific adapter
    pip install dbt-bigquery  # for BigQuery
    # OR
    pip install dbt-snowflake  # for Snowflake
  2. Install the Package

    Add to your packages.yml:

    packages:
      - git: "https://github.com/launchdarkly/segment-product-analytics.git"
        revision: main
  3. Configure Your dbt Profile

    In ~/.dbt/profiles.yml:

    For BigQuery:

    your_project:
      target: dev
      outputs:
        dev:
          type: bigquery
          method: oauth  # or service-account
          project: your-project-id
          dataset: your_segment_dataset
          location: US  # adjust as needed
          threads: 4

    For Snowflake:

    your_project:
      target: dev
      outputs:
        dev:
          type: snowflake
          account: your-account
          user: your-user
          password: your-password
          role: your-role
          database: your-database
          warehouse: your-warehouse
          schema: your-schema
          threads: 4
  4. Configure the Package

    In your dbt_project.yml:

    vars:
      segment_product_analytics:
        segment_schema: "your_segment_schema"  # Schema where your Segment tables live
  5. Run the Transformation

    # Install dbt dependencies
    dbt deps
    
    # Run the transformation
    dbt run --select segment_product_analytics.allevents
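
After the run completes, you can sanity-check the output with a quick query (illustrative; the dataset/schema name depends on your profile):

-- Count events by name to confirm the unified table is populated
SELECT event_name, COUNT(*) AS events
FROM your_segment_dataset.allevents
GROUP BY event_name
ORDER BY events DESC;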

πŸ“Š Output Schema

The package creates a table called allevents with the following schema:

device_id STRING,          -- Anonymous ID from Segment
user_id STRING,           -- User ID if identified
event_name STRING,        -- Name of the event
event_id STRING,          -- Unique event identifier
server_ts TIMESTAMP,      -- Server-side timestamp
device_ts TIMESTAMP,      -- Client-side timestamp
properties JSON,          -- All context properties
user_properties JSON      -- User properties (from identifies)
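
Because properties is stored as semi-structured data (the JSON type on BigQuery; Snowflake typically uses VARIANT), individual keys are extracted with warehouse-specific syntax. An illustrative query, where page_url is a hypothetical property key:

-- BigQuery: pull one key out of the properties JSON for the last 7 days
SELECT
  event_name,
  JSON_VALUE(properties, '$.page_url') AS page_url
FROM your_segment_dataset.allevents
WHERE server_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY);

-- Snowflake equivalent for the extraction: properties:page_url::STRING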

πŸ” Handling Custom Events

The package automatically discovers and processes any custom event tables in your Segment schema. For example, if you have a custom event table called button_clicked, it will:

  1. Automatically discover the table
  2. Extract all CONTEXT_* fields into the properties JSON
  3. Set the event_name to either:
    • The EVENT column value if it exists
    • The table name if there is no EVENT column
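
Conceptually, the per-table SELECT that gets unioned into allevents looks something like this simplified sketch (BigQuery flavor; not the package's exact output, and context_page_url / context_ip stand in for whatever CONTEXT_* columns your table has):

-- Simplified sketch of the generated per-table SELECT
SELECT
  anonymous_id                      AS device_id,
  user_id,
  COALESCE(event, 'button_clicked') AS event_name,  -- falls back to the table name
  id                                AS event_id,
  received_at                       AS server_ts,
  `timestamp`                       AS device_ts,
  TO_JSON(STRUCT(context_page_url, context_ip)) AS properties
FROM your_segment_dataset.button_clicked;
-- On Snowflake, OBJECT_CONSTRUCT(...) takes the place of TO_JSON(STRUCT(...))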

πŸ› οΈ Customization

Custom Event Name Mapping

If you need to customize event names, you can override the default naming in your dbt_project.yml:

vars:
  segment_product_analytics:
    event_name_overrides:
      button_clicked: "User Button Click"
      page_loaded: "Page View"
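
Conceptually, these overrides amount to a renaming step like the following (a sketch of the effect, not the package's exact SQL):

-- Effect of event_name_overrides, sketched as plain SQL
SELECT
  CASE event_name
    WHEN 'button_clicked' THEN 'User Button Click'
    WHEN 'page_loaded'    THEN 'Page View'
    ELSE event_name
  END AS event_name,
  event_id,
  server_ts
FROM your_segment_dataset.allevents;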

Adding Tests

Create a file models/staging/segment_product_analytics.yml:

version: 2

models:
  - name: allevents
    columns:
      - name: event_id
        tests:
          - unique
          - not_null
      - name: server_ts
        tests:
          - not_null
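
Then run the tests with `dbt test --select allevents`.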

πŸ” Troubleshooting

  1. Missing Tables

    • Ensure your Segment schema is correctly specified
    • Check table permissions
    • Verify table names match expected casing
  2. JSON Errors

    • BigQuery: Ensure proper JSON formatting
    • Snowflake: Check OBJECT_CONSTRUCT usage
  3. Performance Issues

    • Consider partitioning by server_ts (see the sketch after this list)
    • Add appropriate clustering keys
    • Optimize JSON queries
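
On BigQuery, for example, you could add partitioning and clustering in a downstream model that wraps the package's output (a sketch; the model name and day granularity are assumptions to adapt):

-- models/allevents_partitioned.sql (hypothetical downstream model)
{{ config(
    materialized='table',
    partition_by={'field': 'server_ts', 'data_type': 'timestamp', 'granularity': 'day'},
    cluster_by=['event_name']
) }}

SELECT * FROM {{ ref('allevents') }}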

πŸ“ Development

To contribute:

  1. Clone the repository
  2. Install dependencies: dbt deps
  3. Make changes
  4. Add tests
  5. Submit a PR

🀝 Need Help?

  • Open an issue
  • Check existing documentation
  • Contact maintainers

πŸ“œ License

MIT
