Production-ready Claude Code plugins for Customer Data Platform (CDP) implementation automation
The APS CDP Tools Marketplace is a collection of Claude Code plugins designed to automate the entire CDP implementation lifecycle in Treasure Data. These plugins enforce production-tested patterns, strict quality gates, and comprehensive validation to ensure reliable, maintainable data pipelines.
Accelerate CDP implementation from weeks to days by providing AI-powered, template-driven automation for:
- Data ingestion from multiple sources
- Data transformation and quality improvement
- Historical and incremental data consolidation
- Customer identity unification
The marketplace follows a modular architecture where each plugin handles a specific phase of the CDP pipeline:
┌─────────────────────────────────────────────────────────────────────────┐
│ APS CDP Tools Marketplace │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────┐
│ CDP ORCHESTRATOR │
│ │
│ End-to-End Automation │
│ • Generate → Deploy → │
│ • Execute → Monitor → │
│ • Validate │
└──────────────────────────┘
│
┌───────────────────────────┼───────────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌────────────────┐ ┌────────────────┐
│ cdp-ingestion │ │ cdp-histunion │ │ cdp-staging │
│ │ │ │ │ │
│ Data Sources │ ───────▶│ Consolidation │ ───────▶│ Transformation │
│ • BigQuery │ │ • Hist + Inc │ │ • Cleansing │
│ • Klaviyo │ │ • Watermarks │ │ • PII Handling │
│ • Shopify │ │ • Schema Mgmt │ │ • Validation │
│ • OneTrust │ │ │ │ • JSON Extract │
│ • SFTP │ │ │ │ │
└───────────────┘ └────────────────┘ └────────────────┘
│ │ │
└───────────────────────────┼───────────────────────────┘
▼
┌──────────────────────────────────────┐
│ Identity Unification │
└──────────────────────────────────────┘
│ │
┌────────────┴────────┬────────┴──────────┐
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ cdp-unification │ │ cdp-hybrid-idu │ │ cdp-hybrid-idu │
│ │ │ │ │ │
│ Treasure Data │ │ Snowflake │ │ Databricks │
│ • ID Resolution │ │ • Auto YAML Gen │ │ • Auto YAML Gen │
│ • Entity Merge │ │ • IDU SQL Gen │ │ • IDU SQL Gen │
│ • Master Records │ │ • IDU Execution │ │ • IDU Execution │
│ • TD Workflow │ │ • Merge Reports │ │ • Merge Reports │
└──────────────────┘ └──────────────────┘ └──────────────────┘
aps_claude_tools/
├── .claude-plugin/
│ └── marketplace.json # Marketplace registry
├── plugins/
│ ├── cdp-ingestion/ # Phase 1: Data Ingestion
│ │ ├── plugin.json
│ │ ├── prompt.md
│ │ ├── agents/
│ │ │ └── cdp-ingestion-expert.md
│ │ ├── commands/
│ │ │ ├── ingest-new.md
│ │ │ ├── ingest-add-klaviyo.md
│ │ │ ├── ingest-add-object.md
│ │ │ └── ingest-validate-wf.md
│ │ └── docs/
│ │ ├── patterns/
│ │ └── sources/
│ │
│ ├── cdp-histunion/ # Phase 2: Data Consolidation
│ │ ├── plugin.json
│ │ ├── prompt.md
│ │ ├── agents/
│ │ │ └── cdp-histunion-expert.md
│ │ ├── commands/
│ │ │ ├── histunion-create.md
│ │ │ ├── histunion-batch.md
│ │ │ └── histunion-validate.md
│ │ └── docs/
│ │ └── examples.md
│ │
│ ├── cdp-staging/ # Phase 3: Data Transformation
│ │ ├── plugin.json
│ │ ├── prompt.md
│ │ ├── agents/
│ │ │ ├── staging-transformer-presto.md
│ │ │ └── staging-transformer-hive.md
│ │ ├── commands/
│ │ │ ├── transform-table.md
│ │ │ ├── transform-batch.md
│ │ │ └── transform-validation.md
│ │ └── docs/
│ │
│ ├── cdp-unification/ # Phase 4a: Identity Resolution (TD)
│ │ ├── plugin.json
│ │ ├── prompt.md
│ │ ├── agents/
│ │ │ └── cdp-unification-expert.md
│ │ ├── commands/
│ │ │ ├── unify-setup.md
│ │ │ ├── unify-extract-keys.md
│ │ │ ├── unify-create-prep.md
│ │ │ └── unify-create-config.md
│ │ └── docs/
│ │
│ ├── cdp-hybrid-idu/ # Phase 4b: Hybrid ID Unification
│ │ ├── plugin.json
│ │ ├── prompt.md
│ │ ├── agents/
│ │ │ ├── yaml-configuration-builder.md
│ │ │ ├── hybrid-unif-keys-extractor.md
│ │ │ ├── snowflake-sql-generator.md
│ │ │ ├── snowflake-workflow-executor.md
│ │ │ ├── databricks-sql-generator.md
│ │ │ ├── databricks-workflow-executor.md
│ │ │ └── merge-stats-report-generator.md
│ │ ├── commands/
│ │ │ ├── hybrid-setup.md
│ │ │ ├── hybrid-unif-config-creator.md
│ │ │ ├── hybrid-unif-config-validate.md
│ │ │ ├── hybrid-generate-snowflake.md
│ │ │ ├── hybrid-execute-snowflake.md
│ │ │ ├── hybrid-generate-databricks.md
│ │ │ ├── hybrid-execute-databricks.md
│ │ │ └── hybrid-unif-merge-stats-creator.md
│ │ ├── scripts/
│ │ │ ├── snowflake/
│ │ │ │ ├── yaml_unification_to_snowflake.py
│ │ │ │ └── snowflake_sql_executor.py
│ │ │ └── databricks/
│ │ │ ├── yaml_unification_to_databricks.py
│ │ │ └── databricks_sql_executor.py
│ │ └── docs/
│ │
│ └── cdp-orchestrator/ # End-to-End Pipeline Automation
│ ├── plugin.json
│ ├── prompt.md
│ ├── README.md # Comprehensive documentation
│ ├── PLAN.md # Architecture planning docs
│ ├── agents/
│ │ └── cdp-pipeline-orchestrator.md
│ └── commands/
│ └── cdp-implement.md
│
└── README.md # This file
Purpose: Automate data ingestion from various sources into the Treasure Data raw layer.
Key Features:
- Template-driven workflow generation
- Support for 6+ data sources (BigQuery, Klaviyo, Shopify, OneTrust, Pinterest, SFTP)
- Incremental and historical ingestion modes
- Built-in error handling and logging
- Credential management via TD secrets
Slash Commands:
- `/cdp-ingestion:ingest-new` - Create new source ingestion workflow
- `/cdp-ingestion:ingest-add-klaviyo` - Add Klaviyo source with all objects
- `/cdp-ingestion:ingest-add-object` - Add object to existing source
- `/cdp-ingestion:ingest-validate-wf` - Validate workflow files
Output: Digdag workflows (.dig), datasource configs (.yml), load configs (.yml)
Purpose: Combine historical and incremental tables into unified tables with watermark-based incremental loading.
Key Features:
- Intelligent schema validation via MCP tool
- Automatic detection of schema differences (e.g., `incremental_date` column)
- Support for full-load tables (klaviyo_lists, klaviyo_metric_data)
- Watermark management using inc_log table
- Parallel task execution
- NULL handling for schema mismatches
Slash Commands:
- `/cdp-histunion:histunion-create` - Create hist-union for single table
- `/cdp-histunion:histunion-batch` - Batch create for multiple tables
- `/cdp-histunion:histunion-validate` - Validate workflows and SQL
Output: SQL files with UNION ALL logic, Digdag workflows with parallel execution
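To make the generated output concrete, here is a minimal sketch of the hist-union pattern described above: historical rows are unioned with incremental rows newer than the stored watermark, and a column missing on one side is padded with a typed NULL. The table and column names (klaviyo_events, incremental_date, inc_log, last_loaded_time) are illustrative assumptions, not the plugin's exact template.

```sql
-- Illustrative hist-union sketch (assumed table/column names).
INSERT INTO client_src.klaviyo_events_histunion
SELECT
    event_id,
    email,
    event_time,
    CAST(NULL AS VARCHAR) AS incremental_date        -- hist table lacks this column
FROM client_src.klaviyo_events_hist

UNION ALL

SELECT
    event_id,
    email,
    event_time,
    incremental_date
FROM client_src.klaviyo_events
WHERE event_time > (
    -- watermark recorded in the inc_log table by the previous run
    SELECT MAX(last_loaded_time)
    FROM client_src.inc_log
    WHERE table_name = 'klaviyo_events'
);
```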
Purpose: Transform histunion data into staging layer with data quality, standardization, and PII handling.
Key Features:
- Schema-driven transformation (uses MCP to get exact schemas)
- Comprehensive data cleansing and standardization
- PII masking and handling
- JSON extraction and flattening
- Deduplication strategies
- Support for both Presto and Hive SQL engines
Slash Commands:
- `/cdp-staging:transform-table` - Transform single table to staging format
- `/cdp-staging:transform-batch` - Batch transform multiple tables
- `/cdp-staging:transform-validation` - Validate staging SQL against quality gates
Output: Presto/Hive SQL transformation files, Digdag workflows
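As an illustration of the transformations listed above, the sketch below shows the general shape of a Presto-style staging query: trimming and case normalization, hashed PII, JSON extraction, type standardization, and deduplication. Table and column names are assumptions; the actual SQL is produced from the plugin's templates and live schemas.

```sql
-- Illustrative staging transformation (assumed table/column names, Presto syntax).
INSERT INTO client_stg.klaviyo_events
SELECT
    event_id,
    LOWER(TRIM(email))                                  AS email,        -- standardization
    TO_HEX(SHA256(TO_UTF8(LOWER(TRIM(email)))))         AS email_hash,   -- PII masking
    JSON_EXTRACT_SCALAR(properties, '$.campaign_id')    AS campaign_id,  -- JSON extraction
    CAST(event_time AS TIMESTAMP)                       AS event_time    -- type standardization
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY event_time DESC) AS rn
    FROM client_src.klaviyo_events_histunion
)
WHERE rn = 1;   -- deduplication: keep the latest record per event_id
```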
Purpose: Implement customer identity resolution and unification to create golden records in Treasure Data.
Key Features:
- Live table analysis via MCP (Treasure Data API)
- Automatic extraction of identity keys (email, phone, user_id, etc.)
- Prep table generation for identity matching
- ID graph configuration
- Master record creation
Slash Commands:
- `/cdp-unification:unify-setup` - Complete end-to-end ID unification setup
- `/cdp-unification:unify-extract-keys` - Extract identity columns from tables
- `/cdp-unification:unify-create-prep` - Generate prep table files
- `/cdp-unification:unify-create-config` - Generate unification config
Output: Prep table SQL files, unify.yml config, id_unification.dig workflow
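For context, a prep table is essentially a normalized projection of the identity keys found in each staging table; the plugin derives the real column list from live schemas via MCP. The following is a minimal sketch under assumed names:

```sql
-- Illustrative prep-table sketch (assumed source table and key columns).
CREATE TABLE client_stg.prep_klaviyo_profiles AS
SELECT
    LOWER(TRIM(email))                   AS email,     -- normalized email key
    REGEXP_REPLACE(phone, '[^0-9]', '')  AS phone,     -- digits-only phone key
    CAST(user_id AS VARCHAR)             AS user_id
FROM client_stg.klaviyo_profiles
WHERE email IS NOT NULL
   OR phone IS NOT NULL
   OR user_id IS NOT NULL;
```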
Purpose: Cross-platform identity unification for Snowflake and Databricks using YAML-driven configuration and intelligent convergence detection.
Key Features:
- Platform Support: Snowflake and Databricks Delta Lake
- YAML Configuration: Single `unify.yml` drives SQL generation for both platforms
- Intelligent Convergence: Automatic loop detection stops when ID graph stabilizes
- Native SQL: Platform-specific optimizations (Snowflake VARIANT, Databricks Delta)
- Real-time Execution: Monitor workflow progress with convergence metrics
- Key Validation: Regex patterns and invalid text filtering
- Master Tables: Priority-based attribute selection with array support
- Metadata Tracking: Complete lineage and column mapping
Platform-Specific Features:
Snowflake:
- `ARRAY_CONSTRUCT()`, `LATERAL FLATTEN()`, `ARRAY_AGG()` (see the sketch after this list)
- `CLUSTER BY` for performance optimization
- `VARIANT` support for flexible data structures
- Native Snowflake connector authentication (password, SSO, key-pair)
Databricks:
- Delta Lake table format with ACID transactions
- `COLLECT_LIST()`, `EXPLODE()`, array operations
- Spark SQL optimizations
- Unity Catalog integration
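Both platforms express the same array handling with different built-ins. The snippet below is a hedged, Snowflake-flavoured illustration (ARRAY_AGG plus LATERAL FLATTEN) with the Databricks equivalents noted in comments; table and column names are assumptions.

```sql
-- Snowflake-flavoured sketch (assumed names): collect all emails per canonical ID,
-- then explode the array back into rows.
-- Databricks would use COLLECT_LIST() instead of ARRAY_AGG()
-- and EXPLODE() instead of LATERAL FLATTEN().
WITH collected AS (
    SELECT
        canonical_id,
        ARRAY_AGG(DISTINCT email) AS emails
    FROM td_id_lookup
    GROUP BY canonical_id
)
SELECT
    c.canonical_id,
    f.value::STRING AS email
FROM collected c,
     LATERAL FLATTEN(input => c.emails) f;
```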
Slash Commands:
- `/cdp-hybrid-idu:hybrid-setup` - End-to-end setup with automated YAML creation, SQL generation, and execution
- `/cdp-hybrid-idu:hybrid-unif-config-creator` - Auto-generate unify.yml from live table analysis (Snowflake/Databricks)
- `/cdp-hybrid-idu:hybrid-unif-config-validate` - Validate YAML configuration
- `/cdp-hybrid-idu:hybrid-generate-snowflake` - Generate Snowflake SQL from YAML
- `/cdp-hybrid-idu:hybrid-execute-snowflake` - Execute Snowflake workflow with convergence detection
- `/cdp-hybrid-idu:hybrid-generate-databricks` - Generate Databricks SQL from YAML
- `/cdp-hybrid-idu:hybrid-execute-databricks` - Execute Databricks workflow with convergence detection
- `/cdp-hybrid-idu:hybrid-unif-merge-stats-creator` - Generate professional HTML merge statistics report
Input: unify.yml with keys, tables, canonical_ids, master_tables
Output:
- SQL Files: 20+ files (graph creation, loop iterations, canonicalization, enrichment, master tables, metadata)
- Execution Reports: Convergence metrics, row counts, timing
- Tables Created: ID graphs, lookup tables, enriched tables, master tables
Convergence Algorithm:
Iteration 1: 14,573 records → 1,565 merged → Continue
Iteration 2: 13,035 records → 15 merged → Continue
Iteration 3: 13,034 records → 1 merged → Continue
Iteration 4: 13,034 records → 0 merged → CONVERGED (Stop)
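Conceptually, each iteration lets every identity adopt the smallest canonical ID among the rows it is linked to, and the loop stops when an iteration changes zero rows. The sketch below shows that idea under assumed table and column names; the generated SQL is platform-specific and more involved.

```sql
-- One convergence iteration, sketched with assumed names (id_graph: identity_key, canonical_id).
CREATE OR REPLACE TABLE id_graph_next AS
SELECT
    g.identity_key,
    MIN(g2.canonical_id) AS canonical_id              -- smallest linked ID wins
FROM id_graph g
JOIN id_graph g2
  ON g.identity_key = g2.identity_key
  OR g.canonical_id = g2.canonical_id
GROUP BY g.identity_key;

-- Merged count checked after each pass; 0 means CONVERGED.
SELECT COUNT(*) AS merged
FROM id_graph_next n
JOIN id_graph o ON n.identity_key = o.identity_key
WHERE n.canonical_id <> o.canonical_id;
```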
Example Workflow:
# Option 1: Automated end-to-end setup
/cdp-hybrid-idu:hybrid-setup
# Input: Platform (Snowflake/Databricks), tables to analyze, canonical ID name
# Output: Automated YAML creation → SQL generation → Workflow execution
# Result: Complete ID unification with merge statistics
# Option 2: Step-by-step with manual YAML
# 1. Auto-generate YAML configuration
/cdp-hybrid-idu:hybrid-unif-config-creator
# Input: Platform, tables list, canonical ID name
# Output: unify.yml with validated keys and tables
# 2. Generate Snowflake SQL from YAML
/cdp-hybrid-idu:hybrid-generate-snowflake
# Input: unify.yml, database: INDRESH_TEST, schema: PUBLIC
# Output: 22 SQL files in snowflake_sql/unify/
# 3. Execute with convergence detection
/cdp-hybrid-idu:hybrid-execute-snowflake
# Result: 4,940 canonical IDs in 4 iterations (19,512 identities merged)
# 4. Generate merge statistics report
/cdp-hybrid-idu:hybrid-unif-merge-stats-creator
# Input: Platform, database, schema, canonical ID
# Output: Beautiful HTML report with expert analysis (id_unification_report.html)
# 5. Verify results on Snowflake
SELECT * FROM INDRESH_TEST.PUBLIC.td_id_lookup LIMIT 10;
SELECT * FROM INDRESH_TEST.PUBLIC.td_id_master_table LIMIT 10;

New Features (v1.6.0):
- Automated YAML Configuration: The `hybrid-unif-config-creator` command uses MCP tools to analyze actual Snowflake/Databricks tables, extract user identifiers with strict PII detection, and generate a production-ready `unify.yml` configuration automatically
- Merge Statistics Reporting: The `hybrid-unif-merge-stats-creator` command generates comprehensive HTML reports with:
  - Executive summary with key metrics (merge ratio, fragmentation reduction)
  - Identity resolution performance analysis
  - Merge distribution patterns and complexity scoring
  - Data quality metrics with coverage percentages
  - Expert recommendations for optimization
  - PDF-ready professional design
- Improved hybrid-setup: Now includes automated table analysis and YAML generation as the first step
Purpose: End-to-end automation of the complete CDP implementation pipeline with automated workflow generation, deployment to Treasure Data, execution, real-time monitoring, and data validation across all 4 phases.
Key Features:
- Fully Automated Execution: Generates workflows → Deploys to TD → Executes → Monitors → Validates
- Sequential Phase Execution: Enforces proper phase order with data dependency validation
- Real-Time Monitoring: Polls workflow status every 30 seconds, shows elapsed time and progress
- Intelligent Error Handling: Auto-fixes common deployment errors (syntax, missing databases), retries up to 3 times
- Data Validation: Verifies tables created, row counts > 0, schema expectations met before proceeding
- Progress Tracking: Uses TodoWrite to show real-time status updates for transparency
- TD Toolbelt Integration: Direct integration with `td wf push`, `td wf start`, `td wf session` commands
- State Management: Maintains `pipeline_state.json` for resume capability if interrupted
- Comprehensive Reporting: Final report includes execution summary, session IDs, data quality metrics
Phase Orchestration Pattern:
For Each Phase (Ingestion → Hist-Union → Staging → Unification):
[1] GENERATE → Invoke plugin slash command (/cdp-{plugin}:command)
[2] DEPLOY → Execute: td wf push {project}
[3] EXECUTE → Execute: td wf start {project} {workflow}
[4] MONITOR → Poll: td wf session {session_id} (every 30s)
[5] VALIDATE → Query TD: Verify tables created with data
[6] PROCEED → Only if validation passes
CRITICAL: Each phase must complete successfully before next phase starts
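Step [5] above amounts to a handful of row-count checks against the tables each phase is expected to produce; a minimal sketch with assumed table names:

```sql
-- Illustrative phase-validation query (assumed table names).
-- The orchestrator proceeds only when every expected table exists and row_count > 0.
SELECT 'client_src.klaviyo_events_histunion' AS table_name, COUNT(1) AS row_count
FROM client_src.klaviyo_events_histunion
UNION ALL
SELECT 'client_stg.klaviyo_events', COUNT(1)
FROM client_stg.klaviyo_events;
```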
Slash Commands:
- `/cdp-orchestrator:cdp-implement` - Complete end-to-end CDP implementation pipeline
Prerequisites:
- TD Toolbelt installed (`td --version`)
- TD API credentials (API key + endpoint)
- Source system authentication (TD Auth ID)
- Write permissions to TD (create databases, tables, workflows)
Input Requirements:
Global Configuration:
- TD_API_KEY - Treasure Data API key from console
- TD_ENDPOINT - Regional endpoint (US/EU/Tokyo/Asia Pacific)
- Client name - Identifier for database naming
Phase 1 (Ingestion):
- Source name, connector type, objects/tables
- Ingestion mode (incremental/historical/both)
- Incremental field, start date
- Authentication ID
Phase 2 (Hist-Union):
- Tables from Phase 1 output (auto-detected or user-provided)
Phase 3 (Staging):
- SQL engine (Presto/Hive)
- Tables from Phase 2 (auto-detected)
Phase 4 (Unification):
- Unification name, ID method (persistent_id/canonical_id)
- Update strategy (incremental/full)
- Tables from Phase 3 (auto-detected)
Output:
- 50-70 generated files across all phases (.dig, .yml, .sql)
- 4 deployed TD projects (ingestion, hist_union, staging, unification)
- Session IDs for all workflow executions
- Pipeline state file (pipeline_state.json)
- Comprehensive final report (pipeline_report.md)
- Execution logs (pipeline_logs/{date}/)
Timeline: 3-4 hours total (depending on data volume)
- Phase 1 (Ingestion): ~1 hour
- Phase 2 (Hist-Union): ~30 minutes
- Phase 3 (Staging): ~45 minutes
- Phase 4 (Unification): ~1.5 hours
Error Handling:
- Deployment Errors: Auto-fixes syntax errors, prompts for missing databases/secrets
- Execution Errors: Retrieves logs, shows error details, offers retry/fix/skip/abort options
- Validation Errors: Shows missing tables/data, asks user decision before proceeding
- Timeout Handling: Alerts after 2 hours, offers continue/check/abort options
Use Cases:
- First-time CDP implementation (complete setup from scratch)
- Multi-source ingestion with automated processing
- Automated testing of CDP pipelines
- Standardized deployment across environments (dev/staging/prod)
Example Workflow:
# Single command for complete CDP implementation
/cdp-orchestrator:cdp-implement
# Orchestrator will prompt for:
# 1. Global config (API key, endpoint, client name)
# 2. Phase 1 config (Snowflake: tables, auth, dates)
# 3. Phase 2 config (tables to combine)
# 4. Phase 3 config (SQL engine: presto)
# 5. Phase 4 config (unification name, ID method)
# Then automatically:
# - Generates all 50+ workflow files
# - Deploys all 4 projects to TD
# - Executes workflows in sequence
# - Monitors real-time progress
# - Validates data between phases
# - Generates final report
# Result: Complete CDP implementation ready for production

Integration with Other Plugins: The orchestrator internally invokes:
- `/cdp-ingestion:ingest-new` for Phase 1
- `/cdp-histunion:histunion-batch` for Phase 2
- `/cdp-staging:transform-batch` for Phase 3
- `/cdp-unification:unify-setup` for Phase 4
Users can still run individual plugins manually for granular control.
Documentation: See plugins/cdp-orchestrator/README.md for complete prerequisites, configuration guide, troubleshooting, and examples.
┌─────────────────────────────────────────────────────────────────────────┐
│ PHASE 1: INGESTION │
├─────────────────────────────────────────────────────────────────────────┤
│ Source Systems → Raw Layer (TD) │
│ • Klaviyo → client_src.klaviyo_events │
│ • Shopify → client_src.shopify_products │
│ • BigQuery → client_src.analytics_users │
│ • OneTrust → client_src.onetrust_profiles │
│ │
│ Tool: /cdp-ingestion:ingest-new │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ PHASE 2: HISTORICAL CONSOLIDATION │
├─────────────────────────────────────────────────────────────────────────┤
│ Hist + Inc Tables → Unified Tables (TD) │
│ klaviyo_events_hist → client_src.klaviyo_events_histunion │
│ klaviyo_events → │
│ │
│ Features: │
│ • Watermark-based incremental loading │
│ • Schema validation and NULL handling │
│ • Parallel processing │
│ │
│ Tool: /cdp-histunion:histunion-batch │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ PHASE 3: STAGING TRANSFORMATION │
├─────────────────────────────────────────────────────────────────────────┤
│ Histunion Tables → Staging Layer (TD) │
│ client_src.*_histunion → client_stg.klaviyo_events │
│ → client_stg.shopify_products │
│ │
│ Transformations: │
│ • Data cleansing (trim, case normalization) │
│ • PII masking (email, phone) │
│ • JSON extraction │
│ • Deduplication │
│ • Data type standardization │
│ │
│ Tool: /cdp-staging:transform-batch │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ PHASE 4: IDENTITY UNIFICATION │
├─────────────────────────────────────────────────────────────────────────┤
│ Staging Tables → Golden Records (TD) │
│ All * → client_master.unified_customers │
│ │
│ Process: │
│ 1. Extract identity keys (email, phone, user_id) │
│ 2. Create prep tables for matching │
│ 3. Build identity graph │
│ 4. Generate master customer records │
│ │
│ Tool: /cdp-unification:unify-setup │
└─────────────────────────────────────────────────────────────────────────┘
- Clone the repository:

  git clone <repository-url>
  cd aps_claude_tools

- Install plugins (in Claude Code):

  /plugin install cdp-ingestion
  /plugin install cdp-staging
  /plugin install cdp-histunion
  /plugin install cdp-unification
  /plugin install cdp-hybrid-idu
  /plugin install cdp-orchestrator

- Restart Claude Code to load plugins

- Verify installation:

  # Check all plugins loaded
  /plugin list
  # Should show:
  # - cdp-ingestion
  # - cdp-histunion
  # - cdp-staging
  # - cdp-unification
  # - cdp-hybrid-idu
  # - cdp-orchestrator
Use the CDP Orchestrator for fully automated pipeline execution:
# Single command for complete CDP implementation
/cdp-orchestrator:cdp-implement
# The orchestrator will:
# 1. Collect all configuration upfront
# 2. Show complete execution plan
# 3. Execute all 4 phases automatically
# 4. Monitor progress in real-time
# 5. Validate data between phases
# 6. Generate comprehensive final report
# Timeline: 3-4 hours (automatic)

When to use:
- First-time CDP implementation
- Clean setup from scratch
- Want automated execution with monitoring
- Need deployment to Treasure Data
- Require data validation between phases
Prerequisites:
- TD Toolbelt installed (`td --version`)
- TD API credentials ready
- Source system credentials configured
- Write permissions to TD
See plugins/cdp-orchestrator/README.md for complete guide.
Use individual plugin commands for granular control:
# Create ingestion workflow for Klaviyo
/cdp-ingestion:ingest-add-klaviyo
# Or for custom source
/cdp-ingestion:ingest-new

Input: Source details, credentials, objects to ingest
Output: ingestion/{source}_ingest_inc.dig, config files
Result: Raw data flowing into client_src.* tables
# Create hist-union for multiple tables
/cdp-histunion:histunion-batch
# Provide tables:
# client_src.klaviyo_events, client_src.shopify_products

Input: List of tables with hist/inc variants
Output: hist_union/queries/{table}.sql, hist_union_runner.dig
Result: Unified tables in client_src.*_histunion
# Transform multiple tables in batch
/cdp-staging:transform-batch
# Provide list of tables:
# client_src.klaviyo_events_histunion, client_src.shopify_products_histunion

Input: List of histunion tables
Output: staging/queries/{table}.sql, workflow files
Result: Clean, standardized data in client_stg.* tables
Option A: Treasure Data (native)
# Run complete ID unification setup
/cdp-unification:unify-setup
# Or step by step:
/cdp-unification:unify-extract-keys # Extract identity columns
/cdp-unification:unify-create-prep # Create prep tables
/cdp-unification:unify-create-config # Generate unify.yml

Input: Database, tables with customer data
Output: Prep tables, unify.yml, id_unification.dig
Result: Unified customer records in client_master.unified_customers
Option B: Snowflake (hybrid)
# Generate Snowflake SQL from YAML config
/cdp-hybrid-idu:hybrid-generate-snowflake
# Execute with convergence detection
/cdp-hybrid-idu:hybrid-execute-snowflake

Input: unify.yml, Snowflake connection details
Output: 20+ SQL files, execution report with convergence metrics
Result: Canonical IDs in td_id_lookup, enriched tables, master table
Option C: Databricks (hybrid)
# Complete automated setup (recommended)
/cdp-hybrid-idu:hybrid-setup
# Or step-by-step:
# 1. Auto-generate YAML config
/cdp-hybrid-idu:hybrid-unif-config-creator
# 2. Generate Databricks SQL from YAML
/cdp-hybrid-idu:hybrid-generate-databricks
# 3. Execute with convergence detection
/cdp-hybrid-idu:hybrid-execute-databricks
# 4. Generate merge statistics report
/cdp-hybrid-idu:hybrid-unif-merge-stats-creator

Input: unify.yml, Databricks connection details
Output: 20+ SQL files, execution report with convergence metrics, HTML statistics report
Result: Canonical IDs in Delta Lake tables with master records and comprehensive analytics
The cdp-hybrid-idu plugin includes a powerful reporting feature that generates professional HTML reports analyzing ID unification results:
Command: /cdp-hybrid-idu:hybrid-unif-merge-stats-creator
Platform Support: Snowflake and Databricks
Report Sections:
- Executive Summary - Key metrics (unified profiles, merge ratio, fragmentation reduction)
- Identity Resolution Performance - Deduplication rates by key type
- Merge Distribution Analysis - Pattern breakdown and complexity scoring
- Top Merged Profiles - Highest complexity identity resolutions
- Source Table Configuration - Column mappings and data sources
- Master Table Data Quality - Coverage percentages for all attributes
- Convergence Performance - Iteration analysis and efficiency metrics
- Expert Recommendations - Strategic guidance and optimization tips
- Summary Statistics - Complete metrics reference
Features:
- Error-Proof Design: 10 layers of validation ensure zero errors
- Consistent Output: Same beautiful report every time
- Platform-Agnostic: Works identically for Snowflake and Databricks
- PDF-Ready: Print to PDF for stakeholder distribution
- Expert Analysis: Data-driven insights and actionable recommendations
Example:
/cdp-hybrid-idu:hybrid-unif-merge-stats-creator
> Platform: Snowflake
> Database: INDRESH_TEST
> Schema: PUBLIC
> Canonical ID: td_id
> Output: (press Enter for default)
✓ Report generated: id_unification_report.html (142 KB)

Sample Metrics from Generated Report (see the query sketch after this list):
- Unified Profiles: 4,940
- Total Identities: 19,512
- Merge Ratio: 3.95:1
- Fragmentation Reduction: 74.7%
- Email Coverage: 100%
- Phone Coverage: 99.39%
- Convergence: 4 iterations
- Data Quality Score: 99.7%
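Most of these headline figures can be reproduced straight from the lookup table. The query below is a sketch assuming td_id_lookup has one row per source identity with a canonical_id column; it is not the report generator's exact SQL.

```sql
-- Illustrative metric derivation (assumed td_id_lookup layout).
SELECT
    COUNT(DISTINCT canonical_id)                                   AS unified_profiles,
    COUNT(*)                                                       AS total_identities,
    ROUND(COUNT(*) / NULLIF(COUNT(DISTINCT canonical_id), 0), 2)   AS merge_ratio,
    ROUND(100.0 * (1 - COUNT(DISTINCT canonical_id) * 1.0
                       / NULLIF(COUNT(*), 0)), 1)                  AS fragmentation_reduction_pct
FROM INDRESH_TEST.PUBLIC.td_id_lookup;
```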
# 1. Ingest Shopify data
/cdp-ingestion:ingest-new
# Source: Shopify, Objects: products, customers, orders
# 2. Ingest Klaviyo marketing data
/cdp-ingestion:ingest-add-klaviyo
# All Klaviyo objects: events, profiles, campaigns, lists
# 3. Consolidate historical and incremental
/cdp-histunion:histunion-batch
# Tables: shopify_products, shopify_customers, shopify_orders,
# klaviyo_events, klaviyo_profiles
# 4. Transform histunion tables to staging
/cdp-staging:transform-batch
# Tables: shopify_products_histunion, shopify_customers_histunion,
# shopify_orders_histunion, klaviyo_events_histunion,
# klaviyo_profiles_histunion
# 5. Unify customer identities
/cdp-unification:unify-setup
# Database: client_stg
# Tables: shopify_customers, klaviyo_profiles
# Result: Golden customer records ready for analytics and activation

All plugins use exact, production-tested templates. No improvisation or "improvements" allowed. This ensures:
- Consistency across implementations
- Reduced debugging time
- Proven patterns that work first time
Plugins use MCP (Model Context Protocol) to access the Treasure Data API and retrieve exact schemas:
- No manual column listing
- No guessing data types
- Automatic detection of schema differences
All files for a task are generated in a SINGLE response:
- User gets complete working solution immediately
- No version mismatches between files
- Ready to deploy and test
Every plugin enforces strict quality checks:
- Syntax validation (YAML, SQL)
- Schema compliance
- Template adherence
- Error handling completeness
- Logging presence
Code generated by these plugins:
- Works the first time
- Follows TD best practices
- Includes comprehensive error handling
- Has complete logging
- Is maintainable and documented
- Platforms:
- Treasure Data (TD)
- Snowflake
- Databricks
- Workflow Engines:
- Digdag (TD)
- Snowflake Tasks
- Databricks Jobs
- Query Engines:
- Presto (TD)
- Hive (TD)
- Snowflake SQL
- Spark SQL (Databricks)
- Storage Formats:
- TD Native (Presto/Hive)
- Snowflake Tables with VARIANT
- Delta Lake (Databricks)
- AI Framework: Claude Code with MCP
- Version Control: Git
- Configuration: YAML, JSON
- Authentication:
- TD API Keys
- Snowflake (Password, SSO, Key-Pair)
- Databricks (Token, OAuth)
- Create plugin directory:

  mkdir -p plugins/my-plugin/{agents,commands,docs}

- Create plugin.json:

  {
    "name": "my-plugin",
    "description": "Plugin description",
    "version": "1.0.0",
    "author": {
      "name": "@cdp-tools-marketplace",
      "organization": "APS CDP Team"
    },
    "prompt": "prompt.md",
    "agents": ["agents/my-expert.md"],
    "commands": ["commands/my-command.md"]
  }

- Create prompt.md with plugin instructions

- Create agent in agents/my-expert.md

- Create commands in commands/*.md

- Register in marketplace:

  // .claude-plugin/marketplace.json
  {
    "plugins": [
      {
        "name": "my-plugin",
        "source": "./plugins/my-plugin",
        "description": "Brief description"
      }
    ]
  }
- Always validate before deploying: `/cdp-{plugin}:validate`
- Review generated files before running workflows
- Test in dev environment first
- Use batch commands for multiple tables to save time
- Follow the pipeline order: ingestion → histunion → staging → unification
- Read existing plugins to understand patterns
- Use exact templates - never improvise
- Enforce quality gates in validation commands
- Document examples in docs/ directory
- Use MCP for live data access (table schemas, etc.)
- Generate all files in ONE response (batch generation)
- Check plugin-specific `docs/` directories for examples
- Review `prompt.md` for detailed instructions
- Look at existing generated files for patterns
- Fork the repository
- Create a feature branch
- Add/modify plugins following existing patterns
- Test thoroughly in real TD environment
- Submit pull request with detailed description
NEW PLUGIN: CDP Orchestrator (cdp-orchestrator):
- ✅ End-to-End Pipeline Automation: Complete CDP implementation from ingestion to unification
- ✅ Slash Command: `/cdp-orchestrator:cdp-implement` - Single command for full pipeline
- ✅ 6-Step Phase Pattern: Generate → Deploy → Execute → Monitor → Validate → Proceed
- ✅ TD Toolbelt Integration: Automated deployment via `td wf push/start/session`
- ✅ Real-Time Monitoring: Polls workflow status every 30 seconds, shows elapsed time
- ✅ Intelligent Error Handling: Auto-fixes syntax errors, retries up to 3 times
- ✅ Data Validation: Verifies tables created, row counts > 0 between phases
- ✅ Progress Tracking: Uses TodoWrite for transparent real-time status
- ✅ State Management: Maintains `pipeline_state.json` for resume capability
- ✅ Comprehensive Reporting: Final report with session IDs, data quality metrics
- ✅ 50-70 Files Generated: Complete workflows across all 4 phases
- ✅ Timeline: 3-4 hours total for complete automated execution
Documentation:
- 1,718-line comprehensive README (`plugins/cdp-orchestrator/README.md`)
- Complete prerequisites guide (system, TD, data source, project setup)
- Input requirements documentation for all 4 phases
- Step-by-step usage guide with examples
- Monitoring and troubleshooting guide
- FAQ with 15+ questions answered
Use Cases:
- First-time CDP implementation (scratch to production)
- Automated testing of CDP pipelines
- Standardized deployment across environments
- Multi-source ingestion with end-to-end processing
CDP Hybrid IDU Enhancements:
- ✅ New Command: `hybrid-unif-config-creator` - Auto-generate `unify.yml` from live table analysis
  - Uses MCP tools to analyze Snowflake/Databricks tables
  - Strict PII detection (zero tolerance for guessing)
  - Validates data patterns from actual table data
  - Generates production-ready YAML configuration
- ✅ New Command: `hybrid-unif-merge-stats-creator` - Professional HTML merge statistics reports
  - 10 layers of error protection (zero chance of error)
  - 9 comprehensive report sections with expert analysis
  - Platform-agnostic (works identically for Snowflake and Databricks)
  - PDF-ready professional design
  - Includes executive summary, performance analysis, data quality metrics
- ✅ Enhanced: `hybrid-setup` command now includes automated YAML configuration as the first step
  - Complete 3-phase workflow: Config creation → SQL generation → Execution
  - User provides tables, system generates everything automatically
- ✅ Quality Improvements: Enhanced SQL generation and documentation consistency
Quality Improvements:
- All reports generate identically every time (deterministic)
- Comprehensive error handling with user-friendly messages
- Dynamic column detection for flexible master table structures
- Null-safe calculations (NULLIF protection on all divisions)
- Added `cdp-hybrid-idu` plugin for Snowflake and Databricks
- Cross-platform ID unification with convergence detection
- YAML-driven configuration for both platforms
- Added `cdp-histunion` plugin
- Historical and incremental data consolidation
- Watermark-based incremental loading
- Added `cdp-unification` plugin
- Customer identity resolution for Treasure Data
- ID graph and master record creation
- Added `cdp-staging` plugin with Hive support
- Data transformation and quality improvement
- PII handling and JSON extraction
- Added `cdp-staging` plugin (Presto only)
- Initial release with `cdp-ingestion` plugin
- Support for BigQuery, Klaviyo, Shopify, OneTrust, Pinterest, SFTP
Proprietary - APS CDP Team / APS
For questions, issues, or feature requests:
- Team: APS CDP Implementation Team
- Organization: Treasure Data (APS)
- Marketplace: @cdp-tools-marketplace
Built with Claude Code | Powered by AI | Proven in Production