Skip to content

[Bug] Incorrect surrogate key combination used in search report #64

Closed
@clay-walker

Description

@clay-walker

Is there an existing issue for this?

  • I have searched the existing issues

Describe the issue

The Microsoft ads search report model uses the following columns as a surrogate key:

        - date_day
        - account_id
        - campaign_id
        - ad_group_id
        - ad_id
        - keyword_id
        - search_query
        - device_os
        - device_type
        - network

This combination of columns does not uniquely identify a record, and the test fails. The cause of this is:

  1. match_type (a concatenation of two fields) is used in the select from the base table, but not included in the surrogate key. The two fields that are concatenated into match_type are actually part of the key for the base table (bingads.keyword_performance_daily_report). This is a screenshot of Fivetran as evidence:
    image

  2. Additionally, keyword_id is part of the composite key, but that field is left joined from the keyword history table. Since it's left joined, and keywords are apparently hard deleted from the history table (wtf Microsoft!), the resulting value will be null, which is no bueno for a field that is used in a surrogate key.

Proposed solution:

  1. add match_type to the unique test in microsoft_ads.yml
  2. Inner join to keyword to remove the possibility of null keyword_id values in the surrogate key

Relevant error log or model output

Failure in test dbt_utils_unique_combination_of_columns_microsoft_ads__search_report_date_day__account_id__campaign_id__ad_group_id__ad_id__keyword_id__search_query__device_os__device_type__network (models/microsoft_ads.yml)

Expected behavior

Passing tests

dbt Project configurations

vars:
prod_database: analytics
prod_schema: business
operdb_schema_pattern: 'operdb%'
raw_data_db: GONG
"dbt_date:time_zone": "America/Los_Angeles"
timezone_constant: "America/Los_Angeles"

ad_reporting__apple_search_ads_enabled: False # by default this is assumed to be True
ad_reporting__pinterest_ads_enabled: False # by default this is assumed to be True
ad_reporting__twitter_ads_enabled: False # by default this is assumed to be True
ad_reporting__facebook_ads_enabled: False # by default this is assumed to be True
ad_reporting__snapchat_ads_enabled: False # by default this is assumed to be True
ad_reporting__tiktok_ads_enabled: False # by default this is assumed to be True

linkedin_ads_schema: linkedin_ads
linkedin_ads_database: source_raw

google_ads_schema: google_ads
google_ads_database: source_raw

microsoft_ads_schema: bingads
microsoft_ads_database: source_raw

Package versions

packages:

  • package: fivetran/salesforce_formula_utils
    version: 0.6.4

  • package: calogica/dbt_date
    version: 0.5.7

  • package: fivetran/ad_reporting
    version: [">=1.0.0", "<1.1.0"]

What database are you using dbt with?

snowflake

dbt Version

1.0.1

Additional Context

No response

Are you willing to open a PR to help address this issue?

  • Yes.
  • Yes, but I will need assistance and will schedule time during our office hours for guidance
  • No.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions