Reference Hub not updated during incremental load in Snowflake #128
Comments
Hi @OGrohmann, thank you for reporting the issue. I'm going to test this on my side and let you know what I find.
Hi @OGrohmann, I just tested both the initial load and the incremental load for the reference hub, and both work fine here (also using the Snowflake adapter). My guess is that the Load Datetimestamp of the records in your stage that you want to add to your ref hub is either earlier than or equal to the Load Datetimestamp already in your Reference Hub. Kind regards,
Hi @bschlottfeldt,
@OGrohmann Hi Oliver, I will test it next week. Sorry about the late response, and thanks for pointing out the issue.
@OGrohmann I can confirm the problem happens as you described. This part of the compiled code in the incremental run of the ref_hub macro won't return any results if one of the keys in the reference hub was inserted as NULL. Since our macro defined the ghost record default for the TEXT datatype in Snowflake as NULL, it inserts NULL as the reference key of the ghost record. When the distinct_target_ref_keys query contains a NULL, the NOT IN filter can never evaluate to true, so it returns no rows; therefore no records are inserted in the incremental run, even though there are new records. FYI @tkirschke @thoffmann-sf: this should be prevented in the ref_hub macro by adding a "where refkey is not null" clause to distinct_target_ref_keys. It should also be fixed by defining the default ghost record value for the TEXT datatype as '(unknown)' for the zero key and '(error)' for the error key.
Kind regards,
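The NOT IN behaviour described above can be reproduced outside Snowflake. The following is a minimal sketch using Python's built-in sqlite3 module, which applies the same three-valued NULL logic for NOT IN. The table layout and the sample keys 'A' and 'B' are assumptions made for illustration; this is not the SQL that the ref_hub macro compiles to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ref_hub (refkey TEXT);
    CREATE TABLE stage (refkey TEXT);
    -- Hypothetical target: a ghost record with a NULL key plus one loaded key.
    INSERT INTO ref_hub VALUES (NULL), ('A');
    -- Hypothetical stage: one known key and one new key.
    INSERT INTO stage VALUES ('A'), ('B');
""")


def new_keys(target_filter=""):
    """Return stage keys that are not yet present in the target hub."""
    sql = f"""
        SELECT refkey FROM stage
        WHERE refkey NOT IN (
            SELECT DISTINCT refkey FROM ref_hub {target_filter}
        )
        ORDER BY refkey
    """
    return [row[0] for row in conn.execute(sql)]


# With the NULL ghost key in the subquery, 'B' NOT IN (NULL, 'A') is NULL,
# so the new key is silently dropped.
without_filter = new_keys()
# Excluding NULL keys from the subquery restores the expected result.
with_filter = new_keys("WHERE refkey IS NOT NULL")

print(without_filter)  # []
print(with_filter)     # ['B']
```

The first call mirrors the failing incremental run and the second mirrors the proposed "where refkey is not null" clause.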
Hi @OGrohmann, I have added a branch to this issue in which I tried to fix the problem by excluding NULL refkeys and defining the default value for TEXT columns, as @bschlottfeldt suggested. Can you check whether the fix works for you? Best regards,
This issue is stale because it has been open for 90 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue. Otherwise it will be closed in 14 days.
Hello @tkirschke, can you please help me specify the correct branch in my packages? Thanks!
Hi @OGrohmann, to specify the branch you have to set the following in your packages.yml:
Best regards,
Hi @tkiehn / @tkirschke, I have tested the fix from the branch, but the issue still persists. From the logs, I can see it is excluding records from the stage layer only where both refkeys are NULL [ WHERE NOT ( KEY1 IS NULL AND KEY2 IS NULL ) ], while in our case only KEY2 is declared as "TEXT" in the stage model (I tested that the solution works if both of our refkeys are of type "TEXT").
Please advise on this. Thank you! Best regards,
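To illustrate the filter quoted from the logs, here is a minimal sketch (again with Python's sqlite3) showing that NOT ( KEY1 IS NULL AND KEY2 IS NULL ) keeps a row in which only KEY2 is NULL. The table contents, the numeric type of KEY1, and the ghost value 0 are hypothetical. The stricter predicate is shown only for comparison; the thread does not say it is the fix adopted in the macro.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ref_hub (KEY1 INTEGER, KEY2 TEXT);
    -- Hypothetical rows: both keys NULL, only the TEXT key NULL, no NULLs.
    INSERT INTO ref_hub VALUES (NULL, NULL), (0, NULL), (1, 'A');
""")

# Predicate quoted from the logs: drops a row only if BOTH keys are NULL.
both_null_filter = "NOT (KEY1 IS NULL AND KEY2 IS NULL)"
# Stricter comparison predicate (an assumption, not taken from the macro).
any_null_filter = "KEY1 IS NOT NULL AND KEY2 IS NOT NULL"


def surviving_rows(predicate):
    """Return the rows that pass the given WHERE predicate."""
    sql = f"SELECT KEY1, KEY2 FROM ref_hub WHERE {predicate} ORDER BY KEY1"
    return conn.execute(sql).fetchall()


kept_by_log_filter = surviving_rows(both_null_filter)
kept_by_strict_filter = surviving_rows(any_null_filter)

print(kept_by_log_filter)     # [(0, None), (1, 'A')]
print(kept_by_strict_filter)  # [(1, 'A')]
```

The row (0, None) passes the logged filter, so a NULL in KEY2 can still reach a later NOT IN comparison on that column.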
We are using the ref_hub macro in an incremental model in our Snowflake environment to build our reference tables. We have observed that the initial load for the reference hub works fine; however, during incremental loads, new ref hub keys are not added. The SQL syntax looks OK, but it does not seem to work with Snowflake.
This issue is only observed in the reference hub model. The ref satellite is updated correctly.
As a consequence, new entries are missing from the final reference table, because the new hub entries are missing.
Steps to reproduce:
{{ config(materialized='incremental') }}

{%- set yaml_metadata -%}
source_models: staging_model
ref_keys:
    - KEY1
    - KEY2
{%- endset -%}

{% set metadata_dict = fromyaml(yaml_metadata) %}

{{ datavault4dbt.ref_hub(source_models=metadata_dict['source_models'],
                         ref_keys=metadata_dict['ref_keys']) }}