Description
Describe the bug
Dbt-databricks 1.8.0 doesn't seem to be able to determine if a table already exists anymore. I have noticed this with both seeds and incremental models.
Seeds
We have a seed in our project called 'country'. When running dbt build --select country
it succeeds if the country table is not created yet. But the second time you run it, it fails with this error messages:
Runtime Error in seed country (seeds\example\country.csv)
[TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view `schema_name`.`country` because it already exists.
Choose a different name, drop or replace the existing object, add the IF NOT EXISTS clause to tolerate pre-existing objects, or add the OR REFRESH clause to refresh the existing streaming table. SQLSTATE: 42P07
When I reinstall dbt-databricks 1.7.14, I can run the command over and over again and the incremental models do create merge statements.
I saw there was a fix with rerunning seeds in 1.8.1 but that doesn't solve it for me. Also, we don't have 'persist_doc' set, nor do we have a description for this seed.
Incremental
Our incremental models are always run as a 'create or replace' statement instead of a merge after upgrading tot 1.8.0. I see that locally in my target\run folder. I also see it on UC when running 'describe history catalog.schema.table_name' where all recent chnages are 'create or replace' instead of 'merge'.
Steps To Reproduce
Add a seed to the project called country.csv. Use dbt_project.yml to set its target catalog and schema (not sure if that is required). Run dbt build --select country
or simply dbt seed
; this should work. Then run the same command again and it should fail.
For the incremental models, simply create a model, set it to incremental and add this line: {{ log("For model '"~model.name~"' is_incremental() is set to '"~is_incremental()~"'", True) }}
. The second time the model is run, 'is_incremental()' should be 'True' but it is not.
Expected behavior
Automated detection if the table already exists.
Screenshots and log output
See error output above.
System information
The output of dbt --version
:
Core:
- installed: 1.8.1
- latest: 1.8.1 - Up to date!
Plugins:
- databricks: 1.8.1 - Up to date!
- spark: 1.8.0 - Up to date!
The operating system you're using:
Windows 11 enterprise
The output of python --version
:
Python 3.11.4
Additional context
I tried to debug the dbt-databricks seeds materialization locally. This line yields 'None' for me where I would expect it to get the relation if the table exists. Weirdly enough, the used parameters (database, schema and identifier) all have the correct value.
I see the same thing happening when running from a Databricks workflow. It uses a job cluster so it will get a fresh install of dbt-databricks 1.8.1 on each job run. No other packages installed.
Activity