
1.8.0 not able to detect existing tables anymore #691

Closed
@thijs-nijhuis

Description

Describe the bug

dbt-databricks 1.8.0 no longer seems able to determine whether a table already exists. I have noticed this with both seeds and incremental models.

Seeds
We have a seed in our project called 'country'. Running dbt build --select country succeeds if the country table has not been created yet, but the second run fails with this error message:

Runtime Error in seed country (seeds\example\country.csv)
  [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view `schema_name`.`country` because it already exists.
  Choose a different name, drop or replace the existing object, add the IF NOT EXISTS clause to tolerate pre-existing objects, or add the OR REFRESH clause to refresh the existing streaming table. SQLSTATE: 42P07

When I reinstall dbt-databricks 1.7.14, I can run the command repeatedly, and the incremental models do generate merge statements.
I saw there was a fix for rerunning seeds in 1.8.1, but it doesn't solve the problem for me. Also, we don't have 'persist_docs' set, nor do we have a description for this seed.

Incremental
After upgrading to 1.8.0, our incremental models always run as a 'create or replace' statement instead of a merge. I can see this locally in my target\run folder. I also see it in Unity Catalog when running 'describe history catalog.schema.table_name', where all recent changes are 'create or replace' instead of 'merge'.

Steps To Reproduce

Add a seed to the project called country.csv. Use dbt_project.yml to set its target catalog and schema (not sure if that is required). Run dbt build --select country or simply dbt seed; this should work. Then run the same command again and it should fail.

For the incremental models, create a model, set it to incremental, and add this line: {{ log("For model '"~model.name~"' is_incremental() is set to '"~is_incremental()~"'", True) }}. On the second run of the model, is_incremental() should be 'True', but it is not.
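A minimal model for the incremental reproduction might look like the sketch below. The model name, the id/loaded_at columns, the unique_key, and the explicit merge strategy are assumptions for illustration; only the log line comes from the report. On the first run it should log False; on the second run, with the table already present, it should log True and dbt should generate a merge rather than a 'create or replace'.

```sql
-- models/example/incremental_check.sql (hypothetical model name)
{{ config(
    materialized='incremental',
    incremental_strategy='merge',  -- assumed; the strategy the report expects to see on reruns
    unique_key='id'                -- hypothetical key column
) }}

-- Log whether dbt treats this run as incremental (the line from the report).
{{ log("For model '" ~ model.name ~ "' is_incremental() is set to '" ~ is_incremental() ~ "'", True) }}

select *
from (select 1 as id, current_timestamp() as loaded_at) as src

{% if is_incremental() %}
  -- Only rendered when dbt found the existing table; {{ this }} is the target relation.
  where src.loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}
```

If the second run still logs False, the compiled SQL in target\run should show a 'create or replace' instead of a merge, matching the behavior described above.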

Expected behavior

dbt-databricks should automatically detect whether the table already exists.

Screenshots and log output

See error output above.

System information

The output of dbt --version:

Core:
  - installed: 1.8.1
  - latest:    1.8.1 - Up to date!

Plugins:
  - databricks: 1.8.1 - Up to date!
  - spark:      1.8.0 - Up to date!

The operating system you're using:
Windows 11 Enterprise
The output of python --version:
Python 3.11.4

Additional context

I tried to debug the dbt-databricks seed materialization locally. This line yields 'None' for me, whereas I would expect it to return the relation when the table exists. Oddly, the parameters it uses (database, schema and identifier) all have the correct values.
I see the same thing happening when running from a Databricks workflow. It uses a job cluster, so each job run gets a fresh install of dbt-databricks 1.8.1, with no other packages installed.
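To test the lookup outside the materialization, a hypothetical macro like the one below calls adapter.get_relation, the dbt adapter method that dbt's default seed materialization uses to find an existing relation (that this corresponds to the exact line referenced above is an assumption). It logs whether a relation was found for the given catalog, schema, and table name.

```sql
-- macros/check_relation.sql (hypothetical macro name)
{% macro check_relation(database, schema, identifier) %}
  {# Look up the relation the same way a materialization would; returns None if dbt cannot find it #}
  {% set rel = adapter.get_relation(database=database, schema=schema, identifier=identifier) %}
  {% if rel is none %}
    {{ log("No relation found for " ~ database ~ "." ~ schema ~ "." ~ identifier, info=True) }}
  {% else %}
    {{ log("Found relation: " ~ rel ~ " (type: " ~ rel.type ~ ")", info=True) }}
  {% endif %}
{% endmacro %}
```

It could be invoked with, for example, dbt run-operation check_relation --args "{database: my_catalog, schema: schema_name, identifier: country}", where my_catalog is a placeholder and argument quoting may differ between shells. If it logs "No relation found" for a table that does exist in the catalog, that would point at the relation lookup rather than the seed logic itself.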

Metadata

Assignees: none
Labels: bug (Something isn't working)
Type: none
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests