This repository was archived by the owner on May 2, 2023. It is now read-only.

Support connections to DuckDB for data-diff --dbt #22

Closed


dbeatty10

resolves #21

Overview

This PR probably isn't the long-term solution you want, but it was a quick and easy way for me to get moving again for this demo.

Totally fine with me if you want to fix the problem below a different way!

Details

I was trying to run the following command when using a custom fork that added support for duckdb connections:

data-diff --dbt

But I got the following error:

ERROR - DuckDB: Bad table path for <data_diff.databases.duckdb.DuckDB object at 0x10b67f5e0>: 'main.dev.simple_model'. Expected form: schema.table

Since I wasn't super interested in discovering why this was happening or how best to solve it, this PR represents my quickest solution to the problem.

Here's the full stack trace:

Traceback (most recent call last):
  File "/Users/dbeatty/projects/data-diff/env/bin/data-diff", line 8, in <module>
    sys.exit(main())
  File "/Users/dbeatty/projects/data-diff/env/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/dbeatty/projects/data-diff/env/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/dbeatty/projects/data-diff/env/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/dbeatty/projects/data-diff/env/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/dbeatty/projects/data-diff/data_diff/__main__.py", line 267, in main
    dbt_diff(
  File "/Users/dbeatty/projects/data-diff/data_diff/dbt.py", line 76, in dbt_diff
    _local_diff(diff_vars)
  File "/Users/dbeatty/projects/data-diff/data_diff/dbt.py", line 124, in _local_diff
    table1_columns = list(table1.get_schema())
  File "/Users/dbeatty/projects/data-diff/data_diff/table_segment.py", line 85, in get_schema
    return self.database.query_table_schema(self.table_path)
  File "/Users/dbeatty/projects/data-diff/env/lib/python3.9/site-packages/sqeleton/databases/base.py", line 389, in query_table_schema
    rows = self.query(self.select_table_schema(path), list)
  File "/Users/dbeatty/projects/data-diff/env/lib/python3.9/site-packages/sqeleton/databases/base.py", line 380, in select_table_schema
    schema, name = self._normalize_table_path(path)
  File "/Users/dbeatty/projects/data-diff/env/lib/python3.9/site-packages/sqeleton/databases/base.py", line 488, in _normalize_table_path
    raise ValueError(f"{self.name}: Bad table path for {self}: '{'.'.join(path)}'. Expected form: schema.table")
ValueError: DuckDB: Bad table path for <data_diff.databases.duckdb.DuckDB object at 0x10b67f5e0>: 'main.dev.simple_model'. Expected form: schema.table
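
To make the failure easier to see in isolation, here is a minimal sketch of the kind of check that raises in the traceback. It is an assumption about the behavior, not the actual sqeleton code: the function name `normalize_table_path` and the `default_schema` parameter are hypothetical, and the only detail taken from the trace is that a path must fit the form `schema.table`.

```python
from typing import Tuple


def normalize_table_path(path: Tuple[str, ...], default_schema: str = "main") -> Tuple[str, str]:
    """Hypothetical stand-in for the path check seen in the traceback."""
    if len(path) == 1:
        # Assumption: a bare table name falls back to a default schema.
        return default_schema, path[0]
    if len(path) == 2:
        return path[0], path[1]
    # Anything longer (e.g. database.schema.table) is rejected.
    raise ValueError(
        f"Bad table path: '{'.'.join(path)}'. Expected form: schema.table"
    )


# A 2-part path passes; the 3-part path from the error message does not.
normalize_table_path(("dev", "simple_model"))
try:
    normalize_table_path(("main", "dev", "simple_model"))
except ValueError as exc:
    print(exc)
```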


erezsh commented Feb 24, 2023

We are interested in fixing things the proper way. Quick and easy for you means strange and confusing bugs for others.

The question to ask is why dbt would return a 3-part path when duckdb supports only 2 parts.

If we're going to truncate the path at all, it's going to be inside dbt.py and not duckdb.py.

Also, whatever fix we land on, I think it should just be included in PR #408 so I'm closing this one.
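
For illustration only, truncating on the dbt side could look roughly like the sketch below. The helper `to_schema_table` is hypothetical and is not taken from dbt.py or PR #408. It also assumes that dropping the leading part is safe, which is exactly the open question raised above.

```python
from typing import Sequence, Tuple


def to_schema_table(path: Sequence[str]) -> Tuple[str, ...]:
    """Hypothetical helper: keep at most the last two parts (schema, table)."""
    if not path:
        raise ValueError("Empty table path")
    # Assumption: any leading part (e.g. the database name) can be dropped.
    return tuple(path[-2:])


# The 3-part path from the error message, as dbt reported it.
dbt_path = ("main", "dev", "simple_model")
print(to_schema_table(dbt_path))  # ('dev', 'simple_model')
```

Doing this where the path is built from dbt's output keeps the database driver strict, so a malformed path passed in by other callers would still be rejected.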

Successfully merging this pull request may close these issues.

Error when connecting to DuckDB with data-diff --dbt