Behavior: Get column info from information_schema Part I #808

Merged: 10 commits merged into 1.9.latest on Sep 27, 2024

@benc-db benc-db commented Sep 25, 2024

Partial fix for #779

Description

This is my second attempt at addressing the issue that describe extended can truncate complex types. With the release of dbt-core 1.8.7, we can now process behavior flags; this PR introduces the option of using information_schema to fetch column information in get_columns_for_relation. I'm hiding this behind a behavior flag because, given the current state of UC information_schema, we have to run a repair before we can trust that the columns of a recently created or altered table are present in information_schema, which adds overhead. Furthermore, this approach only works for Delta tables at this time. Hopefully the sync issue with information_schema will be solved in time; in the meantime, users can enable this flag when they have complex types that describe extended truncates.

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

@@ -116,26 +119,6 @@ class DatabricksConfig(AdapterConfig):
    merge_with_schema_evolution: Optional[bool] = None


def check_not_found_error(errmsg: str) -> bool:
Collaborator Author

Moved to prevent circular dependency.

@@ -175,6 +158,19 @@ class DatabricksAdapter(SparkAdapter):
        }
    )

    get_column_behavior: GetColumnsBehavior

    def __init__(self, config: Any, mp_context: SpawnContext) -> None:
Collaborator Author

I don't control the invocation of init, so I can't pass the behaviors in, but doing this during init ensures we only check the behavior flag once. I tried to make this a little more functional but couldn't figure out how to override a function definition that is inherited from a parent, hence the goofy class-based strategy.
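For readers following along, the class-based strategy described in this comment might look roughly like the sketch below. Only the `GetColumnsBehavior` name and the `get_column_behavior` attribute come from the diff; the two subclasses, `SketchAdapter`, the `flags` mapping, and the `use_info_schema` key are hypothetical stand-ins, and the real adapter's `__init__` takes `config` and `mp_context` rather than a plain mapping.

```python
from abc import ABC, abstractmethod
from typing import Mapping


class GetColumnsBehavior(ABC):
    """Strategy interface: one way of fetching column metadata."""

    @abstractmethod
    def get_columns(self, relation: str) -> str:
        """Return a description of how columns would be fetched."""


class GetColumnsByDescribe(GetColumnsBehavior):
    # Hypothetical name: stands in for the default describe extended path.
    def get_columns(self, relation: str) -> str:
        return f"describe extended: {relation}"


class GetColumnsByInformationSchema(GetColumnsBehavior):
    # Hypothetical name: stands in for the information_schema path.
    def get_columns(self, relation: str) -> str:
        return f"information_schema: {relation}"


class SketchAdapter:
    """Simplified stand-in for the adapter; not the real DatabricksAdapter."""

    get_column_behavior: GetColumnsBehavior

    def __init__(self, flags: Mapping[str, bool]) -> None:
        # The behavior flag is read exactly once, at construction time.
        # "use_info_schema" is an assumed key, used for illustration only.
        if flags.get("use_info_schema", False):
            self.get_column_behavior = GetColumnsByInformationSchema()
        else:
            self.get_column_behavior = GetColumnsByDescribe()

    def get_columns_in_relation(self, relation: str) -> str:
        # Later calls delegate to the chosen strategy without re-reading the flag.
        return self.get_column_behavior.get_columns(relation)
```

Choosing the strategy object in the constructor keeps the per-call path free of flag checks, which seems to be the point of doing the work during init.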

@@ -28,6 +28,25 @@
  {% do return(load_result('get_columns_comments').table) %}
{% endmacro %}

{% macro get_columns_comments_via_information_schema(relation) -%}
  {% call statement('repair_table', fetch_result=False) -%}
    REPAIR TABLE {{ relation|lower }} SYNC METADATA
Collaborator Author

Ensure information_schema is up to date prior to using this method.
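As an illustration of the ordering this comment describes (repair first, then read), here is a minimal Python sketch that builds the two statements. The `REPAIR TABLE ... SYNC METADATA` form is taken from the macro above; the follow-up query is an assumption, since the rest of the macro is not shown here: it guesses that columns are read from `system.information_schema.columns` using `column_name`, `full_data_type`, and `comment`. The helper names are hypothetical.

```python
from typing import List


def quote_identifier(part: str) -> str:
    """Backtick-quote one identifier part, doubling embedded backticks."""
    return "`" + part.replace("`", "``") + "`"


def quote_literal(value: str) -> str:
    """Single-quote a string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def column_info_statements(catalog: str, schema: str, table: str) -> List[str]:
    """Return the statements in execution order: repair, then query."""
    parts = [catalog.lower(), schema.lower(), table.lower()]
    relation = ".".join(quote_identifier(p) for p in parts)

    # Step 1: sync metadata so recently created or altered columns are visible.
    repair = f"REPAIR TABLE {relation} SYNC METADATA"

    # Step 2 (assumed shape): read the untruncated types from information_schema.
    query = (
        "SELECT column_name, full_data_type, comment "
        "FROM `system`.`information_schema`.`columns` "
        f"WHERE table_catalog = {quote_literal(parts[0])} "
        f"AND table_schema = {quote_literal(parts[1])} "
        f"AND table_name = {quote_literal(parts[2])}"
    )
    return [repair, query]
```

The repair statement runs on every call in this sketch, which matches the overhead concern raised in the PR description.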

@benc-db benc-db merged commit 41c164e into 1.9.latest Sep 27, 2024
21 checks passed