Issue 429 athena adapter partitioning #1418
Open
fyang2223 wants to merge 9 commits into dbt-labs:main from fyang2223:issue-429-athena-adapter-partitioning
…accept relation_columns for improved handling of truncate() on non-numeric columns
…o enforce relation_columns as a required parameter
…eferences in format_partition_keys and format_one_partition_key tests to include relation_columns
Author
The PR currently involves changes to the function signature of ...
{%- set athena_partitions_limit = config.get('partitions_limit', 100) | int -%}
{# --- Start new code below --- #}
{# Get column info from relation (for truncate transformation) #}
{%- set relation_columns = adapter.get_columns_in_relation(sql) -%}
{# Transform truncate() partition keys based on column types #}
{%- set transformed_partitioned_by = [] -%}
{%- for partition_key in partitioned_by -%}
  {%- set truncate_match = modules.re.search('truncate\\((.+?),\\s*(\\d+)\\)', partition_key.lower()) -%}
  {%- if truncate_match -%}
    {%- set col_name = truncate_match.group(1) -%}
    {%- set width = truncate_match.group(2) -%}
    {# Find the column type. A namespace is used because a plain set inside a for loop does not persist after the loop in Jinja #}
    {%- set ns = namespace(column=None) -%}
    {%- for col in relation_columns -%}
      {%- if col.name.lower() == col_name.lower() -%}
        {%- set ns.column = col -%}
      {%- endif -%}
    {%- endfor -%}
    {# Transform based on column type #}
    {%- if ns.column and not ns.column.is_numeric() -%}
      {%- do transformed_partitioned_by.append('substr(' ~ col_name ~ ', 1, ' ~ width ~ ')') -%}
    {%- else -%}
      {%- do transformed_partitioned_by.append('floor(' ~ col_name ~ ' / ' ~ width ~ ')') -%}
    {%- endif -%}
  {%- else -%}
    {# No transformation needed, keep original #}
    {%- do transformed_partitioned_by.append(partition_key) -%}
  {%- endif -%}
{%- endfor -%}
{%- set partitioned_keys = adapter.format_partition_keys(transformed_partitioned_by) -%}
{# --- End new code above --- #}
{% do log('PARTITIONED KEYS: ' ~ partitioned_keys) %}
...
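For readers who want to try the rewrite rule outside of dbt, the macro excerpt above can be approximated in plain Python. This is only a sketch under stated assumptions: `SketchColumn`, its `is_numeric()` check, the set of numeric type names, and `transform_partition_key` are hypothetical stand-ins for illustration, not the adapter's `AthenaColumn` or any function in this PR.

```python
import re
from dataclasses import dataclass

# Same pattern as the macro: truncate(<column>, <width>)
TRUNCATE_RE = re.compile(r"truncate\((.+?),\s*(\d+)\)")

# Assumed list of numeric Athena type names, used only by this sketch.
NUMERIC_TYPES = {
    "tinyint", "smallint", "int", "integer", "bigint",
    "float", "real", "double", "decimal",
}


@dataclass
class SketchColumn:
    """Hypothetical stand-in for a column object; not the real AthenaColumn."""
    name: str
    dtype: str

    def is_numeric(self) -> bool:
        # Strip any "(P,S)" suffix so "decimal(10,2)" is seen as "decimal".
        base = self.dtype.lower().split("(")[0].strip()
        return base in NUMERIC_TYPES


def transform_partition_key(partition_key: str, columns: list) -> str:
    """Rewrite truncate(col, N) the way the macro excerpt does."""
    match = TRUNCATE_RE.search(partition_key.lower())
    if not match:
        # No transformation needed, keep the original key.
        return partition_key
    col_name = match.group(1).strip()
    width = match.group(2)
    # Look up the column by case-insensitive name; None if it is not found.
    column = next((c for c in columns if c.name.lower() == col_name), None)
    if column is not None and not column.is_numeric():
        return f"substr({col_name}, 1, {width})"
    # As in the macro, numeric and unknown columns both take this branch.
    return f"floor({col_name} / {width})"
```

With a string column `str_col` and a `bigint` column `id`, `truncate(str_col, 1)` becomes `substr(str_col, 1, 1)`, `truncate(id, 100)` becomes `floor(id / 100)`, and a key without `truncate()` is returned unchanged.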
resolves #429
Problem
When the truncate() function is used in Athena partition keys on string columns, the generated SQL is invalid because truncate() in Athena/Trino only works on decimal places. String columns must use substr() instead, and for numeric columns we need to apply division. The get_partition_batches macro was generating invalid SQL such as truncate(str_col, 1), which caused query failures.
Solution
truncate(col, N) is transformed to substr(col, 1, N) for non-numeric columns, and to floor(col / N) for numeric columns.
is_integer() and is_numeric() in AthenaColumn properly recognize Athena dtype values (int, bigint, decimal(P,S)).
The get_partition_batches macro always fetches column info from the temporary unpartitioned Athena Iceberg relation.
Checklist