@fyang2223 fyang2223 commented Nov 1, 2025

resolves #429

Problem

When a partition key applies truncate() to a string column, the SQL generated for Athena is invalid: in Athena/Trino, truncate() is a numeric function that truncates to a number of decimal places and does not accept strings. String columns must use substr() instead, and numeric columns need division. The get_partition_batches macro was generating SQL such as truncate(str_col, 1), which caused the query to fail.

Solution

  • Added logic to detect truncate partition keys and convert truncate(col, N) to substr(col, 1, N) for non-numeric columns, and floor(col / N) for numeric columns.
  • Overrode is_integer() and is_numeric() in AthenaColumn to properly recognize Athena dtype values (int, bigint, decimal(P,S)).
  • Updated the get_partition_batches macro to always fetch column info from the temporary unpartitioned Athena Iceberg relation.
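
To illustrate the dtype recognition described in the second bullet, here is a minimal Python sketch. It is not the code from this PR: the helper names `is_integer_dtype` and `is_numeric_dtype` are hypothetical, and every dtype beyond int, bigint, and decimal(P,S) is an assumption added for illustration.

```python
import re

# Hypothetical helpers, not the AthenaColumn implementation from this PR.
# Only int, bigint and decimal(P,S) are named in the PR description;
# the remaining dtype names are assumptions for illustration.
_INTEGER_DTYPES = {"tinyint", "smallint", "int", "integer", "bigint"}
_FLOAT_DTYPES = {"float", "real", "double"}
_DECIMAL_RE = re.compile(r"^decimal(\(\s*\d+\s*(,\s*\d+\s*)?\))?$")


def is_integer_dtype(dtype: str) -> bool:
    """Return True when the dtype string names an integer type."""
    return dtype.strip().lower() in _INTEGER_DTYPES


def is_numeric_dtype(dtype: str) -> bool:
    """Return True for integer, floating-point and decimal dtypes."""
    normalized = dtype.strip().lower()
    return (
        normalized in _INTEGER_DTYPES
        or normalized in _FLOAT_DTYPES
        or _DECIMAL_RE.match(normalized) is not None
    )
```

Matching decimal with a pattern rather than a fixed set is one way to cover the parameterized form decimal(P,S), which a plain string comparison against "decimal" would miss.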

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

…accept relation_columns for improved handling of truncate() on non-numeric columns
…o enforce relation_columns as a required parameter
…eferences in format_partition_keys and format_one_partition_key tests to include relation_columns
@fyang2223 fyang2223 requested a review from a team as a code owner November 1, 2025 14:46
@cla-bot cla-bot bot added the cla:yes The PR author has signed the CLA label Nov 1, 2025
@fyang2223
Author

This PR currently changes the function signatures of format_partition_keys and format_one_partition_key, which could be a breaking change. Alternatively, instead of putting the column data type logic inside the Python functions, we could move it to the caller, the get_partition_batches macro. Adding the following excerpt does the trick and would satisfy the 'no interface changes' condition:

    ...
    {%- set athena_partitions_limit = config.get('partitions_limit', 100) | int -%}

    {# --- Start new code below --- #}

    {# Get column info from relation (for truncate transformation) #}
    {%- set relation_columns = adapter.get_columns_in_relation(sql) -%}

    {# Transform truncate() partition keys based on column types #}
    {%- set transformed_partitioned_by = [] -%}
    {%- for partition_key in partitioned_by -%}
        {%- set truncate_match = modules.re.search('truncate\\((.+?),\\s*(\\d+)\\)', partition_key.lower()) -%}
        {%- if truncate_match -%}
            {%- set col_name = truncate_match.group(1) -%}
            {%- set width = truncate_match.group(2) -%}

            {# Find the column type. A namespace is used because a plain set inside a for loop does not persist after the loop ends #}
            {%- set ns = namespace(column=None) -%}
            {%- for col in relation_columns -%}
                {%- if col.name.lower() == col_name.lower() -%}
                    {%- set ns.column = col -%}
                {%- endif -%}
            {%- endfor -%}

            {# Transform based on column type #}
            {%- if ns.column and not ns.column.is_numeric() -%}
                {%- do transformed_partitioned_by.append('substr(' ~ col_name ~ ', 1, ' ~ width ~ ')') -%}
            {%- else -%}
                {%- do transformed_partitioned_by.append('floor(' ~ col_name ~ ' / ' ~ width ~ ')') -%}
            {%- endif -%}
        {%- else -%}
            {# No transformation needed, keep original #}
            {%- do transformed_partitioned_by.append(partition_key) -%}
        {%- endif -%}
    {%- endfor -%}

    {%- set partitioned_keys = adapter.format_partition_keys(transformed_partitioned_by) -%}

    {# --- End new code above --- #}

    {% do log('PARTITIONED KEYS: ' ~ partitioned_keys) %}
    ...
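
For checking the rewrite rules outside of Jinja, the logic of the excerpt can be mirrored in plain Python. This is only a sketch under stated assumptions: `rewrite_partition_keys` and the `numeric_by_column` mapping are hypothetical stand-ins for the macro and for the column objects returned by the adapter, and the sketch additionally strips whitespace around the captured column name.

```python
import re

# Same pattern as the macro excerpt: truncate(<column>, <width>)
TRUNCATE_RE = re.compile(r"truncate\((.+?),\s*(\d+)\)")


def rewrite_partition_keys(partition_keys, numeric_by_column):
    """Rewrite truncate() partition keys based on column type.

    numeric_by_column is a hypothetical mapping of lowercase column
    name to True (numeric) or False (non-numeric).
    """
    rewritten = []
    for key in partition_keys:
        match = TRUNCATE_RE.search(key.lower())
        if match is None:
            # No truncate() call: keep the original key unchanged
            rewritten.append(key)
            continue
        # Stripping whitespace is an addition not present in the excerpt
        col_name = match.group(1).strip()
        width = match.group(2)
        if numeric_by_column.get(col_name) is False:
            # Known non-numeric column: take the first <width> characters
            rewritten.append(f"substr({col_name}, 1, {width})")
        else:
            # Numeric column, or column not found (as in the excerpt's else branch)
            rewritten.append(f"floor({col_name} / {width})")
    return rewritten
```

A call such as `rewrite_partition_keys(["truncate(str_col, 1)", "truncate(id, 100)", "dt"], {"str_col": False, "id": True})` exercises all three branches. Like the excerpt, the sketch sends a column that cannot be found to the floor() branch, which may be worth revisiting if unknown columns should raise an error instead.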

Development

Successfully merging this pull request may close these issues.

[Bug] truncate() partition transformation does not work when it includes more than 100 partitions