Skip to content

Conversation

@bayees
Copy link
Contributor

@bayees bayees commented May 23, 2025

Description

Create comments and tags for tables and columns in Unity Catalog tables. Also adding primary key and foreign key constraints using the primary_key and references hints.

I have added the create_indexes to enable primary and foreign key to keep backwards compatible

There is a new Databricks adapter to use the new x-databricks-cluster, x-databricks-table-comment, x-databricks-table-tags, x-databricks-column-comment, x-databricks-column-tags hints.

The table and column comments using description hints if those are available, but overrides those if the hints from the databricks adapter i given.

Related Issues

Additional Context

bayees added 10 commits May 18, 2025 06:16
…ing support for clustering, table comments, and tags.
…and foreign key constraints, and table options. Update SQL generation for table alterations to include comments and tags.
… hints, along with examples for using the databricks_adapter. Enhance clarity on applying hints for resource metadata and constraints.
…s adapter. Improve logging for table options during ALTER TABLE operations and streamline SQL generation for constraints.
…OREIGN KEY constraints. Introduce create_indexes option in DatabricksClientConfiguration and update related classes to handle index creation logic. Add new package-lock.yml for dbt_transform examples.
…r enforcing PRIMARY KEY and FOREIGN KEY constraints on tables.
@netlify
Copy link

netlify bot commented May 23, 2025

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit f1d93ab
🔍 Latest deploy log https://app.netlify.com/projects/dlt-hub-docs/deploys/6878c376dc24cb000847c079
😎 Deploy Preview https://deploy-preview-2674--dlt-hub-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@bayees bayees changed the title Exp/add databricks metadata [Databricks destination] Adding comment and tags for table and columns and applying primary and foreign key constraints in Unity Catalog May 23, 2025
@rudolfix
Copy link
Collaborator

rudolfix commented Jun 2, 2025

@bayees this looks good! are you able to fix linting errors and add a basic test? like we do ie. for bigquery adapter. also we can take over this PR as we are allowed to push to your branch. ping us what's your plan

bayees added 2 commits June 2, 2025 16:08
…ents and table comments. Clean up unnecessary whitespace in the code for better readability.
@bayees
Copy link
Contributor Author

bayees commented Jun 2, 2025

@rudolfix Linting is done.

I am not sure about the test. Can you point to the specific Bigquery test?

@rudolfix
Copy link
Collaborator

rudolfix commented Jun 5, 2025

Hey! BigQuery tests are in tests/load/bigquery/test_biguery_table_builder.py. example test checking if partition is right,

def test_bigquery_partition_by_integer(
    destination_config: DestinationTestConfiguration,
) -> None:
    pipeline = destination_config.setup_pipeline(f"bigquery_{uniq_id()}", dev_mode=True)

    @dlt.resource(
        columns={"some_int": {"data_type": "bigint", "partition": True, "nullable": False}},
    )
    def demo_resource() -> Iterator[Dict[str, int]]:
        for i in range(10):
            yield {
                "some_int": i,
            }

    @dlt.source(max_table_nesting=0)
    def demo_source() -> DltResource:
        return demo_resource

    pipeline.run(demo_source())

    with pipeline.sql_client() as c:
        with c.execute_query(
            "SELECT EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.PARTITIONS WHERE partition_id IS NOT"
            " NULL);"
        ) as cur:
            has_partitions = cur.fetchone()[0]
            assert isinstance(has_partitions, bool)
            assert has_partitions

note that:

    with pipeline.sql_client() as c:
        databricks = c.native_connection

gives you access to native databricks client

you could place your tests in tests/load/databricks/test_databricks_adapter.py. also we do not need tests that are as comprehensive as BigQuery but a basic check if hints are translated in the right props in Unity Catalog would be cool.

regarding credentials:
you can place them into tests/.dlt/secrets.toml tests will find it. here are ours (without secrets). I assume you have access to the cluster?

[destination.databricks.credentials]
server_hostname = "adb-8001321225760611.11.azuredatabricks.net"
http_path = "/sql/protocolv1/o/8001321225760611/0124-183359-ghw1vo3b"
access_token = "..."
catalog = "dlt_ci"
client_id = "..."
client_secret = "..."

one more thing: we broke our devel :/ so you'd need to merge it into your branch,

@rudolfix rudolfix assigned rudolfix and burnash and unassigned rudolfix Jun 16, 2025
@burnash burnash added the enhancement New feature or request label Jun 16, 2025
@bayees
Copy link
Contributor Author

bayees commented Jun 18, 2025

@rudolfix
I added the unit test to the new Databricks adapter, but I found a non-deterministic issue. When added foreign keys to the tables, the constraint requires the related table with the primary key to exist. Are there any method for controlling the other of table creation?

@rudolfix
Copy link
Collaborator

@bayees here's PR #2791 that allows to add statements after table create/alter. you should override _get_table_post_update_sql , and move generation of FOREIGN REFERENCES there. We'll merge this PR asap and then you can merge it into your branch.

In the meantime we updated our dependency system to uv and we have new test workflows on ci. So overall things are better and much faster but please take a look at our CONTRIBUTING guide. Thanks!

bayees added 2 commits June 20, 2025 07:41
…ng and improve table update SQL generation. Removed redundant primary key constraint logic and added a new method for generating foreign key constraints post table update.
@bayees
Copy link
Contributor Author

bayees commented Jun 20, 2025

@rudolfix Updated the fork to reflect the changes in #2791. Waiting for it to be merge.

Nice with uv. 🥇

@burnash
Copy link
Collaborator

burnash commented Jun 20, 2025

@bayees thanks so much. I've reviewed #2791, as soon as the CI pass we'll merge it and I'll review this PR

bayees and others added 8 commits June 26, 2025 08:34
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
… hints. Introduced detailed descriptions for `column_comment` and `column_tags` to improve usability and clarity for users implementing schema migrations.
…. Updated SQL statements for table and column comments, as well as tags, to ensure proper handling of special characters. Refactored to use qualified table names for improved accuracy in SQL commands.
…"Suported" to "Supported" and updated references to `column_comment` and `table_comment` for consistency. Improved formatting of hints section for better readability.
…id table and column tags, table comments, and cluster types. Introduced parameterized tests for various invalid inputs and ensured proper exception raising. Enhanced test coverage for special character handling in comments and tags.
…` option for enforcing PRIMARY KEY and FOREIGN KEY constraints, and add a reference to the Databricks adapter section for additional hints.
@bayees
Copy link
Contributor Author

bayees commented Jun 27, 2025

Hey @burnash. Thank you for the review. I have implemented the changes suggested.

Please take a look at it and let me know if there are additional changes needed


Databricks supports the following table hints:

- `description` - Uses the description to add comment to the table. This can also be done by using the adapter method `table_comment`.
Copy link
Collaborator

@burnash burnash Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by using the adapter method table_comment.

@bayees could you clarify if you mean "adapter method table_comment" or "adapter parameter table_comment"?

Databricks supports the following column hints:

- `primary_key` - adds a primary key constraint to the column in Unity Catalog.
- `description` - adds a description to the column. This can also be done by using the adapter method `table_comment`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Collaborator

@burnash burnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good, @bayees, thank you for the documentation and tests. Have a couple of minor comments, please take a look.

@bayees
Copy link
Contributor Author

bayees commented Jul 10, 2025

This was the recommendation from @rudolfix. Do you have input here?

@burnash
Copy link
Collaborator

burnash commented Jul 10, 2025

This was the recommendation from @rudolfix.

@bayees just to clarify, which of the comments you're referring to?

@nicor88
Copy link

nicor88 commented Jul 10, 2025

Thanks @bayees for this PR - for what I understand, this PR doesn't cover the possibility of marking a column a partition, adding z-order for a column, or liquid clustering, this will be a relevant feature to add, which is pretty relevant for teams that care about "query optimization" and cost control.

I don't think that is a good idea to add the above feature in this PR, but might be relevant to track that in other issue or then have other code changes related to it, WDYT @burnash ?

@burnash
Copy link
Collaborator

burnash commented Jul 10, 2025

@bayees please take a look at documentation comments as well. Let me know if you need help with fixing mypy on CI. We'd need to make sure the tests pass before we merge the PR.

@burnash
Copy link
Collaborator

burnash commented Jul 10, 2025

@nicor88 thanks for the suggestion. Agreed, it's best to open a new related issue for this.

@burnash
Copy link
Collaborator

burnash commented Jul 15, 2025

Hi @bayees @christian-bay-backstage, just following up to see if you'd like to continue working on this PR or if you'd prefer that I take it from here. Thanks again for your effort!

@bayees
Copy link
Contributor Author

bayees commented Jul 15, 2025

Hi @bayees @christian-bay-backstage, just following up to see if you'd like to continue working on this PR or if you'd prefer that I take it from here. Thanks again for your effort!

I am currently on vacation, so if the only things to change is the phrasing in the documentation and fix the mypy, then please feel free to fix. Otherwise I will look into it in a week or two.

@burnash burnash self-requested a review July 17, 2025 09:34
Copy link
Collaborator

@burnash burnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bayees thank you very much for the contribution.

@burnash burnash merged commit 526a630 into dlt-hub:devel Jul 18, 2025
81 of 83 checks passed
zilto pushed a commit that referenced this pull request Jul 22, 2025
…a hints (#2674)

Co-authored-by: Anton Burnashev <anton.burnashev@gmail.com>
@bayees bayees deleted the exp/add-databricks-metadata branch September 21, 2025 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Databricks] Add comments and tags hints for table and column

5 participants