Skip to content
This repository was archived by the owner on May 17, 2024. It is now read-only.

Allow for Snowflake key-pair authentication with --dbt #450

Closed
wants to merge 4 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 42 additions & 3 deletions data_diff/dbt.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,11 @@
from typing import List, Optional, Dict, Tuple
from pathlib import Path

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives.asymmetric import dsa
from cryptography.hazmat.primitives import serialization

import requests


Expand Down Expand Up @@ -366,22 +371,56 @@ def _get_connection_creds(self) -> Tuple[Dict[str, str], str]:

return credentials, conn_type

@staticmethod
def _get_snowflake_private_key(credentials):
"""Get Snowflake private key by path, from a Base64 encoded DER bytestring or None."""
if credentials.get("private_key") and credentials.get("private_key_path"):
raise Exception("Cannot specify both `private_key` and `private_key_path`")

if credentials.get("private_key_passphrase"):
encoded_passphrase = credentials.get("private_key_passphrase").encode()
else:
encoded_passphrase = None

if credentials.get("private_key"):
p_key = serialization.load_der_private_key(
base64.b64decode(credentials.get("private_key")),
password=encoded_passphrase,
backend=default_backend(),
)
elif credentials.get("private_key_path"):
with open(credentials.get("private_key_path"), "rb") as key:
p_key = serialization.load_pem_private_key(
key.read(), password=encoded_passphrase, backend=default_backend()
)
else:
return None

return p_key.private_bytes(
encoding=serialization.Encoding.DER,
format=serialization.PrivateFormat.PKCS8,
encryption_algorithm=serialization.NoEncryption(),
)

def set_connection(self):
credentials, conn_type = self._get_connection_creds()

if conn_type == "snowflake":
if credentials.get("password") is None or credentials.get("private_key_path") is not None:
raise Exception("Only password authentication is currently supported for Snowflake.")
if credentials.get("authenticator") is not None:
raise Exception("Federated authentication is not currently supported for Snowflake.")
conn_info = {
"driver": conn_type,
"user": credentials.get("user"),
"password": credentials.get("password"),
"account": credentials.get("account"),
"database": credentials.get("database"),
"warehouse": credentials.get("warehouse"),
"role": credentials.get("role"),
"schema": credentials.get("schema"),
}
if credentials.get("password") is not None:
conn_info["password"] = credentials.get("password")
else:
conn_info["private_key"] = self._get_snowflake_private_key(credentials)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really appreciate the contribution @jakedave

Want to make sure you saw some of the downstream snowflake code, which is currently in a separate repo (likely to change).

https://github.com/datafold/sqeleton/blob/6c532733e1a3efe329b573691d3d9d18dcf4f410/sqeleton/databases/snowflake.py#L166

If we set conn_info["key"], that if statement will eventually fire with some code similar to _get_snowflake_private_key

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the callout - I was not aware of this code!

To be clear, are you asking that I follow the pattern outlined there for error-handling? Happy to follow that example!

Copy link
Contributor

@dlawin dlawin Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More calling out that this change would not be DRY as is.

_get_snowflake_private_key should not be necessary and we should limit the connection code in dbt.py to only putting the needed values in conn_info, as much of the key functionality already exists. If we'd like to extend that functionality, we should do so in this file/repo

Copy link
Author

@jakedave jakedave Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, so remove the private key restriction that is imposed in dbt.py and handle downstream.

It seems by the way the code flows that a wider refactor would be required - it attempts to connect with the provided credentials before ever getting to the downstream code I believe that code is not called in the case of dbt

Our company is trying to evaluate this product; we are unable to do so while data-diff does not support key pair auth for dbt

Would it be possible to contribute this as is? I'm happy to provide further contribution in terms of DRYing out the code in the future

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying it's creating a snowflake connection somewhere other than here? https://github.com/datafold/sqeleton/blob/6c532733e1a3efe329b573691d3d9d18dcf4f410/sqeleton/databases/snowflake.py#L182

I may not be following, I can take a closer look later today

Copy link
Author

@jakedave jakedave Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert in this but it appears that in the main function, dbt has its own branch of logic which calls the DBT parser's connector as opposed to sqeleton's connector. I don't think the code you linked is called in the case of dbt specifically

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Jake, it definitely is, the entry point for the connection creation starts here. Eventually making its way to sqeleton

table1 = connect_to_table(

I can take a closer look at this later today, and implement the changes I'm describing if that helps.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, I was about to recorrect myself. Following the inheritance as an outsider here is non-trivial. I can keep looking but if you get to it that would be great too

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlawin is the plan to refactor the sqeleton code while removing exceptions related to the presence of a priv key here? as is, sqeleton does not allow you to specify a keyphrase for a priv key and seems like it could benefit from at least the _get_snowflake_private_key function

self.threads = credentials.get("threads")
self.requires_upper = True
elif conn_type == "bigquery":
Expand Down