-
Notifications
You must be signed in to change notification settings - Fork 207
feat: set query label session property in bq session #4314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: set query label session property in bq session #4314
Conversation
@@ -182,7 +182,32 @@ def query_factory() -> Query: | |||
def _begin_session(self, properties: SessionProperties) -> None: | |||
from google.cloud.bigquery import QueryJobConfig | |||
|
|||
job = self.client.query("SELECT 1;", job_config=QueryJobConfig(create_session=True)) | |||
query_label = [] | |||
if "query_label" in properties and isinstance(properties["query_label"], exp.Array): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can also be exp.Tuple
not just exp.Array
. Eg:
query_labels = ['label_a', 'label_b']
vs
query_labels = ('label_a', 'label_b')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, perhaps also Paren
as an edge case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not including a usage example in the PR description. I imagined the model definition to look like this
session_properties (
query_label = [
('key1', 'value1'),
('key2', 'value2'),
('key3', 'value3')
]
)
I chose this format based on the google docs:
a key-value pair for the label to assign to all future queries in the session. You can also add multiple key-value pairs, separated by a comma (for example, SET @@query_label = "key1:value1,key2:value2").
Also, this is consistent with the usage of labels
in both, the virtual and phyiscal properties.
This format is also why I explicitely check for an array.
What do you think about this format? Would you prefer another one?
e.g. choosing this format
query_labels = ['label_a', 'label_b'] | ('label_a', 'label_b') # array or tuple
label_a and label_b would have to be a string following this pattern: key:value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@georgesittas would this be an edge case you had in mind?
query_labels = ('label_a')
So query_label
is not a tuple or array but a single string 'key:vale'
wrapped in parentheses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format makes sense, but it might also be:
session_properties (
query_label = (
('key1', 'value1'),
('key2', 'value2'),
('key3', 'value3')
)
)
which would make a tuple of tuples rather than array of tuples.
Similarly
session_properties (
query_label = (
('key1', 'value1')
)
)
makes it a Paren
of tuples.
if ( | ||
isinstance(label_tuple, exp.Tuple) | ||
and len(label_tuple.expressions) == 2 | ||
and all(isinstance(label, exp.Literal) for label in label_tuple.expressions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if it's an identifier and not a string literal eg: query_labels = [ label_a ]
and not query_labels = [ 'label_a' ]
. Might worth support both approaches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could simply require strings; not sure what's the value of supporting identifiers here. Seems like an unnecessary increase in scope?
) | ||
else: | ||
raise SQLMeshError( | ||
"Invalid query label format. Expected a tuple of two string literals." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this is something that should be captured at model load time and not at execution time. This seems too late.
Thanks for your contribution! Can you please add some tests. |
we want to use the SQLMesh session properties to attach labels to the BigQuery jobs. Using these labels we want to enable ourselves to
In the model definition, this would like as follows
This is currently not possible in an easy way since all jobs are created by the same service account.
The screenshot below displays the information about a BigQuery job created by sqlmesh to backfill a model. In the last line you can see the label my_session_property_key: my_session_property_value. This comes from the session_properties of the sqlmesh model definition. According to the google docs for BigQuery, all jobs that are run within a session inherit the labels set at the beginning of the session using SET @@query_label = <my_labels>