Skip to content

feat: set query label session property in bq session #4314

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ncbkr
Copy link

@ncbkr ncbkr commented May 5, 2025

we want to use the SQLMesh session properties to attach labels to the BigQuery jobs. Using these labels we want to enable ourselves to

  • map BigQuery jobs to SQLMesh models
  • map BigQuery jobs to owners/users of SQLMesh

In the model definition, this would like as follows

session_properties (
    query_label = [
      ('label1', 'value1'),
      ('label2', 'value2'),
      ('label3', 'value3')
    ]
  )

This is currently not possible in an easy way since all jobs are created by the same service account.

The screenshot below displays the information about a BigQuery job created by sqlmesh to backfill a model. In the last line you can see the label my_session_property_key: my_session_property_value. This comes from the session_properties of the sqlmesh model definition. According to the google docs for BigQuery, all jobs that are run within a session inherit the labels set at the beginning of the session using SET @@query_label = <my_labels>

image

@@ -182,7 +182,32 @@ def query_factory() -> Query:
def _begin_session(self, properties: SessionProperties) -> None:
from google.cloud.bigquery import QueryJobConfig

job = self.client.query("SELECT 1;", job_config=QueryJobConfig(create_session=True))
query_label = []
if "query_label" in properties and isinstance(properties["query_label"], exp.Array):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can also be exp.Tuple not just exp.Array. Eg:

query_labels = ['label_a', 'label_b']

vs

query_labels = ('label_a', 'label_b')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, perhaps also Paren as an edge case.

Copy link
Author

@ncbkr ncbkr May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not including a usage example in the PR description. I imagined the model definition to look like this

session_properties (
    query_label = [
      ('key1', 'value1'),
      ('key2', 'value2'),
      ('key3', 'value3')
    ]
  )

I chose this format based on the google docs:

a key-value pair for the label to assign to all future queries in the session. You can also add multiple key-value pairs, separated by a comma (for example, SET @@query_label = "key1:value1,key2:value2").

Also, this is consistent with the usage of labels in both, the virtual and phyiscal properties.

This format is also why I explicitely check for an array.

What do you think about this format? Would you prefer another one?

e.g. choosing this format

query_labels = ['label_a', 'label_b'] | ('label_a', 'label_b')  # array or tuple

label_a and label_b would have to be a string following this pattern: key:value

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@georgesittas would this be an edge case you had in mind?

query_labels = ('label_a')

So query_label is not a tuple or array but a single string 'key:vale' wrapped in parentheses?

Copy link
Member

@izeigerman izeigerman May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format makes sense, but it might also be:

session_properties (
    query_label = (
      ('key1', 'value1'),
      ('key2', 'value2'),
      ('key3', 'value3')
    )
  )

which would make a tuple of tuples rather than array of tuples.
Similarly

session_properties (
  query_label = (
    ('key1', 'value1')
  )
)

makes it a Paren of tuples.

if (
isinstance(label_tuple, exp.Tuple)
and len(label_tuple.expressions) == 2
and all(isinstance(label, exp.Literal) for label in label_tuple.expressions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if it's an identifier and not a string literal eg: query_labels = [ label_a ] and not query_labels = [ 'label_a' ]. Might worth support both approaches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could simply require strings; not sure what's the value of supporting identifiers here. Seems like an unnecessary increase in scope?

)
else:
raise SQLMeshError(
"Invalid query label format. Expected a tuple of two string literals."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is something that should be captured at model load time and not at execution time. This seems too late.

@izeigerman
Copy link
Member

Thanks for your contribution! Can you please add some tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants