refactor: add uid generator and encapsulate query as cte in SQLGlotCompiler #1679

Merged
merged 3 commits into from
May 4, 2025

Conversation

chelsea-lin
Contributor

No description provided.

@chelsea-lin chelsea-lin requested review from a team as code owners May 1, 2025 19:03
@chelsea-lin chelsea-lin requested a review from TrevorBergeron May 1, 2025 19:03
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels May 1, 2025
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_sqlglot_cte branch from 2a4a828 to e05cfce Compare May 1, 2025 20:11
Comment on lines 24 to 42
class SequentialUIDGenerator:
    """
    Generates sequential-like UIDs with multiple prefixes, e.g., "t0", "t1", "c0", "t2", etc.
    """

    def __init__(self):
        self.prefix_counters = {}

    def generate_sequential_uid(self, prefix: str) -> str:
        """Generates a sequential UID with specified prefix."""
        if prefix not in self.prefix_counters:
            self.prefix_counters[prefix] = 0

        uid = f"{prefix}{self.prefix_counters[prefix]}"
        self.prefix_counters[prefix] += 1
        return uid
Contributor

Probably nicer to define this as a generator (e.g., yield uid in a loop). And maybe we want thread safety? (I know, we don't have it on the existing guid generators.)

Contributor Author

@chelsea-lin chelsea-lin May 2, 2025

I've updated it into a generator and added a note indicating that the class is not thread-safe, as it's currently intended for internal use only.
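
For readers following the thread, here is a minimal sketch of what the generator suggestion could look like. It is not the code that was merged: the names sequential_uid_generator and LockedUIDGenerator are hypothetical, the generator handles a single prefix (unlike the multi-prefix class above), and the lock-based wrapper is only one assumed way to get the thread safety the reviewer asked about.

```python
import itertools
import threading
import typing


def sequential_uid_generator(prefix: str) -> typing.Iterator[str]:
    """Yield "<prefix>0", "<prefix>1", ... indefinitely. Not thread-safe."""
    for i in itertools.count():
        yield f"{prefix}{i}"


class LockedUIDGenerator:
    """Hypothetical thread-safe wrapper: serializes next() calls with a lock."""

    def __init__(self, prefix: str):
        self._gen = sequential_uid_generator(prefix)
        self._lock = threading.Lock()

    def __iter__(self) -> "LockedUIDGenerator":
        return self

    def __next__(self) -> str:
        # Only one thread at a time may advance the underlying generator.
        with self._lock:
            return next(self._gen)


table_ids = sequential_uid_generator("t")
first_three = [next(table_ids) for _ in range(3)]
print(first_three)  # ['t0', 't1', 't2']
```

A plain generator keeps the counter as local state, so callers only need next(); the wrapper is worth its cost only if the generator is ever shared across threads.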

"""
Remap all variables in the BFET using the id_generator.
root: nodes.BigFrameNode,
uid_gen: guid.SequentialUIDGenerator,
Contributor

Seems unneeded to restrict the type like this.

Contributor Author

@chelsea-lin chelsea-lin May 2, 2025

Updated it to typing.Iterator[identifiers.ColumnId].
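
To illustrate why the looser annotation helps, the sketch below accepts any iterator of column IDs instead of one concrete generator class. Everything here is an assumption made for illustration: ColumnId is a stand-in dataclass rather than the real identifiers.ColumnId, and remap_columns and column_ids are hypothetical helpers, not functions from the PR.

```python
import typing
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnId:
    """Hypothetical stand-in for identifiers.ColumnId."""

    name: str


def column_ids(prefix: str = "col") -> typing.Iterator[ColumnId]:
    """Yield ColumnId("<prefix>0"), ColumnId("<prefix>1"), ... indefinitely."""
    i = 0
    while True:
        yield ColumnId(f"{prefix}{i}")
        i += 1


def remap_columns(
    columns: typing.Sequence[str],
    id_gen: typing.Iterator[ColumnId],
) -> typing.Dict[str, ColumnId]:
    """Assign the next ID from id_gen to each column, in order."""
    return {col: next(id_gen) for col in columns}


# Any iterator works: an endless generator in production code...
mapping = remap_columns(["a", "b"], column_ids())
# ...or a fixed list wrapped in iter() for a test.
fixed = remap_columns(["a"], iter([ColumnId("x")]))
```

Because the function depends only on the iterator protocol, tests can pass a short, predictable sequence without constructing the real generator.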

) -> SQLGlotIR:
selected_cols = [
cols_expr = [
sge.Alias(
Contributor

Do we always want to alias, even if the ids don't change?

Contributor Author

If the IDs remain unchanged, we could omit the alias (e.g., col1 as col1) to shorten the SQL, although the redundant alias is still valid SQL. Our golden tests don't hit this case, though: they produce aliases like col0 as col9, where the expression is ColumnDef(col0) and the ID is ColumnId(col9). Therefore, IIUC, to optimize SQL length we should first address nodes.SelectionNode. I can create a bug ticket to revisit this for SQL length optimization later.
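
The optimization discussed here can be sketched in a few lines. This is only an illustration under stated assumptions: plain strings stand in for the sqlglot expressions the compiler actually builds, and render_select_items is a hypothetical helper, not part of the PR.

```python
import typing


def render_select_items(
    pairs: typing.Iterable[typing.Tuple[str, str]],
) -> typing.List[str]:
    """Render (source, output) column pairs, aliasing only when the names differ."""
    items = []
    for source, output in pairs:
        if source == output:
            # "col1 AS col1" is valid but redundant, so emit the bare column.
            items.append(f"`{source}`")
        else:
            items.append(f"`{source}` AS `{output}`")
    return items


select_list = render_select_items([("col0", "col9"), ("col1", "col1")])
sql = "SELECT " + ", ".join(select_list) + " FROM `t0`"
print(sql)  # SELECT `col0` AS `col9`, `col1` FROM `t0`
```

As the reply notes, the saving only materializes if upstream nodes keep IDs stable; when every column is renamed (col0 to col9), the alias is always required.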

@chelsea-lin chelsea-lin force-pushed the main_chelsealin_sqlglot_cte branch from e05cfce to 60c1bb6 Compare May 2, 2025 18:27
@chelsea-lin chelsea-lin requested a review from TrevorBergeron May 2, 2025 18:32
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_sqlglot_cte branch from 60c1bb6 to 879ea44 Compare May 2, 2025 20:27
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_sqlglot_cte branch from 879ea44 to b37e6ac Compare May 2, 2025 20:38
@chelsea-lin chelsea-lin added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 4, 2025
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 4, 2025
@chelsea-lin chelsea-lin enabled auto-merge (squash) May 4, 2025 05:19
@chelsea-lin chelsea-lin merged commit 86b7504 into main May 4, 2025
17 of 24 checks passed
@chelsea-lin chelsea-lin deleted the main_chelsealin_sqlglot_cte branch May 4, 2025 05:30
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
4 participants