-
Notifications
You must be signed in to change notification settings - Fork 49
refactor: add uid generator and encasualate query as cte in SQLGlotCompiler #1679
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2a4a828
to
e05cfce
Compare
bigframes/core/guid.py
Outdated
class SequentialUIDGenerator: | ||
""" | ||
Generates sequential-like UIDs with multiple prefixes, e.g., "t0", "t1", "c0", "t2", etc. | ||
""" | ||
|
||
def __init__(self): | ||
self.prefix_counters = {} | ||
|
||
def generate_sequential_uid(self, prefix: str) -> str: | ||
"""Generates a sequential UID with specified prefix.""" | ||
if prefix not in self.prefix_counters: | ||
self.prefix_counters[prefix] = 0 | ||
|
||
uid = f"{prefix}{self.prefix_counters[prefix]}" | ||
self.prefix_counters[prefix] += 1 | ||
return uid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably nicer to define as a generator (eg. yield uid
in loop). And maybe we want thread safety? (i know, we don't have on existing guid generators)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated it into a generator and added a note indicating that the class is not thread-safe, as it's currently intended for internal use only.
""" | ||
Remap all variables in the BFET using the id_generator. | ||
root: nodes.BigFrameNode, | ||
uid_gen: guid.SequentialUIDGenerator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unneeded to restrict the type like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated it into typing.Iterator[identifiers.ColumnId]
) -> SQLGlotIR: | ||
selected_cols = [ | ||
cols_expr = [ | ||
sge.Alias( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we always want to alias, even if the ids don't change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the IDs remain unchanged, we could omit aliasing (e.g., col1 as col1
) to shorten the SQL, although it's grammatically valid. However, our golden tests don't see this case, but use aliases like col0 as col9
, where the expression is ColumnDef(col0)
and the ID is ColumnId(col9)
. Therefore, to optimize SQL length, we should first address nodes.SelectionNode
. IIUC, I can create a bug ticket to revisit this for SQL length optimization later.
e05cfce
to
60c1bb6
Compare
60c1bb6
to
879ea44
Compare
879ea44
to
b37e6ac
Compare
No description provided.