-
Notifications
You must be signed in to change notification settings - Fork 48
feat: Short circuit query for local scan #1618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ec35cd8
to
a944c58
Compare
a944c58
to
a89e770
Compare
array.offsets, values, mask=array.is_null() | ||
) | ||
return new_value.fill_null([]), bigframes.dtypes.list_type(values_type) | ||
if array.type == bigframes.dtypes.JSON_ARROW_TYPE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With _canonicalize_json
, you probably can remove _validate_content
method in this module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
def _canonicalize_scalar(json_string): | ||
if json_string is None: | ||
return None | ||
return json.dumps( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comments why we need to canonicalize json here? Refer to:
# sort_keys=True
sorts dictionary keys before serialization, making
# JSON comparisons deterministic.
# separators=(',', ':')
eliminate whitespace to get the most compact
# JSON representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comment
@@ -0,0 +1,68 @@ | |||
# Copyright 2025 Google LLC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename the file as local_scan_executor
to match with bq_caching_executor
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
from bigframes.session import executor, semi_executor | ||
|
||
|
||
class LocalScanExecutor(semi_executor.SemiExecutor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it inherits from executor.Executor
, what would be the problem? The polar executor and sql compiler executor (coming for sqlglot unit tests) are inheriting from executor.Executor
. Thinking if we can avoid introduce additional classes for better readable and less maintaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SemiExecutor is a much smaller interface with only a single method. The general executor interface supports exporting, caching, etc. Can hopefully much simplify the general executor interface at some point, but until then, semi-executor helps provide the interface for hybrid execution components
@@ -155,7 +155,7 @@ def test_json_extract_array_from_json_strings(): | |||
) | |||
actual = bbq.json_extract_array(s, "$.a") | |||
expected = bpd.Series( | |||
[['"ab"', '"2"', '"3 xy"'], [], ['"4"', '"5"'], None], | |||
[['"ab"', '"2"', '"3 xy"'], [], ['"4"', '"5"'], []], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as we discussed, can we please check if None
->[]
is required or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverting, it works without this change
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕