feat: Short circuit query for local scan#1618
Conversation
ec35cd8 to
a944c58
Compare
a944c58 to
a89e770
Compare
| array.offsets, values, mask=array.is_null() | ||
| ) | ||
| return new_value.fill_null([]), bigframes.dtypes.list_type(values_type) | ||
| if array.type == bigframes.dtypes.JSON_ARROW_TYPE: |
There was a problem hiding this comment.
With _canonicalize_json, you probably can remove _validate_content method in this module
| def _canonicalize_scalar(json_string): | ||
| if json_string is None: | ||
| return None | ||
| return json.dumps( |
There was a problem hiding this comment.
Please add comments why we need to canonicalize json here? Refer to:
# sort_keys=True sorts dictionary keys before serialization, making
# JSON comparisons deterministic.
# separators=(',', ':') eliminate whitespace to get the most compact
# JSON representation.
There was a problem hiding this comment.
added comment
| @@ -0,0 +1,68 @@ | |||
| # Copyright 2025 Google LLC | |||
There was a problem hiding this comment.
rename the file as local_scan_executor to match with bq_caching_executor?
| from bigframes.session import executor, semi_executor | ||
|
|
||
|
|
||
| class LocalScanExecutor(semi_executor.SemiExecutor): |
There was a problem hiding this comment.
if it inherits from executor.Executor, what would be the problem? The polar executor and sql compiler executor (coming for sqlglot unit tests) are inheriting from executor.Executor. Thinking if we can avoid introduce additional classes for better readable and less maintaining.
There was a problem hiding this comment.
SemiExecutor is a much smaller interface with only a single method. The general executor interface supports exporting, caching, etc. Can hopefully much simplify the general executor interface at some point, but until then, semi-executor helps provide the interface for hybrid execution components
| actual = bbq.json_extract_array(s, "$.a") | ||
| expected = bpd.Series( | ||
| [['"ab"', '"2"', '"3 xy"'], [], ['"4"', '"5"'], None], | ||
| [['"ab"', '"2"', '"3 xy"'], [], ['"4"', '"5"'], []], |
There was a problem hiding this comment.
as we discussed, can we please check if None->[] is required or not?
There was a problem hiding this comment.
Reverting, it works without this change
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕