Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[logical-types] update working branch (#12812)
* Add support for external tables with qualified names (#12645) * Make support schemas * Set default name to table * Remove print statements and stale comment * Add tests for create table * Fix typo * Update datafusion/sql/src/statement.rs Co-authored-by: Jonah Gao <jonahgao@msn.com> * convert create_external_table to objectname * Add sqllogic tests * Fix failing tests --------- Co-authored-by: Jonah Gao <jonahgao@msn.com> * Fix Regex signature types (#12690) * Fix Regex signature types * Uncomment the shared tests in string_query.slt.part and removed tests copies everywhere else * Test `LIKE` and `MATCH` with flags; Remove new tests from regexp.slt * Refactor `ByteGroupValueBuilder` to use `MaybeNullBufferBuilder` (#12681) * Fix malformed hex string literal in docs (#12708) * Simplify match patterns in coercion rules (#12711) Remove conditions where unnecessary. Refactor to improve readability. * Remove aggregate functions dependency on frontend (#12715) * Remove aggregate functions dependency on frontend DataFusion is a SQL query engine and also a reusable library for building query engines. The core functionality should not depend on frontend related functionalities like `sqlparser` or `datafusion-sql`. * Remove duplicate license header * Minor: Remove clone in `transform_to_states` (#12707) * rm clone Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fmt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * Refactor tests for union sorting properties, add tests for unions and constants (#12702) * Refactor tests for union sorting properties * update doc test * Undo import reordering * remove unecessary static lifetimes * Fix: support Qualified Wildcard in count aggregate function (#12673) * Reduce code duplication in `PrimitiveGroupValueBuilder` with const generics (#12703) * Reduce code duplication in `PrimitiveGroupValueBuilder` with const generics * Fix docs * Disallow duplicated qualified field names (#12608) * Disallow duplicated qualified field names * Fix tests * Optimize base64/hex decoding by pre-allocating output buffers (~2x faster) (#12675) * add bench * replace macro with generic function * remove duplicated code * optimize base64/hex decode * Allow DynamicFileCatalog support to query partitioned file (#12683) * support to query partitioned table for dynamic file catalog * cargo clippy * split partitions inferring to another function * Support `LIMIT` Push-down logical plan optimization for `Extension` nodes (#12685) * Update trait `UserDefinedLogicalNodeCore` Signed-off-by: Austin Liu <austin362667@gmail.com> * Update corresponding interface Signed-off-by: Austin Liu <austin362667@gmail.com> Add rewrite rule for `push-down-limit` for `Extension` Signed-off-by: Austin Liu <austin362667@gmail.com> * Add rewrite rule for `push-down-limit` for `Extension` and tests Signed-off-by: Austin Liu <austin362667@gmail.com> * Update corresponding interface Signed-off-by: Austin Liu <austin362667@gmail.com> * Reorganize to match guard Signed-off-by: Austin Liu <austin362667@gmail.com> * Clena up Signed-off-by: Austin Liu <austin362667@gmail.com> Clean up Signed-off-by: Austin Liu <austin362667@gmail.com> --------- Signed-off-by: Austin Liu <austin362667@gmail.com> * Fix AvroReader: Add union resolving for nested struct arrays (#12686) * Add union resolving for nested struct arrays * Add test * Change test * Reproduce index error * fmt --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Adds macros for creating `WindowUDF` and `WindowFunction` expression (#12693) * Adds macro for udwf singleton * Adds a doc comment parameter to macro * Add doc comment for `create_udwf` macro * Uses default constructor * Update `Cargo.lock` in `datafusion-cli` * Fixes: expand `$FN_NAME` in doc strings * Adds example for macro usage * Renames macro * Improve doc comments * Rename udwf macro * Minor: doc copy edits * Adds macro for creating fluent-style expression API * Adds support for 1 or more parameters in expression function * Rewrite doc comments * Rename parameters * Minor: formatting * Adds doc comment for `create_udwf_expr` macro * Improve example docs * Hides extraneous code in doc comments * Add a one-line readme * Adds doc test assertions + minor formatting fixes * Adds common macro for defining user-defined window functions * Adds doc comment for `define_udwf_and_expr` * Defines `RowNumber` using common macro * Add usage example for common macro * Adds usage for custom constructor * Add examples for remaining patterns * Improve doc comments for usage examples * Rewrite inner line docs * Rewrite `create_udwf_expr!` doc comments * Minor doc improvements * Fix doc test and usage example * Add inline comments for macro patterns * Minor: change doc comment in example * Support unparsing plans with both Aggregation and Window functions (#12705) * Support unparsing plans with both Aggregation and Window functions (#35) * Fix unparsing for aggregation grouping sets * Add test for grouping set unparsing * Update datafusion/sql/src/unparser/utils.rs Co-authored-by: Jax Liu <liugs963@gmail.com> * Update datafusion/sql/src/unparser/utils.rs Co-authored-by: Jax Liu <liugs963@gmail.com> * Update * More tests --------- Co-authored-by: Jax Liu <liugs963@gmail.com> * Fix strpos invocation with dictionary and null (#12712) In 1b3608d `strpos` signature was modified to indicate it supports dictionary as input argument, but the invoke method doesn't support them. * docs: Update DataFusion introduction to clarify that DataFusion does provide an "out of the box" query engine (#12666) * Update DataFusion introduction to show that DataFusion offers packaged versions for end users * change order * Update README.md Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refine wording and update user guide for consistency * prettier --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Framework for generating function docs from embedded code documentation (#12668) * Initial work on #12432 to allow for generation of udf docs from embedded documentation in the code * Add missing license header. * Fixed examples. * Fixing a really weird RustRover/wsl ... something. No clue what happened there. * permission change * Cargo fmt update. * Refactored Documentation to allow it to be used in a const. * Add documentation for syntax_example * Refactoring Documentation based on PR feedback. * Cargo fmt update. * Doc update * Fixed copy/paste error. * Minor text updates. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Add IMDB(JOB) Benchmark [2/N] (imdb queries) (#12529) * imdb dataset * cargo fmt * Add 113 queries for IMDB(JOB) Signed-off-by: Austin Liu <austin362667@gmail.com> * Add `get_query_sql` from `query_id` string Signed-off-by: Austin Liu <austin362667@gmail.com> * Fix CSV reader & Remove Parquet partition Signed-off-by: Austin Liu <austin362667@gmail.com> * Add benchmark IMDB runner Signed-off-by: Austin Liu <austin362667@gmail.com> * Add `run_imdb` script Signed-off-by: Austin Liu <austin362667@gmail.com> * Add checker for imdb option Signed-off-by: Austin Liu <austin362667@gmail.com> * Add SLT for IMDB Signed-off-by: Austin Liu <austin362667@gmail.com> * Fix `get_query_sql()` for CI roundtrip test Signed-off-by: Austin Liu <austin362667@gmail.com> Fix `get_query_sql()` for CI roundtrip test Signed-off-by: Austin Liu <austin362667@gmail.com> Fix `get_query_sql()` for CI roundtrip test Signed-off-by: Austin Liu <austin362667@gmail.com> * Clean up Signed-off-by: Austin Liu <austin362667@gmail.com> * Add missing license Signed-off-by: Austin Liu <austin362667@gmail.com> * Add IMDB(JOB) queries `2b` to `5c` Signed-off-by: Austin Liu <austin362667@gmail.com> * Add `INCLUDE_IMDB` in CI verify-benchmark-results Signed-off-by: Austin Liu <austin362667@gmail.com> * Prepare IMDB dataset Signed-off-by: Austin Liu <austin362667@gmail.com> Prepare IMDB dataset Signed-off-by: Austin Liu <austin362667@gmail.com> * use uint as id type * format * Seperate `tpch` and `imdb` benchmarking CI jobs Signed-off-by: Austin Liu <austin362667@gmail.com> Fix path Signed-off-by: Austin Liu <austin362667@gmail.com> Fix path Signed-off-by: Austin Liu <austin362667@gmail.com> Remove `tpch` in `imdb` benchmark Signed-off-by: Austin Liu <austin362667@gmail.com> * Remove IMDB(JOB) slt in CI Signed-off-by: Austin Liu <austin362667@gmail.com> Remove IMDB(JOB) slt in CI Signed-off-by: Austin Liu <austin362667@gmail.com> --------- Signed-off-by: Austin Liu <austin362667@gmail.com> Co-authored-by: DouPache <douenergy@gmail.com> * Minor: avoid clone while calculating union equivalence properties (#12722) * Minor: avoid clone while calculating union equivalence properties * Update datafusion/physical-expr/src/equivalence/properties.rs * fmt * Simplify streaming_merge function parameters (#12719) * simplify streaming_merge function parameters * revert test change * change StreamingMergeConfig into builder pattern * Fix links on docs index page (#12750) * Provide field and schema metadata missing on cross joins, and union with null fields. (#12729) * test: reproducer for missing schema metadata on cross join * fix: pass thru schema metadata on cross join * fix: preserve metadata when transforming to view types * test: reproducer for missing field metadata in left hand NULL field of union * fix: preserve field metadata from right side of union * chore: safe indexing * Minor: Update string tests for strpos (#12739) * Apply `type_union_resolution` to array and values (#12753) * cleanup make array coercion rule Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * change to type union resolution Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * change value too Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix tpyo Signed-off-by: jayzhan211 <jayzhan211@gmail.com> --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * Add `DocumentationBuilder::with_standard_argument` to reduce copy/paste (#12747) * Add `DocumentationBuilder::with_standard_expression` to reduce copy/paste * fix doc * fix standard argument * Update docs * Improve documentation to explain what is different * fix `equal_to` in `PrimitiveGroupValueBuilder` (#12758) * fix `equal_to` in `PrimitiveGroupValueBuilder`. * fix typo. * add uts. * reduce calling of `is_null`. * Minor: doc how field name is to be set (#12757) * Fix `equal_to` in `ByteGroupValueBuilder` (#12770) * Fix `equal_to` in `ByteGroupValueBuilder` * refactor null_equal_to * Update datafusion/physical-plan/src/aggregates/group_values/group_column.rs * Allow simplification even when nullable (#12746) The nullable requirement seem to have been added in #1401 but as far as I can tell they are not needed for these 2 cases. I think this can be shown using this truth table: (generated using datafusion-cli without this patch) ``` > CREATE TABLE t (v BOOLEAN) as values (true), (false), (NULL); > select t.v, t2.v, t.v AND (t.v OR t2.v), t.v OR (t.v AND t2.v) from t cross join t as t2; +-------+-------+---------------------+---------------------+ | v | v | t.v AND t.v OR t2.v | t.v OR t.v AND t2.v | +-------+-------+---------------------+---------------------+ | true | true | true | true | | true | false | true | true | | true | | true | true | | false | true | false | false | | false | false | false | false | | false | | false | false | | | true | | | | | false | | | | | | | | +-------+-------+---------------------+---------------------+ ``` And it seems Spark applies both of these and DuckDB applies only the first one. * Fix unnest conjunction with selecting wildcard expression (#12760) * fix unnest statement with wildcard expression * add commnets * Improve `round` scalar function unparsing for Postgres (#12744) * Postgres: enforce required `NUMERIC` type for `round` scalar function (#34) Includes initial support for dialects to override scalar functions unparsing * Document scalar_function_to_sql_overrides fn * Fix stack overflow calculating projected orderings (#12759) * Fix stack overflow calculating projected orderings * fix docs * Port / Add Documentation for `VarianceSample` and `VariancePopulation` (#12742) * Upgrade arrow/parquet to `53.1.0` / fix clippy (#12724) * Update to arrow/parquet 53.1.0 * Update some API * update for changed file sizes * Use non deprecated APIs * Use ParquetMetadataReader from @etseidl * remove upstreamed implementation * Update CSV schema * Use upstream is_null and is_not_null kernels * feat: add support for Substrait ExtendedExpression (#12728) * Add support for serializing and deserializing Substrait ExtendedExpr message * Address clippy reviews * Reuse existing rename method * Transformed::new_transformed: Fix documentation formatting (#12787) Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * fix: Correct results for grouping sets when columns contain nulls (#12571) * Fix grouping sets behavior when data contains nulls * PR suggestion comment * Update new test case * Add grouping_id to the logical plan * Add doc comment next to INTERNAL_GROUPING_ID * Fix unparsing of Aggregate with grouping sets --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Migrate documentation for all string functions from scalar_functions.md to code (#12775) * Added documentation for string and unicode functions. * Fixed issues with aliases. * Cargo fmt. * Minor doc fixes. * Update docs for var_pop/samp --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Account for constant equivalence properties in union, tests (#12562) * Minor: clarify comment about empty dependencies (#12786) * Introduce Signature::String and return error if input of `strpos` is integer (#12751) * fix sig Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix error Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix all signature Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix all signature Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * change default type Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * clippy Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix docs Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rm deadcode Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rm test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * Minor: improve docs on MovingMin/MovingMax (#12790) * Add slt tests (#12721) --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com> Signed-off-by: Austin Liu <austin362667@gmail.com> Co-authored-by: OussamaSaoudi <45303303+OussamaSaoudi@users.noreply.github.com> Co-authored-by: Jonah Gao <jonahgao@msn.com> Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Tomoaki Kawada <kawada@kmckk.co.jp> Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com> Co-authored-by: Jay Zhan <jayzhan211@gmail.com> Co-authored-by: HuSen <husen.xjtu@gmail.com> Co-authored-by: Emil Ejbyfeldt <emil.ejbyfeldt@gmail.com> Co-authored-by: Simon Vandel Sillesen <simon.vandel@gmail.com> Co-authored-by: Jax Liu <liugs963@gmail.com> Co-authored-by: Austin Liu <austin362667@gmail.com> Co-authored-by: JonasDev1 <jswipp@googlemail.com> Co-authored-by: jcsherin <jacob@protoship.io> Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com> Co-authored-by: Andy Grove <agrove@apache.org> Co-authored-by: Bruce Ritchie <bruce.ritchie@veeva.com> Co-authored-by: DouPache <douenergy@gmail.com> Co-authored-by: mertak-synnada <mertak67+synaada@gmail.com> Co-authored-by: Bryce Mecum <petridish@gmail.com> Co-authored-by: wiedld <wiedld@users.noreply.github.com> Co-authored-by: kamille <caoruiqiu.crq@antgroup.com> Co-authored-by: Weston Pace <weston.pace@gmail.com> Co-authored-by: Val Lorentz <vlorentz@softwareheritage.org>
- Loading branch information