32.0.0 (2023-10-07)
Breaking changes:
- Remove implicit interval type coercion from ScalarValue comparison #7514 (tustvold)
- Remove get_scan_files and ExecutionPlan::file_scan_config (#7357) #7487 (tustvold)
- Move `FileCompressionType` out of `common` and into `core` #7596 (haohuaijin)
- Update arrow 47.0.0 in DataFusion #7587 (tustvold)
- Rename `bounded_order_preserving_variants` config to `prefer_existing_sort` and update docs #7723 (alamb)
Implemented enhancements:
- Parallelize Stateless (CSV/JSON) File Write Serialization #7452 (devinjdangelo)
- Create a Priority Queue based Aggregation with `limit` #7192 (avantgardnerio)
- feat: add guarantees to simplification #7467 (wjones127)
- [Minor]: Produce better plan when group by contains all of the ordering requirements #7542 (mustafasrepo)
- Make AvroArrowArrayReader possible to scan Avro backed table which contains nested records #7525 (sarutak)
- feat: Support spilling for hash aggregation #7400 (kazuyukitanimura)
- Parallelize Parquet Serialization #7562 (devinjdangelo)
- feat: natively support more data types for the `abs` function. #7568 (jonahgao)
- feat: Parallel collecting parquet files statistics #7573 #7595 (hengfeiyang)
- Support hashing List columns #7616 (jonmmease)
- feat: Better large output display in datafusion-cli with --maxrows option #7617 (2010YOUY01)
- feat: make parse_float_as_decimal work on negative numbers #7648 (jonahgao)
- Update Default Parquet Write Compression #7692 (devinjdangelo)
- Support all the codecs supported by Avro #7718 (sarutak)
- Optimize "ORDER BY + LIMIT" queries for speed / memory with special TopK operator #7721 (Dandandan)
Fixed bugs:
- fix: inconsistent behaviors when dividing floating numbers by zero #7503 (jonahgao)
- fix: skip EliminateCrossJoin rule if inner join with filter is found #7529 (epsio-banay)
- fix: check for precision overflow when parsing float as decimal #7627 (jonahgao)
- fix: substrait limit when fetch is None #7669 (waynexia)
- fix: coerce text to timestamps with timezones #7720 (mhilton)
- fix: avro_to_arrow: Handle avro nested nullable struct (union) #7663 (Samrose-Ahmed)
Documentation updates:
- Documentation Updates for New Write Related Features #7520 (devinjdangelo)
- Create 2023 Q4 roadmap #7551 (graydenshand)
- docs: add section on supports_filters_pushdown #7680 (tshauck)
- Add LanceDB to the list of Known Users #7716 (alamb)
- Document crate feature flags #7713 (alamb)
Merged pull requests:
- Prepare 31.0.0 release #7508 (andygrove)
- Minor(proto): Implement `TryFrom<&DFSchema>` for `protobuf::DfSchema` #7505 (jonahgao)
- fix: inconsistent behaviors when dividing floating numbers by zero #7503 (jonahgao)
- Parallelize Stateless (CSV/JSON) File Write Serialization #7452 (devinjdangelo)
- Minor: Remove stray comment markings from encoding error message #7512 (devinjdangelo)
- Remove implicit interval type coercion from ScalarValue comparison #7514 (tustvold)
- Minor: deprecate ScalarValue::get_datatype() #7507 (Weijun-H)
- Propagate error from spawned task reading spills #7510 (viirya)
- Refactor the EnforceDistribution Rule #7488 (mustafasrepo)
- Remove get_scan_files and ExecutionPlan::file_scan_config (#7357) #7487 (tustvold)
- Simplify ScalarValue::distance (#7517) #7519 (tustvold)
- typo: change `delimeter` to `delimiter` #7521 (Weijun-H)
- Fix some simplification rules for floating-point arithmetic operations #7515 (jonahgao)
- Documentation Updates for New Write Related Features #7520 (devinjdangelo)
- [MINOR]: Move tests from repartition to enforce_distribution file #7539 (mustafasrepo)
- Update the async-trait crate to resolve clippy bug #7541 (metesynnada)
- Fix flaky `test_sort_fetch_memory_calculation` test #7534 (viirya)
- Move common code to utils #7545 (mustafasrepo)
- Minor: Add comments and clearer constructors to `Interval` #7526 (alamb)
- fix: skip EliminateCrossJoin rule if inner join with filter is found #7529 (epsio-banay)
- Create a Priority Queue based Aggregation with `limit` #7192 (avantgardnerio)
- feat: add guarantees to simplification #7467 (wjones127)
- [Minor]: Produce better plan when group by contains all of the ordering requirements #7542 (mustafasrepo)
- Minor: beautify interval display #7554 (Weijun-H)
- replace ptree with termtree #7560 (avantgardnerio)
- Make AvroArrowArrayReader possible to scan Avro backed table which contains nested records #7525 (sarutak)
- Fix a race condition issue on reading spilled file #7538 (sarutak)
- [MINOR]: Add is single method #7558 (mustafasrepo)
- Fix `describe <table>` to work without SessionContext #7441 (alamb)
- Make the tests in SHJ faster #7543 (metesynnada)
- feat: Support spilling for hash aggregation #7400 (kazuyukitanimura)
- Make backtrace as a cargo feature #7527 (comphead)
- Minor: Fix `clippy` by switching to `timestamp_nanos_opt` instead of (deprecated) `timestamp_nanos` #7572 (alamb)
- Update sqllogictest requirement from 0.15.0 to 0.16.0 #7569 (dependabot[bot])
- extract `datafusion-physical-plan` to its own crate #7432 (alamb)
- First and Last Accumulators should update with state row excluding is_set flag #7565 (viirya)
- refactor: simplify code of eliminate_cross_join.rs #7561 (jackwener)
- Update release instructions for datafusion-physical-plan crate #7576 (alamb)
- Minor: Update chrono pin to `0.4.31` #7575 (alamb)
- [feat] Introduce cacheManager in session ctx and make StatisticsCache share in session #7570 (Ted-Jiang)
- Enhance/Refactor Ordering Equivalence Properties #7566 (mustafasrepo)
- fix misplaced statements in sqllogictest #7586 (jonahgao)
- Update substrait requirement from 0.13.1 to 0.14.0 #7585 (dependabot[bot])
- chore: use the `create_udwf` function in `simple_udwf`, consistent with `simple_udf` and `simple_udaf` #7579 (tanruixiang)
- Implement protobuf serialization for AnalyzeExec #7574 (adhish20)
- chore: fix catalog's usage docs error and add docs about `CatalogList` trait #7582 (tanruixiang)
- Implement `CardinalityAwareRowConverter` while doing streaming merge #7401 (JayjeetAtGithub)
- Parallelize Parquet Serialization #7562 (devinjdangelo)
- feat: natively support more data types for the `abs` function. #7568 (jonahgao)
- implement string_to_array #7577 (casperhart)
- Create 2023 Q4 roadmap #7551 (graydenshand)
- chore: reduce `physical-plan` dependencies #7599 (crepererum)
- Minor: add githubs start/fork buttons to documentation page #7588 (alamb)
- Minor: add more examples for `CREATE EXTERNAL TABLE` doc #7594 (comphead)
- Update nix requirement from 0.26.1 to 0.27.1 #7438 (dependabot[bot])
- Update sqllogictest requirement from 0.16.0 to 0.17.0 #7606 (dependabot[bot])
- Fix panic in TopK #7609 (avantgardnerio)
- Move `FileCompressionType` out of `common` and into `core` #7596 (haohuaijin)
- Expose contents of Constraints #7603 (tv42)
- Change the unbounded_output API default #7605 (metesynnada)
- feat: Parallel collecting parquet files statistics #7573 #7595 (hengfeiyang)
- Support hashing List columns #7616 (jonmmease)
- [MINOR] Make the sink input aware of its plan #7610 (metesynnada)
- [MINOR] Reduce complexity on SHJ #7607 (metesynnada)
- feat: Better large output display in datafusion-cli with --maxrows option #7617 (2010YOUY01)
- Minor: add examples for `arrow_cast` and `arrow_typeof` to user guide #7615 (alamb)
- [MINOR]: Fix stack overflow bug for get field access expr #7623 (mustafasrepo)
- Group By All #7622 (berkaysynnada)
- Implement protobuf serialization for `(Bounded)WindowAggExec`. #7557 (vrongmeal)
- Make it possible to compile datafusion-common without default features #7625 (jonmmease)
- Minor: Adding backtrace documentation #7628 (comphead)
- fix(5975/5976): timezone handling for timestamps and `date_trunc`, `date_part` and `date_bin` #7614 (wiedld)
- Minor: remove unnecessary `Arc`s in datetime_expressions #7630 (alamb)
- fix: check for precision overflow when parsing float as decimal #7627 (jonahgao)
- Update arrow 47.0.0 in DataFusion #7587 (tustvold)
- Add test crate to compile DataFusion with wasm-pack #7633 (jonmmease)
- Minor: Update documentation of case expression #7646 (ongchi)
- Minor: improve docstrings on `SessionState` #7654 (alamb)
- Update example in the DataFrame documentation. #7650 (jsimpson-gro)
- Add HTTP object store example #7602 (pka)
- feat: make parse_float_as_decimal work on negative numbers #7648 (jonahgao)
- Minor: add doc comments to `ExtractEquijoinPredicate` #7658 (alamb)
- [MINOR]: Do not add unnecessary hash repartition to the physical plan #7667 (mustafasrepo)
- Minor: add ticket references to parallel parquet writing code #7592 (alamb)
- Minor: Add ticket reference and add test comment #7593 (alamb)
- Support Avro's Enum type and Fixed type #7635 (sarutak)
- Minor: Migrate datafusion-proto tests into its own binary #7668 (ongchi)
- Upgrade apache-avro to 0.16 #7674 (sarutak)
- Move window analysis to the window method #7672 (mustafasrepo)
- Don't add filters to projection in TableScan #7670 (Dandandan)
- Minor: Improve `TableProviderFilterPushDown` docs #7685 (alamb)
- FIX: Test timestamp with table #7701 (jayzhan211)
- Fix bug in `SimplifyExpressions` #7699 (Dandandan)
- Enhance Enforce Dist capabilities to fix, sub optimal bad plans #7671 (mustafasrepo)
- docs: add section on supports_filters_pushdown #7680 (tshauck)
- Improve cache usage in CI #7678 (sarutak)
- fix: substrait limit when fetch is None #7669 (waynexia)
- minor: revert parsing precedence between Aggr and UDAF #7682 (waynexia)
- Minor: Move hash utils to common #7684 (jayzhan211)
- Update Default Parquet Write Compression #7692 (devinjdangelo)
- Stop using cache for the benchmark job #7706 (sarutak)
- Change rust.yml to run benchmark #7708 (sarutak)
- Extend infer_placeholder_types to support BETWEEN predicates #7703 (andrelmartins)
- Minor: Add comment explaining why verify benchmark results uses release mode #7712 (alamb)
- Support all the codecs supported by Avro #7718 (sarutak)
- Update substrait requirement from 0.14.0 to 0.15.0 #7719 (dependabot[bot])
- fix: coerce text to timestamps with timezones #7720 (mhilton)
- Add LanceDB to the list of Known Users #7716 (alamb)
- Enable avro reading/writing in datafusion-cli #7715 (alamb)
- Document crate feature flags #7713 (alamb)
- Minor: Consolidate UDF tests #7704 (alamb)
- Minor: fix CI failure due to Cargo.lock in datafusioncli #7733 (yjshen)
- MINOR: change file to column index in page_filter trace log #7730 (mapleFU)
- preserve array type / timezone in `date_bin` and `date_trunc` functions #7729 (mhilton)
- Remove redundant is_numeric for DataType #7734 (qrilka)
- fix: avro_to_arrow: Handle avro nested nullable struct (union) #7663 (Samrose-Ahmed)
- Rename `SessionContext::with_config_rt` to `SessionContext::new_with_config_from_rt`, etc #7631 (alamb)
- Rename `bounded_order_preserving_variants` config to `prefer_existing_sort` and update docs #7723 (alamb)
- Optimize "ORDER BY + LIMIT" queries for speed / memory with special TopK operator #7721 (Dandandan)
- Minor: Improve crate docs #7740 (alamb)
- [MINOR]: Resolve linter errors in the main #7753 (mustafasrepo)
- Minor: Build concat_internal() with ListArray construction instead of ArrayData #7748 (jayzhan211)
- Minor: Add comment on input_schema from AggregateExec #7727 (viirya)
- Fix column name for COUNT(*) set by AggregateStatistics #7757 (qrilka)
- Add documentation about type signatures, and export `TIMEZONE_WILDCARD` #7726 (alamb)
- [feat] Support cache ListFiles result cache in session level #7620 (Ted-Jiang)
- Support `SHOW ALL VERBOSE` to show settings description #7735 (comphead)