
Preparing v4.0.1 #684

@richox


Planning to release v4.0.1:

New Feature

  • Initial support for the ORC input file format.
  • Initial support for the RSS (remote shuffle service) framework and the Apache Celeborn shuffle service.

Improvement

  • Optimize AggExec by implementing columnar-based aggregation.
  • Use a custom hashmap implementation for aggregation.
  • Add a specialized implementation of count(0).
  • Optimize bloom filters by reusing the same bloom filter within an executor.
  • Optimize bloom filters by supporting shrinking.
  • Optimize Parquet file reading by supporting parallel reads.
  • Improve spill file deletion logic.
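The shrinking item is terse. One well-known way to shrink a bloom filter is "folding". You keep the bit count a power of two and OR the upper half of the bit array into the lower half. Any position `p mod m` then maps to `p mod m/2`, so no inserted key is lost, and only the false-positive rate rises. The Rust sketch below shows that general technique under these assumptions. These notes do not say how the project actually shrinks its filters. `BloomFilter`, `shrink` and `mix64` are hypothetical names, and SplitMix64 stands in for whatever hash the real code uses.

```rust
// Hypothetical sketch of a foldable bloom filter; not the project's implementation.
struct BloomFilter {
    words: Vec<u64>, // bit array packed into 64-bit words
    num_bits: usize, // always a power of two, at least 64
    num_hashes: u32, // number of probes per key
}

// SplitMix64 finalizer, used here only as a simple self-contained hash (assumption).
fn mix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u32) -> Self {
        assert!(num_bits.is_power_of_two() && num_bits >= 64);
        BloomFilter { words: vec![0; num_bits / 64], num_bits, num_hashes }
    }

    // Bit positions via double hashing (h1 + i * h2); masking with
    // (num_bits - 1) is "mod num_bits" because num_bits is a power of two.
    fn positions(&self, key: u64) -> Vec<usize> {
        let h1 = mix64(key);
        let h2 = mix64(h1) | 1;
        let mask = (self.num_bits - 1) as u64;
        (0..self.num_hashes as u64)
            .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) & mask) as usize)
            .collect()
    }

    fn insert(&mut self, key: u64) {
        for pos in self.positions(key) {
            self.words[pos / 64] |= 1u64 << (pos % 64);
        }
    }

    fn contains(&self, key: u64) -> bool {
        self.positions(key)
            .into_iter()
            .all(|pos| (self.words[pos / 64] & (1u64 << (pos % 64))) != 0)
    }

    // Fold the filter in half: word i + half lands on word i with the same
    // bit offset, which is exactly where "p mod m/2" now points.
    fn shrink(&mut self) {
        assert!(self.num_bits >= 128, "cannot shrink below 64 bits");
        let half = self.words.len() / 2;
        for i in 0..half {
            self.words[i] |= self.words[i + half];
        }
        self.words.truncate(half);
        self.num_bits /= 2;
    }
}

fn main() {
    let mut bf = BloomFilter::new(1 << 14, 3);
    for key in 0..1_000u64 {
        bf.insert(key);
    }
    bf.shrink(); // 16384 -> 8192 bits
    bf.shrink(); // 8192 -> 4096 bits
    // Every inserted key must still be reported as present after folding.
    assert!((0..1_000u64).all(|k| bf.contains(k)));
    println!("num_bits after two shrinks: {}", bf.num_bits); // prints 4096
}
```

Folding trades accuracy for memory without rehashing any keys. A filter sized for a pessimistic cardinality estimate can therefore be shrunk cheaply once the real number of inserted keys is known.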

Bug fixes

  • Fix file-not-found errors for paths containing URL-encoded characters.
  • Fix the HashAggregate conversion job throwing ScalaReflectionException.
  • Fix a pruning error when reading Parquet files with multiple row groups.
  • Fix an incorrect number of tasks caused by a missing shuffleOrigin.
  • Fix a record batch creation error when hash joining with empty input.

Other

  • Upgrade the DataFusion/Arrow dependencies to v42/v53.
  • Replace gxhash with foldhash for better compatibility on some hardware.
  • Other minor improvements and fixes.

PRs
