|  | 1 | +<!-- | 
|  | 2 | +Licensed to the Apache Software Foundation (ASF) under one | 
|  | 3 | +or more contributor license agreements.  See the NOTICE file | 
|  | 4 | +distributed with this work for additional information | 
|  | 5 | +regarding copyright ownership.  The ASF licenses this file | 
|  | 6 | +to you under the Apache License, Version 2.0 (the | 
|  | 7 | +"License"); you may not use this file except in compliance | 
|  | 8 | +with the License.  You may obtain a copy of the License at | 
|  | 9 | + | 
|  | 10 | +  http://www.apache.org/licenses/LICENSE-2.0 | 
|  | 11 | + | 
|  | 12 | +Unless required by applicable law or agreed to in writing, | 
|  | 13 | +software distributed under the License is distributed on an | 
|  | 14 | +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | 
|  | 15 | +KIND, either express or implied.  See the License for the | 
|  | 16 | +specific language governing permissions and limitations | 
|  | 17 | +under the License. | 
|  | 18 | +--> | 
|  | 19 | + | 
|  | 20 | +# DataFusion Comet 0.4.0 Changelog | 
|  | 21 | + | 
|  | 22 | +This release consists of 51 commits from 10 contributors. See credits at the end of this changelog for more information. | 
|  | 23 | + | 
|  | 24 | +**Fixed bugs:** | 
|  | 25 | + | 
|  | 26 | +- fix: Use the number of rows from underlying arrays instead of logical row count from RecordBatch [#972](https://github.com/apache/datafusion-comet/pull/972) (viirya) | 
|  | 27 | +- fix: The spilled_bytes metric of CometSortExec should be size instead of time [#984](https://github.com/apache/datafusion-comet/pull/984) (Kontinuation) | 
|  | 28 | +- fix: Properly handle Java exceptions without error messages; fix loading of comet native library from java.library.path [#982](https://github.com/apache/datafusion-comet/pull/982) (Kontinuation) | 
|  | 29 | +- fix: Fallback to Spark if scan has meta columns [#997](https://github.com/apache/datafusion-comet/pull/997) (viirya) | 
|  | 30 | +- fix: Fallback to Spark if named_struct contains duplicate field names [#1016](https://github.com/apache/datafusion-comet/pull/1016) (viirya) | 
|  | 31 | +- fix: Make comet-git-info.properties optional [#1027](https://github.com/apache/datafusion-comet/pull/1027) (andygrove) | 
|  | 32 | +- fix: TopK operator should return correct results on dictionary column with nulls [#1033](https://github.com/apache/datafusion-comet/pull/1033) (viirya) | 
|  | 33 | +- fix: need default value for getSizeAsMb(EXECUTOR_MEMORY.key) [#1046](https://github.com/apache/datafusion-comet/pull/1046) (neyama) | 
|  | 34 | + | 
|  | 35 | +**Performance related:** | 
|  | 36 | + | 
|  | 37 | +- perf: Remove one redundant CopyExec for SMJ [#962](https://github.com/apache/datafusion-comet/pull/962) (andygrove) | 
|  | 38 | +- perf: Add experimental feature to replace SortMergeJoin with ShuffledHashJoin [#1007](https://github.com/apache/datafusion-comet/pull/1007) (andygrove) | 
|  | 39 | +- perf: Cache jstrings during metrics collection [#1029](https://github.com/apache/datafusion-comet/pull/1029) (mbutrovich) | 
|  | 40 | + | 
|  | 41 | +**Implemented enhancements:** | 
|  | 42 | + | 
|  | 43 | +- feat: Support `GetArrayStructFields` expression [#993](https://github.com/apache/datafusion-comet/pull/993) (Kimahriman) | 
|  | 44 | +- feat: Implement bloom_filter_agg [#987](https://github.com/apache/datafusion-comet/pull/987) (mbutrovich) | 
|  | 45 | +- feat: Support more types with BloomFilterAgg [#1039](https://github.com/apache/datafusion-comet/pull/1039) (mbutrovich) | 
|  | 46 | +- feat: Implement CAST from struct to string [#1066](https://github.com/apache/datafusion-comet/pull/1066) (andygrove) | 
|  | 47 | +- feat: Use official DataFusion 43 release [#1070](https://github.com/apache/datafusion-comet/pull/1070) (andygrove) | 
|  | 48 | +- feat: Implement CAST between struct types [#1074](https://github.com/apache/datafusion-comet/pull/1074) (andygrove) | 
|  | 49 | +- feat: support array_append [#1072](https://github.com/apache/datafusion-comet/pull/1072) (NoeB) | 
|  | 50 | +- feat: Require offHeap memory to be enabled (always use unified memory) [#1062](https://github.com/apache/datafusion-comet/pull/1062) (andygrove) | 
|  | 51 | + | 
|  | 52 | +**Documentation updates:** | 
|  | 53 | + | 
|  | 54 | +- doc: add documentation interlinks [#975](https://github.com/apache/datafusion-comet/pull/975) (comphead) | 
|  | 55 | +- docs: Add IntelliJ documentation for generated source code [#985](https://github.com/apache/datafusion-comet/pull/985) (mbutrovich) | 
|  | 56 | +- docs: Update tuning guide [#995](https://github.com/apache/datafusion-comet/pull/995) (andygrove) | 
|  | 57 | +- docs: Various documentation improvements [#1005](https://github.com/apache/datafusion-comet/pull/1005) (andygrove) | 
|  | 58 | +- docs: clarify that Maven central only has jars for Linux [#1009](https://github.com/apache/datafusion-comet/pull/1009) (andygrove) | 
|  | 59 | +- doc: fix K8s links and doc [#1058](https://github.com/apache/datafusion-comet/pull/1058) (comphead) | 
|  | 60 | +- docs: Update benchmarking.md [#1085](https://github.com/apache/datafusion-comet/pull/1085) (rluvaton-flarion) | 
|  | 61 | + | 
|  | 62 | +**Other:** | 
|  | 63 | + | 
|  | 64 | +- chore: Generate changelog for 0.3.0 release [#964](https://github.com/apache/datafusion-comet/pull/964) (andygrove) | 
|  | 65 | +- chore: fix publish-to-maven script [#966](https://github.com/apache/datafusion-comet/pull/966) (andygrove) | 
|  | 66 | +- chore: Update benchmarks results based on 0.3.0-rc1 [#969](https://github.com/apache/datafusion-comet/pull/969) (andygrove) | 
|  | 67 | +- chore: update rem expression guide [#976](https://github.com/apache/datafusion-comet/pull/976) (kazuyukitanimura) | 
|  | 68 | +- chore: Enable additional CreateArray tests [#928](https://github.com/apache/datafusion-comet/pull/928) (Kimahriman) | 
|  | 69 | +- chore: fix compatibility guide [#978](https://github.com/apache/datafusion-comet/pull/978) (kazuyukitanimura) | 
|  | 70 | +- chore: Update for 0.3.0 release, prepare for 0.4.0 development [#970](https://github.com/apache/datafusion-comet/pull/970) (andygrove) | 
|  | 71 | +- chore: Don't transform the HashAggregate to CometHashAggregate if Comet shuffle is disabled [#991](https://github.com/apache/datafusion-comet/pull/991) (viirya) | 
|  | 72 | +- chore: Make parquet reader options Comet options instead of Hadoop options [#968](https://github.com/apache/datafusion-comet/pull/968) (parthchandra) | 
|  | 73 | +- chore: remove legacy comet-spark-shell [#1013](https://github.com/apache/datafusion-comet/pull/1013) (andygrove) | 
|  | 74 | +- chore: Reserve memory for native shuffle writer per partition [#988](https://github.com/apache/datafusion-comet/pull/988) (viirya) | 
|  | 75 | +- chore: Bump arrow-rs to 53.1.0 and datafusion [#1001](https://github.com/apache/datafusion-comet/pull/1001) (kazuyukitanimura) | 
|  | 76 | +- chore: Revert "chore: Reserve memory for native shuffle writer per partition (#988)" [#1020](https://github.com/apache/datafusion-comet/pull/1020) (viirya) | 
|  | 77 | +- minor: Remove hard-coded version number from Dockerfile [#1025](https://github.com/apache/datafusion-comet/pull/1025) (andygrove) | 
|  | 78 | +- chore: Reserve memory for native shuffle writer per partition [#1022](https://github.com/apache/datafusion-comet/pull/1022) (viirya) | 
|  | 79 | +- chore: Improve error handling when native lib fails to load [#1000](https://github.com/apache/datafusion-comet/pull/1000) (andygrove) | 
|  | 80 | +- chore: Use twox-hash 2.0 xxhash64 oneshot api instead of custom implementation [#1041](https://github.com/apache/datafusion-comet/pull/1041) (NoeB) | 
|  | 81 | +- chore: Refactor Arrow Array and Schema allocation in ColumnReader and MetadataColumnReader [#1047](https://github.com/apache/datafusion-comet/pull/1047) (viirya) | 
|  | 82 | +- minor: Refactor binary expr serde to reduce code duplication [#1053](https://github.com/apache/datafusion-comet/pull/1053) (andygrove) | 
|  | 83 | +- chore: Upgrade to DataFusion 43.0.0-rc1 [#1057](https://github.com/apache/datafusion-comet/pull/1057) (andygrove) | 
|  | 84 | +- chore: Refactor UnaryExpr and MathExpr in protobuf [#1056](https://github.com/apache/datafusion-comet/pull/1056) (andygrove) | 
|  | 85 | +- minor: use defaults instead of hard-coding values [#1060](https://github.com/apache/datafusion-comet/pull/1060) (andygrove) | 
|  | 86 | +- minor: refactor UnaryExpr handling to make code more concise [#1065](https://github.com/apache/datafusion-comet/pull/1065) (andygrove) | 
|  | 87 | +- chore: Refactor binary and math expression serde code [#1069](https://github.com/apache/datafusion-comet/pull/1069) (andygrove) | 
|  | 88 | +- chore: Simplify CometShuffleMemoryAllocator to use Spark unified memory allocator [#1063](https://github.com/apache/datafusion-comet/pull/1063) (viirya) | 
|  | 89 | +- test: Restore one test in CometExecSuite by adding COMET_SHUFFLE_MODE config [#1087](https://github.com/apache/datafusion-comet/pull/1087) (viirya) | 
|  | 90 | + | 
|  | 91 | +## Credits | 
|  | 92 | + | 
|  | 93 | +Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor. | 
|  | 94 | + | 
|  | 95 | +``` | 
|  | 96 | +    19	Andy Grove | 
|  | 97 | +    13	Matt Butrovich | 
|  | 98 | +     8	Liang-Chi Hsieh | 
|  | 99 | +     3	KAZUYUKI TANIMURA | 
|  | 100 | +     2	Adam Binford | 
|  | 101 | +     2	Kristin Cowalcijk | 
|  | 102 | +     1	NoeB | 
|  | 103 | +     1	Oleks V | 
|  | 104 | +     1	Parth Chandra | 
|  | 105 | +     1	neyama | 
|  | 106 | +``` | 
|  | 107 | + | 
|  | 108 | +Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release. | 