Commit 3eb55e9

Brent Gardner authored
Upgrade to arrow 20.0.0 (but no change to object_store), including prost, and tonic (apache#3083)
* Upgrade arrow fix decimal (#4) Fix human error Patch crates io to fix build (#5)
* fix decimal
* patch crate versions Patch objectstore Test in CI Undo override? Fix more errors Fix last error? Formatting Clippy Fixes Fix refs Able to get session context, but JDBC driver hung Upgrade to arrow 20 Upgrade to RC2 Formatting Fix some imports Install protoc Try platform agnostic path Debug in CI :( Debug in CI :( Debug in CI :( Not worth it, just separate builds Variables Fixes Fix windows? Fix windows? Hackily fix windows Down to 1 failure Fix protoc All? tests pass Formatting
* Fix remaining tests
* Clippy
* Update docs for Windows
* Try with old objectstore
* Revert path "fixes" that broke windows
* Update to arrow 20
1 parent 1e44417 commit 3eb55e9

62 files changed, with 527 additions and 416 deletions

.github/workflows/rust.yml

Lines changed: 73 additions & 3 deletions
@@ -82,6 +82,16 @@ jobs:
       - uses: actions/checkout@v2
         with:
           submodules: true
+      - name: Install protobuf compiler
+        shell: bash
+        run: |
+          mkdir -p $HOME/d/protoc
+          cd $HOME/d/protoc
+          export PROTO_ZIP="protoc-21.4-linux-x86_64.zip"
+          curl -LO https://github.com/protocolbuffers/protobuf/releases/download/v21.4/$PROTO_ZIP
+          unzip $PROTO_ZIP
+          export PATH=$PATH:$HOME/d/protoc/bin
+          protoc --version
       - name: Cache Cargo
         uses: actions/cache@v3
         with:
@@ -94,6 +104,7 @@ jobs:
           rust-version: ${{ matrix.rust }}
       - name: Run tests
         run: |
+          export PATH=$PATH:$HOME/d/protoc/bin
           cargo test --features avro,jit,scheduler,json
           # test datafusion-sql examples
           cargo run --example sql
@@ -159,17 +170,65 @@ jobs:
           POSTGRES_USER: postgres
           POSTGRES_PASSWORD: postgres
 
-  windows-and-macos:
-    name: Test on ${{ matrix.os }} Rust ${{ matrix.rust }}
+  windows:
+    name: Test on Windows Rust ${{ matrix.rust }}
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        os: [windows-latest, macos-latest]
+        os: [windows-latest]
         rust: [stable]
     steps:
       - uses: actions/checkout@v2
         with:
           submodules: true
+      - name: Install protobuf compiler
+        shell: bash
+        run: |
+          mkdir -p $HOME/d/protoc
+          cd $HOME/d/protoc
+          export PROTO_ZIP="protoc-21.4-win64.zip"
+          curl -LO https://github.com/protocolbuffers/protobuf/releases/download/v21.4/$PROTO_ZIP
+          unzip $PROTO_ZIP
+          export PATH=$PATH:$HOME/d/protoc/bin
+          protoc.exe --version
+      # TODO: this won't cache anything, which is expensive. Setup this action
+      # with a OS-dependent path.
+      - name: Setup Rust toolchain
+        run: |
+          rustup toolchain install ${{ matrix.rust }}
+          rustup default ${{ matrix.rust }}
+          rustup component add rustfmt
+      - name: Run tests
+        shell: bash
+        run: |
+          export PATH=$PATH:$HOME/d/protoc/bin
+          cargo test
+        env:
+          # do not produce debug symbols to keep memory usage down
+          RUSTFLAGS: "-C debuginfo=0"
+
+  macos:
+    name: Test on MacOS Rust ${{ matrix.rust }}
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [macos-latest]
+        rust: [stable]
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          submodules: true
+      - name: Install protobuf compiler
+        shell: bash
+        run: |
+          mkdir -p $HOME/d/protoc
+          cd $HOME/d/protoc
+          export PROTO_ZIP="protoc-21.4-osx-x86_64.zip"
+          curl -LO https://github.com/protocolbuffers/protobuf/releases/download/v21.4/$PROTO_ZIP
+          unzip $PROTO_ZIP
+          echo "$HOME/d/protoc/bin" >> $GITHUB_PATH
+          export PATH=$PATH:$HOME/d/protoc/bin
+          protoc --version
       # TODO: this won't cache anything, which is expensive. Setup this action
       # with a OS-dependent path.
       - name: Setup Rust toolchain
@@ -250,6 +309,16 @@ jobs:
       - uses: actions/checkout@v2
         with:
           submodules: true
+      - name: Install protobuf compiler
+        shell: bash
+        run: |
+          mkdir -p $HOME/d/protoc
+          cd $HOME/d/protoc
+          export PROTO_ZIP="protoc-21.4-linux-x86_64.zip"
+          curl -LO https://github.com/protocolbuffers/protobuf/releases/download/v21.4/$PROTO_ZIP
+          unzip $PROTO_ZIP
+          export PATH=$PATH:$HOME/d/protoc/bin
+          protoc --version
       - name: Setup Rust toolchain
         run: |
           rustup toolchain install ${{ matrix.rust }}
@@ -263,6 +332,7 @@ jobs:
           key: cargo-coverage-cache3-
       - name: Run coverage
         run: |
+          export PATH=$PATH:$HOME/d/protoc/bin
           rustup toolchain install stable
           rustup default stable
           cargo install --version 0.20.1 cargo-tarpaulin

CONTRIBUTING.md

Lines changed: 9 additions & 0 deletions
@@ -35,6 +35,15 @@ list to help you get started.
 
 This section describes how you can get started at developing DataFusion.
 
+### Windows setup
+
+```shell
+wget https://az792536.vo.msecnd.net/vms/VMBuild_20190311/VirtualBox/MSEdge/MSEdge.Win10.VirtualBox.zip
+choco install -y git rustup.install visualcpp-build-tools
+git-bash.exe
+cargo build
+```
+
 ### Bootstrap environment
 
 DataFusion is written in Rust and it uses a standard rust toolkit:

datafusion-cli/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ rust-version = "1.59"
 readme = "README.md"
 
 [dependencies]
-arrow = { version = "19.0.0" }
+arrow = { version = "20.0.0" }
 clap = { version = "3", features = ["derive", "cargo"] }
 datafusion = { path = "../datafusion/core", version = "10.0.0" }
 dirs = "4.0.0"

datafusion-examples/Cargo.toml

Lines changed: 3 additions & 3 deletions
@@ -34,13 +34,13 @@ path = "examples/avro_sql.rs"
 required-features = ["datafusion/avro"]
 
 [dev-dependencies]
-arrow-flight = { version = "19.0.0" }
+arrow-flight = { version = "20.0.0" }
 async-trait = "0.1.41"
 datafusion = { path = "../datafusion/core" }
 futures = "0.3"
 num_cpus = "1.13.0"
-prost = "0.10"
+prost = "0.11.0"
 serde = { version = "1.0.136", features = ["derive"] }
 serde_json = "1.0.82"
 tokio = { version = "1.0", features = ["macros", "rt", "rt-multi-thread", "sync", "parking_lot"] }
-tonic = "0.7"
+tonic = "0.8"

datafusion/common/Cargo.toml

Lines changed: 3 additions & 2 deletions
@@ -39,11 +39,12 @@ pyarrow = ["pyo3"]
 
 [dependencies]
 apache-avro = { version = "0.14", features = ["snappy"], optional = true }
-arrow = { version = "19.0.0", features = ["prettyprint"] }
+arrow = { version = "20.0.0", features = ["prettyprint"] }
+avro-rs = { version = "0.13", features = ["snappy"], optional = true }
 cranelift-module = { version = "0.86.1", optional = true }
 object_store = { version = "0.3", optional = true }
 ordered-float = "3.0"
-parquet = { version = "19.0.0", features = ["arrow"], optional = true }
+parquet = { version = "20.0.0", features = ["arrow"], optional = true }
 pyo3 = { version = "0.16", optional = true }
 serde_json = "1.0"
 sqlparser = "0.20"

datafusion/common/src/from_slice.rs

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ where
             offsets.push(length_so_far);
             values.extend_from_slice(s);
         }
-        let array_data = ArrayData::builder(Self::get_data_type())
+        let array_data = ArrayData::builder(Self::DATA_TYPE)
             .len(slice.len())
             .add_buffer(Buffer::from_slice_ref(&offsets))
             .add_buffer(Buffer::from_slice_ref(&values));
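
Note on the hunk above: the call `Self::get_data_type()` is replaced by the associated constant `Self::DATA_TYPE`. The following is a minimal, self-contained sketch of that pattern only. The names `DataType`, `OffsetSize`, `IS_LARGE`, and `StringArrayLike` are hypothetical stand-ins defined here for illustration, not the arrow crate's actual definitions.

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for a data type enum; not arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
}

// Hypothetical marker trait distinguishing 32-bit and 64-bit offsets.
trait OffsetSize {
    const IS_LARGE: bool;
}

impl OffsetSize for i32 {
    const IS_LARGE: bool = false;
}

impl OffsetSize for i64 {
    const IS_LARGE: bool = true;
}

// Hypothetical array type that is generic over its offset width.
struct StringArrayLike<O: OffsetSize> {
    _marker: PhantomData<O>,
}

impl<O: OffsetSize> StringArrayLike<O> {
    // Associated constant: evaluated at compile time, read without a call.
    const DATA_TYPE: DataType = if O::IS_LARGE {
        DataType::LargeUtf8
    } else {
        DataType::Utf8
    };
}

fn main() {
    println!("{:?}", StringArrayLike::<i32>::DATA_TYPE);
    println!("{:?}", StringArrayLike::<i64>::DATA_TYPE);
}
```

An associated constant can be used anywhere a constant expression is required, which a function call such as `get_data_type()` generally cannot unless it is a `const fn`.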

datafusion/common/src/scalar.rs

Lines changed: 11 additions & 11 deletions
@@ -27,7 +27,7 @@ use arrow::{
         IntervalMonthDayNanoType, IntervalUnit, IntervalYearMonthType, TimeUnit,
         TimestampMicrosecondType, TimestampMillisecondType, TimestampNanosecondType,
         TimestampSecondType, UInt16Type, UInt32Type, UInt64Type, UInt8Type,
-        DECIMAL_MAX_PRECISION,
+        DECIMAL128_MAX_PRECISION,
     },
     util::decimal::{BasicDecimal, Decimal128},
 };
@@ -611,7 +611,7 @@ impl ScalarValue {
         scale: usize,
     ) -> Result<Self> {
         // make sure the precision and scale is valid
-        if precision <= DECIMAL_MAX_PRECISION && scale <= precision {
+        if precision <= DECIMAL128_MAX_PRECISION && scale <= precision {
             return Ok(ScalarValue::Decimal128(Some(value), precision, scale));
         }
         Err(DataFusionError::Internal(format!(
@@ -654,7 +654,7 @@ impl ScalarValue {
             ScalarValue::Int32(_) => DataType::Int32,
             ScalarValue::Int64(_) => DataType::Int64,
             ScalarValue::Decimal128(_, precision, scale) => {
-                DataType::Decimal(*precision, *scale)
+                DataType::Decimal128(*precision, *scale)
             }
             ScalarValue::TimestampSecond(_, tz_opt) => {
                 DataType::Timestamp(TimeUnit::Second, tz_opt.clone())
@@ -935,7 +935,7 @@ impl ScalarValue {
         }
 
         let array: ArrayRef = match &data_type {
-            DataType::Decimal(precision, scale) => {
+            DataType::Decimal128(precision, scale) => {
                 let decimal_array =
                     ScalarValue::iter_to_decimal_array(scalars, precision, scale)?;
                 Arc::new(decimal_array)
@@ -1448,7 +1448,7 @@ impl ScalarValue {
 
         Ok(match array.data_type() {
             DataType::Null => ScalarValue::Null,
-            DataType::Decimal(precision, scale) => {
+            DataType::Decimal128(precision, scale) => {
                 ScalarValue::get_decimal_value_from_array(array, index, precision, scale)
             }
             DataType::Boolean => typed_cast!(array, index, BooleanArray, Boolean),
@@ -1899,7 +1899,7 @@ impl TryFrom<&DataType> for ScalarValue {
             DataType::UInt16 => ScalarValue::UInt16(None),
             DataType::UInt32 => ScalarValue::UInt32(None),
             DataType::UInt64 => ScalarValue::UInt64(None),
-            DataType::Decimal(precision, scale) => {
+            DataType::Decimal128(precision, scale) => {
                 ScalarValue::Decimal128(None, *precision, *scale)
             }
             DataType::Utf8 => ScalarValue::Utf8(None),
@@ -2145,7 +2145,7 @@ mod tests {
     #[test]
     fn scalar_decimal_test() {
         let decimal_value = ScalarValue::Decimal128(Some(123), 10, 1);
-        assert_eq!(DataType::Decimal(10, 1), decimal_value.get_datatype());
+        assert_eq!(DataType::Decimal128(10, 1), decimal_value.get_datatype());
         let try_into_value: i128 = decimal_value.clone().try_into().unwrap();
         assert_eq!(123_i128, try_into_value);
         assert!(!decimal_value.is_null());
@@ -2163,14 +2163,14 @@ mod tests {
         let array = decimal_value.to_array();
         let array = array.as_any().downcast_ref::<Decimal128Array>().unwrap();
         assert_eq!(1, array.len());
-        assert_eq!(DataType::Decimal(10, 1), array.data_type().clone());
+        assert_eq!(DataType::Decimal128(10, 1), array.data_type().clone());
         assert_eq!(123i128, array.value(0).as_i128());
 
         // decimal scalar to array with size
         let array = decimal_value.to_array_of_size(10);
         let array_decimal = array.as_any().downcast_ref::<Decimal128Array>().unwrap();
         assert_eq!(10, array.len());
-        assert_eq!(DataType::Decimal(10, 1), array.data_type().clone());
+        assert_eq!(DataType::Decimal128(10, 1), array.data_type().clone());
         assert_eq!(123i128, array_decimal.value(0).as_i128());
         assert_eq!(123i128, array_decimal.value(9).as_i128());
         // test eq array
@@ -2208,7 +2208,7 @@ mod tests {
         // convert the vec to decimal array and check the result
         let array = ScalarValue::iter_to_array(decimal_vec.into_iter()).unwrap();
         assert_eq!(3, array.len());
-        assert_eq!(DataType::Decimal(10, 2), array.data_type().clone());
+        assert_eq!(DataType::Decimal128(10, 2), array.data_type().clone());
 
         let decimal_vec = vec![
             ScalarValue::Decimal128(Some(1), 10, 2),
@@ -2218,7 +2218,7 @@ mod tests {
         ];
         let array = ScalarValue::iter_to_array(decimal_vec.into_iter()).unwrap();
         assert_eq!(4, array.len());
-        assert_eq!(DataType::Decimal(10, 2), array.data_type().clone());
+        assert_eq!(DataType::Decimal128(10, 2), array.data_type().clone());
 
         assert!(ScalarValue::try_new_decimal128(1, 10, 2)
             .unwrap()
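
Note on the precision check in `scalar.rs`: the guard `precision <= DECIMAL128_MAX_PRECISION && scale <= precision` can be tried in isolation. The sketch below is self-contained and does not use the arrow crate. The constant is a local stand-in assumed to be 38 (the number of decimal digits a 128-bit decimal can hold), and `Decimal128Scalar` and the error text are hypothetical simplifications rather than DataFusion's types or messages.

```rust
// Local stand-in for arrow's `DECIMAL128_MAX_PRECISION`; the value 38 is an
// assumption made for this sketch.
const DECIMAL128_MAX_PRECISION: usize = 38;

// Hypothetical simplification of `ScalarValue::Decimal128(Some(value), precision, scale)`.
#[derive(Debug, PartialEq)]
struct Decimal128Scalar {
    value: i128,
    precision: usize,
    scale: usize,
}

// Mirrors the shape of the check in the diff: accept the value only when the
// precision is within the maximum and the scale does not exceed the precision.
fn try_new_decimal128(
    value: i128,
    precision: usize,
    scale: usize,
) -> Result<Decimal128Scalar, String> {
    if precision <= DECIMAL128_MAX_PRECISION && scale <= precision {
        return Ok(Decimal128Scalar { value, precision, scale });
    }
    // Illustrative error text, not the message used by DataFusion.
    Err(format!(
        "invalid decimal: precision {} and scale {}",
        precision, scale
    ))
}

fn main() {
    println!("{:?}", try_new_decimal128(123, 10, 1)); // accepted
    println!("{:?}", try_new_decimal128(1, 39, 2)); // precision too large
    println!("{:?}", try_new_decimal128(1, 10, 11)); // scale exceeds precision
}
```

Because the rename only changes which constant is compared against, the behaviour of the check itself should be unchanged as long as both constants carry the same value.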

datafusion/core/Cargo.toml

Lines changed: 2 additions & 2 deletions
@@ -56,7 +56,7 @@ unicode_expressions = ["datafusion-physical-expr/regex_expressions", "datafusion
 [dependencies]
 ahash = { version = "0.7", default-features = false }
 apache-avro = { version = "0.14", optional = true }
-arrow = { version = "19.0.0", features = ["prettyprint"] }
+arrow = { version = "20.0.0", features = ["prettyprint"] }
 async-trait = "0.1.41"
 bytes = "1.1"
 chrono = { version = "0.4", default-features = false }
@@ -78,7 +78,7 @@ num_cpus = "1.13.0"
 object_store = "0.3.0"
 ordered-float = "3.0"
 parking_lot = "0.12"
-parquet = { version = "19.0.0", features = ["arrow", "async"] }
+parquet = { version = "20.0.0", features = ["arrow", "async"] }
 paste = "^1.0"
 pin-project-lite = "^0.2.7"
 pyo3 = { version = "0.16", optional = true }

datafusion/core/fuzz-utils/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -23,6 +23,6 @@ edition = "2021"
 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
 
 [dependencies]
-arrow = { version = "19.0.0", features = ["prettyprint"] }
+arrow = { version = "20.0.0", features = ["prettyprint"] }
 env_logger = "0.9.0"
 rand = "0.8"

datafusion/core/src/avro_to_arrow/arrow_array_reader.rs

Lines changed: 4 additions & 6 deletions
@@ -101,12 +101,10 @@ impl<'a, R: Read> AvroArrowArrayReader<'a, R> {
                     "Failed to parse avro value: {:?}",
                     e
                 ))),
-                other => {
-                    return Err(ArrowError::ParseError(format!(
-                        "Row needs to be of type object, got: {:?}",
-                        other
-                    )))
-                }
+                other => Err(ArrowError::ParseError(format!(
+                    "Row needs to be of type object, got: {:?}",
+                    other
+                ))),
             })
             .collect::<ArrowResult<Vec<Vec<(String, Value)>>>>()?;
         if rows.is_empty() {
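
Note on the `arrow_array_reader.rs` hunk: a block containing `return Err(...)` becomes a plain `Err(...)` expression in the match arm, which looks like a Clippy-style cleanup (possibly the `needless_return` lint, though the commit does not say). A minimal, self-contained sketch of the same shape follows. The `Value` enum and `rows_from_values` are hypothetical stand-ins, not the Avro or arrow types used in the real reader.

```rust
// Hypothetical stand-in for an Avro value; only two variants for brevity.
#[derive(Debug, PartialEq)]
enum Value {
    Record(Vec<(String, i64)>),
    Int(i64),
}

// Maps each value to a row, failing on the first value that is not a record.
fn rows_from_values(values: Vec<Value>) -> Result<Vec<Vec<(String, i64)>>, String> {
    values
        .into_iter()
        .map(|value| match value {
            Value::Record(fields) => Ok(fields),
            // The arm is itself an expression, so no `return` is needed.
            other => Err(format!(
                "Row needs to be of type object, got: {:?}",
                other
            )),
        })
        // Collecting into `Result` stops at the first `Err` and returns it.
        .collect::<Result<Vec<_>, String>>()
}

fn main() {
    let ok = rows_from_values(vec![Value::Record(vec![("a".to_string(), 1)])]);
    println!("{:?}", ok);
    let bad = rows_from_values(vec![Value::Int(3)]);
    println!("{:?}", bad);
}
```

Inside a closure, `return` exits only the closure, so both forms produce the same value. The expression form is just shorter.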
