Commit 2ffd73a

Casey Ching authored and ishaan committed
Parquet: Fix value def level when max def level is 0
When running with a release build, NULL would be returned when reading values from required fields in Parquet files (with a debug build, a DCHECK would be hit). Previously, when the max definition level for a field was 0 (which happens if a field is required), the definition level for a value was incorrectly set to 1.

The max definition level is related to nested data and is defined as the number of nullable fields encountered when traversing the path to the desired end field. For example, if a nested schema has a path a.b.c.d where b and d are nullable, then the max def level is 2. A def level is attached to each value to indicate the number of optional fields that are actually present (in the previous example, a def level of 2 means both b and d are not null). So a value's def level being greater than the field's max def level should never happen.

Change-Id: Ia91a97cf79e672c420d10416c6817f0930dcc920
1 parent 74fc5a2 commit 2ffd73a
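
To make the counting rule in the commit message concrete, here is a minimal sketch, not Impala's actual schema-resolution code (SchemaNode and MaxDefLevel are hypothetical names), of computing a field's max def level by counting the nullable fields on its path:

#include <string>
#include <vector>

// Hypothetical node on a path through a nested Parquet schema.
struct SchemaNode {
  std::string name;
  bool nullable;  // Corresponds to an OPTIONAL field in the Parquet schema.
};

// Max def level = number of nullable fields on the path to the target field.
// For the path a.b.c.d with b and d nullable, this returns 2. A column whose
// path is all REQUIRED fields has a max def level of 0.
int MaxDefLevel(const std::vector<SchemaNode>& path) {
  int level = 0;
  for (const SchemaNode& node : path) {
    if (node.nullable) ++level;
  }
  return level;
}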

5 files changed: +34 -1 lines changed

be/src/exec/hdfs-parquet-scanner.cc
+1 -1

@@ -719,7 +719,7 @@ inline int HdfsParquetScanner::BaseColumnReader::ReadDefinitionLevel() {
   if (max_def_level() == 0) {
     // This column and any containing structs are required so there is nothing encoded for
     // the definition levels.
-    return 1;
+    return 0;
   }
 
   uint8_t definition_level;
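
For context on why returning 1 here broke required fields: the def level read above is compared against the column's max def level to decide whether a value is present, so a def level of 1 against a max of 0 describes an impossible state. A minimal sketch of that presence check, assuming this form of the test (ValueIsDefined is a hypothetical name, not the scanner's actual decode path):

#include <cassert>

// A value is present (non-NULL) iff its def level equals the column's max
// def level; a def level above the max should never occur. Before this fix,
// a required column (max_def_level == 0) reported def_level == 1, which
// tripped a DCHECK in debug builds and made release builds treat the value
// as NULL.
bool ValueIsDefined(int def_level, int max_def_level) {
  assert(def_level <= max_def_level);
  return def_level == max_def_level;
}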
testdata/data/kite_required_fields.parquet
1.83 KB
Binary file not shown.

testdata/datasets/functional/functional_schema_template.sql
+22

@@ -1338,6 +1338,28 @@ ${IMPALA_HOME}/testdata/data/bad_compressed_size.parquet \
 /test-warehouse/bad_compressed_size_parquet/
 ====
 ---- DATASET
+-- Parquet file with required columns written by Kite. Hive and Impala always write files
+-- with fields as optional.
+functional
+---- BASE_TABLE_NAME
+kite_required_fields
+---- COLUMNS
+req_int bigint
+opt_int bigint
+req_string string
+opt_string string
+req_bool boolean
+opt_bool boolean
+opt_int_2 bigint
+opt_int_3 bigint
+req_int_2 bigint
+req_int_3 bigint
+---- LOAD
+`hadoop fs -mkdir -p /test-warehouse/kite_required_fields_parquet && \
+hadoop fs -put -f ${IMPALA_HOME}/testdata/data/kite_required_fields.parquet \
+/test-warehouse/kite_required_fields_parquet/
+====
+---- DATASET
 functional
 ---- BASE_TABLE_NAME
 bad_serde

testdata/datasets/functional/schema_constraints.csv
+1

@@ -40,6 +40,7 @@ table_name:bad_metadata_len, constraint:restrict_to, table_format:parquet/none/n
 table_name:bad_dict_page_offset, constraint:restrict_to, table_format:parquet/none/none
 table_name:bad_compressed_size, constraint:restrict_to, table_format:parquet/none/none
 table_name:alltypesagg_hive_13_1, constraint:restrict_to, table_format:parquet/none/none
+table_name:kite_required_fields, constraint:restrict_to, table_format:parquet/none/none
 
 # TODO: Support Avro. Data loading currently fails for Avro because complex types
 # cannot be converted to the corresponding Avro types yet.

testdata/workloads/functional-query/queries/QueryTest/parquet.test
+10

@@ -36,3 +36,13 @@ SELECT * from bad_compressed_size
 ---- CATCH
 Column 0 has invalid column offsets (offset=4, size=1000000, file_size=245)
 ====
+---- QUERY
+# Parquet file with required fields.
+select * from kite_required_fields
+---- TYPES
+bigint,bigint,string,string,boolean,boolean,bigint,bigint,bigint,bigint
+---- RESULTS
+1,2,'foo','bar',true,false,1,2,3,4
+1,NULL,'foo','NULL',true,NULL,NULL,NULL,3,4
+100,NULL,'foooo','NULL',false,NULL,NULL,NULL,300,400
+====
