Query error from HUDI external table when HUDI table recreated #5370

Closed
@tiannan-sr

Description

Steps to reproduce the behavior (Required)

1. Create a Hudi table and insert data:
create table hudi_tbl_refresh_test(
 uuid int,
 col_boolean boolean,
 col_int int,
 col_long bigint,
 col_float float,
 col_double double,
 col_decimal decimal(38,18),
 col_date date,
 col_timestamp timestamp,
 col_string string,
 col_binary binary,
 col_array array<int>,
 col_struct struct<a:string,b:int>,
 col_map map<string,int>
 ) USING hudi 
 TBLPROPERTIES(
 type = 'cow',
 primarykey = 'uuid')
 partitioned by(col_date,col_int);
 insert into hudi_tbl_refresh_test partition(col_date = '2020-01-01', col_int='1') select 1,true,1.01,1.001,1.0001,10000000000.0000000001,cast('2020-01-01 00:00:01' as timestamp),'Top 10 Unsolved Mysteries of Paleontological Dinosaurs, Did You Know?',cast('1110001010101011001001' as binary),array(1,10,100),null,null;

2. Create a Hudi external table and query it; the select executes normally:

 create external table ex_hudi_refresh_test
 (uuid int,
 col_boolean boolean,
 col_int int,
 col_long bigint,
 col_float float,
 col_double double,
 col_date date,
 col_string string)
 engine = hudi
 properties(
 "resource" = "hudi_emr_tn",
 "database" = "hudi_db",
 "table" = "hudi_tbl_refresh_test"
 );
select * from ex_hudi_refresh_test;

3. Drop the Hudi table:

drop table hudi_tbl_refresh_test;

4. Recreate the Hudi table with the same name as before, and insert data:

 create table hudi_tbl_refresh_test(
 uuid int,
 col_boolean boolean,
 col_int int,
 col_long bigint,
 col_float float,
 col_double double,
 col_decimal decimal(38,18),
 col_date date,
 col_timestamp timestamp,
 col_string string,
 col_binary binary,
 col_array array<int>,
 col_struct struct<a:string,b:int>,
 col_map map<string,int>
 ) USING hudi 
 TBLPROPERTIES(
 type = 'cow',
 primarykey = 'uuid')
 partitioned by(col_date,col_int);
insert into hudi_tbl_refresh_test partition(col_date = '2020-01-01', col_int='2') select 1,true,1.01,1.001,1.0001,10000000000.0000000001,cast('2020-01-01 00:00:01' as timestamp),'Top 10 Unsolved Mysteries of Paleontological Dinosaurs, Did You Know?',cast('1110001010101011001001' as binary),array(1,10,100),null,null;

5. Drop the Hudi external table and recreate it with the same name as before:

 drop table ex_hudi_refresh_test;
 create external table ex_hudi_refresh_test
 (uuid int,
 col_boolean boolean,
 col_int int,
 col_long bigint,
 col_float float,
 col_double double,
 col_date date,
 col_string string)
 engine = hudi
 properties(
 "resource" = "hudi_emr_tn",
 "database" = "hudi_db",
 "table" = "hudi_tbl_refresh_test"
 );

6. Querying the Hudi external table now fails:

mysql> select * from ex_hudi_refresh_test;
ERROR 1064 (HY000): hdfsOpenFile failed, file=hdfs://emr-header-1.cluster-49155:9000/user/hive/warehouse/hudi_db.db/hudi_tbl_refresh_test/col_date=2020-02-01/col_int=3/ab048f31-d0b0-410a-862b-7ae4a63e2f2d-0_0-218-1918_20220421155641387.parquet
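
One detail in the error above is worth checking: the file that failed to open sits under a partition directory that neither INSERT in steps 1 and 4 wrote to. The following is a minimal sketch in Python (the helper name `partition_from_error` and the `WRITTEN` list are illustrative, not part of StarRocks or Hudi) that extracts the partition from the error line and compares it with the partitions used in the steps above. A mismatch might indicate that the path comes from metadata belonging to an earlier incarnation of the table, but the issue itself does not confirm the cause.

```python
from urllib.parse import urlparse

# Error line copied from step 6 of this report.
ERROR_LINE = (
    "ERROR 1064 (HY000): hdfsOpenFile failed, "
    "file=hdfs://emr-header-1.cluster-49155:9000/user/hive/warehouse/"
    "hudi_db.db/hudi_tbl_refresh_test/col_date=2020-02-01/col_int=3/"
    "ab048f31-d0b0-410a-862b-7ae4a63e2f2d-0_0-218-1918_20220421155641387.parquet"
)

# Partitions targeted by the INSERT statements in steps 1 and 4.
WRITTEN = [
    {"col_date": "2020-01-01", "col_int": "1"},
    {"col_date": "2020-01-01", "col_int": "2"},
]


def partition_from_error(line: str) -> dict:
    """Return the key=value partition components of the file path in the error."""
    uri = line.split("file=", 1)[1].strip()
    path = urlparse(uri).path
    parts = {}
    for segment in path.split("/"):
        # Hive-style partition directories look like "col_date=2020-02-01".
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key] = value
    return parts


failed = partition_from_error(ERROR_LINE)
print("partition in error:", failed)
print("written by steps 1 and 4:", failed in WRITTEN)
```

If the second line prints `False`, the failing path refers to a partition that the reproduction steps never created, which is a useful data point to attach when triaging.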

Expected behavior (Required)

The query returns the correct result.

Real behavior (Required)

The query keeps failing until the cluster is restarted.

StarRocks version (Required)

  • You can get the StarRocks version by executing SQL select current_version()
    branch-2.2
mysql> select current_version();
+------------------------+
| current_version()      |
+------------------------+
| QA_TEST_MASTER 704419f |
+------------------------+
1 row in set (0.01 sec)

Labels

type/bug (Something isn't working)
