
Duplicated project schema will cause index out of bounds exception in orc_exec #722

@harveyyue

Description

Describe the bug
Selecting a single column from an ORC table through the native OrcExec panics with an Arrow schema error: project index 2 out of bounds, max field 1. In the plan below, projection=Some([2]) (the index of l in the full table schema) is paired with a file schema that has already been pruned to the single column l, so applying the projection again goes out of bounds.

Table:
CREATE TABLE test_orc(
id BIGINT COMMENT 'pk',
m MAP<STRING,STRING> COMMENT 'test read map type',
l ARRAY<STRING> COMMENT 'test read list type',
s STRING COMMENT 'string type'
) using orc

SQL statement:
select l from test_orc

Executing this SQL produces the exception below:
```
24/12/26 15:08:13 INFO BlazeCallNativeWrapper: Start executing native plan
(+398.133s) [INFO] (stage: 5, partition: 0) - start executing plan:
ProjectExec [cast(#2@0 AS Utf8) AS #65], schema=[#65:Utf8;N]
RenameColumnsExec: ["#2"], schema=[#2:List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} });N]
OrcExec: file_group=[PartitionedFile { object_meta: ObjectMeta { location: Path { raw: "ZmlsZTovLy9Vc2Vycy9zaDAwNzA0bWwvRG93bmxvYWRzL2Nvcy9wYXJ0LTAwMDAwLTFiNzE4YzI4LWFlYjgtNDM2My04NjFkLTg1YmUwNTlkYTM1MC1jMDAwLnNuYXBweS5vcmM" }, last_modified: 1970-01-01T00:00:00Z, size: 804, e_tag: None, version: None }, partition_values: [], range: Some(FileRange { start: 0, end: 804 }), statistics: None, extensions: None }], limit=None, projection=Some([2]), schema=[l:List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} });N]

thread 'blaze-native-stage-5-part-0' panicked at native-engine/datafusion-ext-plans/src/common/execution_context.rs:285:21:
output_with_sender[OrcScan]: output() returns error: Arrow error: Schema error: project index 2 out of bounds, max field 1
thread 'blaze-native-stage-5-part-0' panicked at native-engine/datafusion-ext-plans/src/common/execution_context.rs:308:21:
output_with_sender[OrcScan] error: Execution error: output_with_sender[OrcScan]: output() returns error: Arrow error: Schema error: project index 2 out of bounds, max field 1
thread 'blaze-native-stage-5-part-0' panicked at native-engine/datafusion-ext-plans/src/common/execution_context.rs:308:21:
output_with_sender[Project] error: Execution error: output_with_sender[OrcScan] error: Execution error: output_with_sender[OrcScan]: output() returns error: Arrow error: Schema error: project index 2 out of bounds, max field 1
(+398.215s) [ERROR] (stage: 5, partition: 0) - native execution panics: Execution error: Execution error: output_with_sender[Project] error: Execution error: output_with_sender[OrcScan] error: Execution error: output_with_sender[OrcScan]: output() returns error: Arrow error: Schema error: project index 2 out of bounds, max field 1
(+398.215s) [INFO] (stage: 5, partition: 0) - task exited abnormally.
(+398.218s) [INFO] (stage: 0, partition: 0) - (partition=0) native execution finalizing
(+398.227s) [INFO] (stage: 0, partition: 0) - (partition=0) native execution finalized
24/12/26 15:08:13 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 5)
java.lang.RuntimeException: poll record batch error: Execution error: native execution panics: Execution error: Execution error: output_with_sender[Project] error: Execution error: output_with_sender[OrcScan] error: Execution error: output_with_sender[OrcScan]: output() returns error: Arrow error: Schema error: project index 2 out of bounds, max field 1
at org.apache.spark.sql.blaze.JniBridge.nextBatch(Native Method)
at org.apache.spark.sql.blaze.BlazeCallNativeWrapper$$anon$1.hasNext(BlazeCallNativeWrapper.scala:80)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
24/12/26 15:08:13 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 5) (192.168.132.23 executor driver): java.lang.RuntimeException: poll record batch error: Execution error: native execution panics: Execution error: Execution error: output_with_sender[Project] error: Execution error: output_with_sender[OrcScan] error: Execution error: output_with_sender[OrcScan]: output() returns error: Arrow error: Schema error: project index 2 out of bounds, max field 1
at org.apache.spark.sql.blaze.JniBridge.nextBatch(Native Method)
at org.apache.spark.sql.blaze.BlazeCallNativeWrapper$$anon$1.hasNext(BlazeCallNativeWrapper.scala:80)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
```
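
For context on what the Arrow error itself means, here is a minimal standalone sketch (not Blaze code), assuming the panic ultimately comes from arrow's `Schema::project`. It mirrors what the plan above suggests: a projection index that is valid against the full four-column table schema is applied again to a schema that has already been pruned to the single projected column (`projection=Some([2])` next to a one-field `schema=[l:...]`). Column types are simplified placeholders; only the indices matter.

```rust
use arrow_schema::{ArrowError, DataType, Field, Schema};

fn main() -> Result<(), ArrowError> {
    // Full table schema of test_orc (types simplified; only the indices
    // matter for the error). Column index 2 is the projected column `l`.
    let table_schema = Schema::new(vec![
        Field::new("id", DataType::Int64, true),
        Field::new("m", DataType::Utf8, true), // stand-in for MAP<STRING,STRING>
        Field::new("l", DataType::Utf8, true), // stand-in for ARRAY<STRING>
        Field::new("s", DataType::Utf8, true),
    ]);

    // Applying projection [2] to the full schema is fine: it yields a
    // one-field schema containing only `l`.
    let pruned = table_schema.project(&[2])?;
    assert_eq!(pruned.fields().len(), 1);

    // Applying the same table-relative index [2] a second time, now against
    // the already-pruned one-field schema, fails with exactly the message
    // seen in the log: "project index 2 out of bounds, max field 1".
    let err = pruned.project(&[2]).unwrap_err();
    println!("{err}");
    Ok(())
}
```

If that reading is right, the duplicated projection would need to be applied only once, or remapped to indices relative to the already-pruned file schema, inside orc_exec.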

To Reproduce
Steps to reproduce the behavior:

  1. Create the test_orc table above (stored as ORC) and write at least one row to it.
  2. Run select l from test_orc with Blaze native execution enabled, so the scan is executed by the native OrcExec.
  3. The native task panics with the Arrow schema error shown above and the Spark task fails.

Expected behavior
The query should return the values of the l column without any native execution error.

