
Reading records inserted using Athena throws UOE exception when read using Spark (AWS) #5593

Closed
@a-agmon


Apache Iceberg version

0.14.0 (latest release)

Query engine

Spark

Please describe the bug 🐞

Description

  1. An Iceberg table has been created and populated by a Spark job (using GlueCatalog).
  2. Additional data has been inserted into the table using the Athena console or API.
  3. When I attempt to read the new data from the table using Spark, the following exception is thrown:
22/08/20 22:16:05 ERROR BaseDataReader: Error reading file: s3://****/temp-testing-aa/default.db/iceberg_demo3/data/category=CAT5/d53b94be-8d2d-4892-b020-b60610a30318.gz.parquet
java.lang.UnsupportedOperationException: Cannot support vectorized reads for column [email] optional binary email (STRING) = 1 with encoding DELTA_BYTE_ARRAY. Disable vectorized reads to read this table/file
	at org.apache.iceberg.arrow.vectorized.parquet.VectorizedPageIterator.initDataReader(VectorizedPageIterator.java:96)

This does not happen when the same data is inserted using Spark.
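The difference is presumably in the Parquet encodings each writer produces; the exception above points at DELTA_BYTE_ARRAY in the Athena-written file. One way to confirm this is to dump the column encodings from each file's footer with parquet-mr's ParquetFileReader, as in the sketch below (the path is a placeholder; parquet-hadoop and the S3A connector must be on the classpath):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.parquet.hadoop.ParquetFileReader
    import org.apache.parquet.hadoop.util.HadoopInputFile

    // Print the encodings recorded per column chunk, e.g. to compare a
    // Spark-written file (category=CAT1) with the Athena-written one (category=CAT5).
    val input = HadoopInputFile.fromPath(
      new Path("s3a://<bucket>/temp-testing-aa/default.db/iceberg_demo/data/category=CAT5/<file>.gz.parquet"),
      new Configuration())
    val reader = ParquetFileReader.open(input)
    try {
      reader.getFooter.getBlocks.forEach { block =>
        block.getColumns.forEach { col =>
          println(s"${col.getPath} -> ${col.getEncodings}")
        }
      }
    } finally reader.close()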

Steps to reproduce:

1. Run this Spark code to create the table and insert some data:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("DataExtractor")
      .config("spark.master", "local[*]")
      .config("spark.driver.host", "127.0.0.1")
      .config("spark.driver.bindAddress", "127.0.0.1")
      .config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
      .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.my_catalog.warehouse", "s3://*****/temp-testing-aa")
      .config("spark.sql.catalog.my_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
      .config("spark.sql.catalog.my_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
      .getOrCreate()

    val icebergDDL =
      """
        |CREATE TABLE IF NOT EXISTS my_catalog.default.iceberg_demo (
        |    email string,
        |    data string,
        |    category string)
        |USING iceberg
        |PARTITIONED BY (category)
        |""".stripMargin

    spark.sql(icebergDDL)

    val icebergInsert =
      """
        |INSERT INTO my_catalog.default.iceberg_demo (email, data, category)
        |VALUES
        |('AABB1', 'data 1', 'CAT1'),
        |('AABB2', 'data 2', 'CAT1'),
        |('AABB3', 'data 3', 'CAT2'),
        |('AABB4', 'data 4', 'CAT3')
        |""".stripMargin

    spark.sql(icebergInsert)
    val df = spark.table("my_catalog.default.iceberg_demo")
    df.show()

The table has been created and is viewable from both Athena and Spark.

2. Use Athena to insert a record:

INSERT INTO iceberg_demo (email, data, category)  VALUES ('AABB5', 'data 5' , 'CAT5')

3. Run a Spark job again with the following lines:

    val df = spark.table("my_catalog.default.iceberg_demo")
    df.show()

4. The following exception is thrown:

22/08/20 22:48:26 INFO CatalogUtil: Loading custom FileIO implementation: org.apache.iceberg.aws.s3.S3FileIO
22/08/20 22:48:27 INFO BaseMetastoreTableOperations: Refreshing table metadata from new version: s3://***/temp-testing-aa/default.db/iceberg_demo/metadata/00002-59d72a1f-8b73-4386-bb86-3b6dfc29e858.metadata.json
22/08/20 22:48:28 INFO BaseMetastoreCatalog: Table loaded by catalog: my_catalog.default.iceberg_demo
22/08/20 22:48:28 INFO V2ScanRelationPushDown: 
Output: email#0, data#1, category#2
......
.....
22/08/20 22:48:32 ERROR BaseDataReader: Error reading file: s3://****/temp-testing-aa/default.db/iceberg_demo/data/category=CAT5/ff78aec0-344e-489a-9206-ab977cd9897b.gz.parquet
java.lang.UnsupportedOperationException: Cannot support vectorized reads for column [email] optional binary email (STRING) = 1 with encoding DELTA_BYTE_ARRAY. Disable vectorized reads to read this table/file
	at org.apache.iceberg.arrow.vectorized.parquet.VectorizedPageIterator.initDataReader(VectorizedPageIterator.java:96)
	at org.apache.iceberg.parquet.BasePageIterator.initFromPage(BasePageIterator.java:136)
	at org.apache.iceberg.parquet.BasePageIterator$1.visit(BasePageIterator.java:103)
	at org.apache.iceberg.parquet.BasePageIterator$1.visit(BasePageIterator.java:94)

This also happens when I try to set:

    config("spark.sql.parquet.enableVectorizedReader", "false")
