
Cast string to integral type not compatible with Spark #15

@andygrove

Description

Casting a string column to a smaller integral type gives a different result with Comet enabled: a value that overflows the target type wraps around instead of producing null as Spark does.
```scala
scala> val inputs = Seq("123", "-123", "86374").toDF("n")
inputs: org.apache.spark.sql.DataFrame = [n: string]

scala> inputs.write.parquet("test.parquet")
24/02/13 07:40:14 INFO src/lib.rs: Comet native library initialized

scala> val df = spark.read.parquet("test.parquet")
df: org.apache.spark.sql.DataFrame = [n: string]

scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> val df2 = df.withColumn("converted", col("n").cast(DataTypes.ShortType))
df2: org.apache.spark.sql.DataFrame = [n: string, converted: smallint]

scala> df2.show
+-----+---------+
|    n|converted|
+-----+---------+
|86374|    20838|
| -123|     -123|
|  123|      123|
+-----+---------+

scala> spark.conf.set("spark.comet.enabled", false)

scala> df2.show
+-----+---------+
|    n|converted|
+-----+---------+
|86374|     null|
| -123|     -123|
|  123|      123|
+-----+---------+
```
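The discrepancy looks like an unchecked truncating conversion: 86374 = 65536 + 20838, so keeping only the low 16 bits yields 20838, whereas Spark's non-ANSI cast returns null when the parsed value does not fit in a short. A minimal Scala sketch of the two semantics (the helper names below are hypothetical illustrations, not Comet's actual code):

```scala
// Hypothetical helper: Spark-style (non-ANSI) string→short cast.
// Parsing fails when the value does not fit in 16 bits, so overflow
// yields None, which Spark renders as null.
def sparkStyleCast(s: String): Option[Short] =
  s.trim.toShortOption

// Hypothetical helper: unchecked truncating cast. Parses as a wider
// type and keeps only the low 16 bits, so overflow wraps modulo 2^16.
def wrappingCast(s: String): Short =
  s.trim.toLong.toShort

println(sparkStyleCast("86374")) // None  (matches Spark's null above)
println(wrappingCast("86374"))   // 20838 (matches Comet's output above)
println(sparkStyleCast("-123"))  // Some(-123)
```

In-range values like "123" and "-123" agree under both semantics, which is why only the overflowing row differs in the output above.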

Labels: bug