[Failed] Substring does not support binary input #1724

@xumingming

Describe the bug

Spark's substring supports binary input: https://github.com/apache/spark/blob/dc47def562652b6d35a6ecb6600373ed645326bf/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L2323-L2330

  override def nullSafeEval(string: Any, pos: Any, len: Any): Any = {
    str.dataType match {
      case _: StringType => string.asInstanceOf[UTF8String]
        .substringSQL(pos.asInstanceOf[Int], len.asInstanceOf[Int])
      case BinaryType => ByteArray.subStringSQL(string.asInstanceOf[Array[Byte]],
        pos.asInstanceOf[Int], len.asInstanceOf[Int])
    }
  }

Auron does not support it; the native substr function rejects Binary input and the query fails:

Driver stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 64.0 failed 1 times, most recent failure: Lost task 0.0 in stage 64.0 (TID 72) (10.147.100.59 executor driver): java.lang.RuntimeException: java.lang.RuntimeException: task panics: Execution error: Execution error: output_with_sender[Project] error: Execution error: output_with_sender[Project]: output() returns error: Execution error: Unsupported data type Binary for function substr,expected Utf8View, Utf8 or LargeUtf8.
	at org.apache.auron.jni.AuronCallNativeWrapper.checkError(AuronCallNativeWrapper.java:166)
	at org.apache.auron.jni.AuronCallNativeWrapper.close(AuronCallNativeWrapper.java:190)
	at org.apache.auron.jni.AuronCallNativeWrapper.checkError(AuronCallNativeWrapper.java:165)
	at org.apache.auron.jni.AuronCallNativeWrapper.loadNextBatch(AuronCallNativeWrapper.java:123)
	at org.apache.spark.sql.auron.NativeHelper$$anon$1.hasNext(NativeHelper.scala:132)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)

To Reproduce

create table temp using parquet as select binary(unhex('01020304')) as col1;
select substring(col1, 2, 2) from temp;

Expected behavior

Auron should support binary input for substring, matching Spark's behavior.
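For the reproduction above, the expected result would be the two bytes 0x02 0x03. The following is only a rough sketch of the byte-wise, 1-based substring semantics that the Spark snippet delegates to for BinaryType. The object and method names (`BinarySubstringSketch`, `substringSql`) are hypothetical, and the edge-case handling (position 0, negative positions, oversized lengths) is an assumption that should be checked against Spark's `ByteArray.subStringSQL` before relying on it. It is neither Spark's nor Auron's actual code.

```scala
// Hypothetical sketch, NOT Spark's or Auron's implementation:
// byte-wise substring with SQL-style 1-based positions.
object BinarySubstringSketch {

  def substringSql(bytes: Array[Byte], pos: Int, len: Int): Array[Byte] =
    if (pos > bytes.length) {
      // A start position past the end yields an empty result.
      Array.emptyByteArray
    } else {
      // Assumption: positive positions are 1-based, negative positions
      // count back from the end, and position 0 behaves like position 1.
      val rawStart =
        if (pos > 0) pos - 1
        else if (pos < 0) bytes.length + pos
        else 0
      // Clamp the end to the array length; comparing before adding
      // avoids integer overflow when len is very large.
      val end =
        if (bytes.length - rawStart < len) bytes.length
        else rawStart + len
      // A negative start (position further back than the array is long)
      // is clamped to the first byte.
      val start = math.max(rawStart, 0)
      if (start >= end) Array.emptyByteArray
      else java.util.Arrays.copyOfRange(bytes, start, end)
    }

  def main(args: Array[String]): Unit = {
    // Same input as the reproduction: unhex('01020304')
    val col1 = Array[Byte](1, 2, 3, 4)
    val out = substringSql(col1, 2, 2)
    println(out.map("%02X".format(_)).mkString) // prints 0203
  }
}
```

A native implementation would presumably apply the same slicing to each value of a Binary array instead of rejecting the type, but how that fits into Auron's function dispatch is not covered by this sketch.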
