
Under some circumstances Cobrix selects the wrong record reader, failing the Spark job #684

Closed
@yruslan

Description


Describe the bug

This happens with a very specific combination of options passed to Cobrix.

The error is:

Error while encoding: java.lang.RuntimeException: 
  org.apache.spark.sql.catalyst.expressions.GenericRow is not a valid external type for schema of string

Code snippet that caused the issue

The following snippet triggers the error:

val df = spark
  .read
  .format("cobol")
  .option("copybook_contents", copybook)
  .option("record_format", "F")
  .option("segment_field", "IND")
  .option("segment_id_level0", "A")
  .option("segment_id_prefix", "ID")
  .option("redefine-segment-id-map:0", "SEGMENT1 => A")
  .option("redefine-segment-id-map:1", "SEGMENT2 => B")
  .option("redefine-segment-id-map:2", "SEGMENT3 => C")
  .option("pedantic", "true")
  .load("/data/file/location")

(the copybook is provided below)

Expected behavior

spark-cobol should choose the variable-length record reader with a fixed record length record extractor when the user requests segment id generation.
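
As a rough illustration of why the reader choice matters (a sketch, assuming the generated segment id column is named Seg_Id0, as described in the Cobrix documentation for segment id generation), downstream code typically relies on that generated column to split the resulting DataFrame by segment:

import org.apache.spark.sql.functions.col

// Sketch only: with segment id generation enabled, each record should carry
// a generated Seg_Id0 column (values prefixed with "ID", per segment_id_prefix
// in the snippet above). Records can then be split per segment via IND:
val segment1 = df.filter(col("IND") === "A").select("Seg_Id0", "SEGMENT1")
val segment2 = df.filter(col("IND") === "B").select("Seg_Id0", "SEGMENT2")
val segment3 = df.filter(col("IND") === "C").select("Seg_Id0", "SEGMENT3")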

Context

  • Cobrix version: 2.7.1
  • Spark version: 3.3.4
  • Scala version: 2.12

Copybook (if possible)

         01  R.
           05  IND           PIC X(1).
           05  SEGMENT1.
              10    FIELD1   PIC X(1).
           05  SEGMENT2 REDEFINES SEGMENT1.
              10    FIELD2   PIC X(2).
           05  SEGMENT3 REDEFINES SEGMENT1.
              10    FIELD3   PIC X(3).
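
For reference, the copybook value referenced in the reproduction snippet can be this copybook embedded as a plain Scala string, since copybook_contents takes the copybook text directly (a minimal sketch):

// Hypothetical definition of the `copybook` value used in the reproduction
// snippet; the contents are exactly the copybook listed above.
val copybook: String =
  """         01  R.
    |           05  IND           PIC X(1).
    |           05  SEGMENT1.
    |              10    FIELD1   PIC X(1).
    |           05  SEGMENT2 REDEFINES SEGMENT1.
    |              10    FIELD2   PIC X(2).
    |           05  SEGMENT3 REDEFINES SEGMENT1.
    |              10    FIELD3   PIC X(3).
    |""".stripMargin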
