Missing proper numeric (little-endian type) values after parsing #740

@jaysara


## Background

I am trying to parse a fixed-record-size file whose numeric fields are stored as little-endian binary. I changed COMP-5 to COMP-9 in the copybook so that these fields are treated as little-endian. All the text fields parse fine, but some numeric fields do not come out correctly. The example copybook is Copybook.
The sample input file is Input File.
The file produced by the Cobrix parse is Cobrix Output.
The expected output file from the original system is Expected.
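
For context, here is a minimal sketch of the kind of copybook change described above; the field names and picture clauses are made up for illustration and are not taken from the attached copybook. COMP-9 is the non-standard usage that Cobrix interprets as little-endian binary, which is why the copybook was changed from COMP-5:

```java
// Hypothetical copybook fragment of the kind passed via "copybook_contents".
// The COMP-5 usage was replaced with COMP-9 so that the binary field is
// decoded as little-endian rather than big-endian.
String exampleCopybook =
      "        01  CUST-RECORD.\n"
    + "            05  CUST-NAME     PIC X(20).\n"
    + "            05  CUST-BALANCE  PIC S9(9)  COMP-9.\n";  // was COMP-5
```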
This is my Java code:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Read the copybook text and parse the fixed-length file with Cobrix
String custInputCopyBook = readCopyBook(configPath);

Dataset<Row> df1 = spark.read()
        .format("za.co.absa.cobrix.spark.cobol.source")
        .option("copybook_contents", custInputCopyBook)
        .option("encoding", "ascii")
        .option("schema_retention_policy", "collapse_root")
        .option("record_start_offset", "2")
        .load(inputFile);

df1.printSchema();
df1.show(false);
System.out.println("Count " + df1.count());

df1.repartition(1).write().mode("overwrite").option("header", "true").csv(outputPath);
```
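
Not part of the original report, but one way to narrow this down is to decode one of the affected fields by hand straight from the raw file and compare it with the expected output. The sketch below assumes a 4-byte binary field; the record length, offsets, row number, and file name are placeholders to be replaced with the real values from the copybook layout:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LittleEndianCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder values -- substitute the real record length and
        // field offset from the copybook layout.
        int recordLength = 100; // total bytes per record on disk
        int startOffset  = 2;   // same as the "record_start_offset" option
        int fieldOffset  = 10;  // offset of the numeric field within the record
        int row          = 11;  // 1-based row number to inspect

        byte[] data = Files.readAllBytes(Paths.get("input.dat"));
        int pos = (row - 1) * recordLength + startOffset + fieldOffset;

        // Interpret 4 bytes as a little-endian integer, the way a
        // little-endian binary (COMP-9 style) field should be read.
        int value = ByteBuffer.wrap(data, pos, 4)
                              .order(ByteOrder.LITTLE_ENDIAN)
                              .getInt();
        System.out.println("Decoded value: " + value);
    }
}
```

If the hand-decoded value matches the expected output while Cobrix returns null, the problem is likely in how the field's PIC/USAGE is being mapped; if it does not match, the offset or record-length assumption is off.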

## Question

Can you please help me identify why some numeric values are missing (coming out as null)?

Examples of the missing values:

- Column 1 -> Row 11
- Column 3 -> Row 11
- Column 4 -> Row 10
- Column 5 -> Row 6
- Column 8 -> Rows 7 and 9

All missing values are in numeric fields only; all character fields seem to populate correctly.
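
As a debugging step (not in the original post), the affected rows can also be pulled out directly from the DataFrame before writing the CSV; the column name below is a placeholder for whichever numeric field is affected:

```java
import static org.apache.spark.sql.functions.col;

// Show only the records where the numeric field parsed as null
// ("CUST_BALANCE" is a placeholder -- use the actual field name
//  from the printed schema).
df1.filter(col("CUST_BALANCE").isNull()).show(false);
```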

Thanks
