
Unicode support with PIC N notation #260

@schaloner-kbc

Question

We are reading COBOL files and have run into an issue with Unicode (PIC N) field definitions in our copybooks.

      ******************************************************************        
      * COBOL DECLARATION FOR VIEW  XXXXXXXX                           *        
      ******************************************************************        
       01  YYYYYYYY-YYY.                                                        
         10  ZZZ-ZZZZZZ                      PIC N(4). 
         ... more lines
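For context, this is roughly how a PIC N(4) field would have to be decoded once it can be parsed. It is a minimal sketch, assuming the field holds national data (as IBM Enterprise COBOL uses for PIC N when the NSYMBOL(NATIONAL) option is in effect): UTF-16 big-endian, 2 bytes per character position, so N(4) occupies 8 bytes in the record. `decode_national` is a hypothetical helper, not part of Cobrix; a DBCS (USAGE DISPLAY-1) field would need a different code page.

```python
def decode_national(field: bytes, n_chars: int) -> str:
    """Hypothetical helper: decode a PIC N(n_chars) field assumed to be USAGE NATIONAL."""
    # Each national character position is one UTF-16 code unit, i.e. 2 bytes.
    expected = 2 * n_chars
    if len(field) != expected:
        raise ValueError(
            f"expected {expected} bytes for PIC N({n_chars}), got {len(field)}"
        )
    # National data is assumed to be UTF-16 big-endian; short values are
    # padded with national spaces (U+0020), which are trimmed here.
    return field.decode("utf-16-be").rstrip(" ")


# 'A', 'B' followed by two national spaces.
print(decode_national(bytes.fromhex("0041004200200020"), 4))  # prints AB
```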

Parsing the copybook fails with the following error:

Syntax error in the copybook at line 5: Invalid input 'N' at position 5:49
za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException: Syntax error in the copybook at line 5: Invalid input 'N' at position 5:49
	at za.co.absa.cobrix.cobol.parser.antlr.ThrowErrorStrategy.recover(ANTLRParser.scala:33)
	at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.pic(copybookParser.java:2469)
	at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.primitive(copybookParser.java:2791)
	at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.item(copybookParser.java:3015)
	at za.co.absa.cobrix.cobol.parser.antlr.copybookParser.main(copybookParser.java:214)
	at za.co.absa.cobrix.cobol.parser.antlr.ANTLRParser$.parse(ANTLRParser.scala:72)
	at za.co.absa.cobrix.cobol.parser.CopybookParser$.parseTree(CopybookParser.scala:124)
	at za.co.absa.cobrix.spark.cobol.reader.fixedlen.FixedLenNestedReader.loadCopyBook(FixedLenNestedReader.scala:96)
	at za.co.absa.cobrix.spark.cobol.reader.fixedlen.FixedLenNestedReader.<init>(FixedLenNestedReader.scala:57)
	at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createFixedLengthReader(DefaultSource.scala:88)
	at za.co.absa.cobrix.spark.cobol.source.DefaultSource.buildEitherReader(DefaultSource.scala:75)
	at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:60)
	at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:48)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)

According to the COBOL reference documentation, the copybook is valid: PIC N declares a national (UTF-16 or DBCS) character field.

Are there any plans to support this notation?
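In the meantime, one untested workaround would be to preprocess the copybook so that each PIC N(k) becomes PIC X(2k). Assuming the fields are national (2 bytes per character), this keeps the record layout the same size so the parser accepts it and the other fields line up. The rewritten fields would then be read as alphanumeric (EBCDIC) text and likely come out garbled, so their raw bytes would still need to be re-decoded as UTF-16BE. `rewrite_pic_n` is a hypothetical Python helper, not a Cobrix feature:

```python
import re

# Matches "PIC N(k)" / "PICTURE N(k)" or a run of N's such as "PIC NNNN".
# Assumes no explicit USAGE NATIONAL / DISPLAY-1 clause follows the PIC;
# such a clause would also have to be removed for X(2k) to be valid.
_PIC_N = re.compile(
    r"\b(PIC(?:TURE)?\s+)(?:N\((\d+)\)|(N+))(?=[\s.]|$)", re.IGNORECASE
)


def rewrite_pic_n(copybook: str) -> str:
    """Hypothetical helper: replace PIC N(k) with PIC X(2k) in copybook text."""

    def repl(m: re.Match) -> str:
        # Character count comes from N(k) or from the length of an N...N run.
        n_chars = int(m.group(2)) if m.group(2) else len(m.group(3))
        # Note: X(2k) can be one character longer than N(k) (e.g. N(5) -> X(10)),
        # which matters if a fixed-format line is already near column 72.
        return f"{m.group(1)}X({2 * n_chars})"

    return _PIC_N.sub(repl, copybook)


print(rewrite_pic_n("10  ZZZ-ZZZZZZ  PIC N(4)."))  # prints 10  ZZZ-ZZZZZZ  PIC X(8).
```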

Thanks,
Steve

Labels: accepted (Accepted for implementation), enhancement (New feature or request)