Skip to content

Conversation

@yruslan
Copy link
Collaborator

@yruslan yruslan commented Jun 5, 2025

This is so leading zeros are retained.

Summary by CodeRabbit

  • New Features
    • Introduced a new option to force COBOL DISPLAY fields to always be parsed as strings, preserving leading/trailing zeros and signs. This can be enabled via the display_pic_always_string parameter.
  • Bug Fixes
    • Added validation to prevent incompatible options (display_pic_always_string and strict_integral_precision) from being used together.
  • Documentation
    • Updated documentation to describe the new configuration option and its behavior.
  • Tests
    • Added and updated tests to verify correct parsing and schema generation for DISPLAY fields as strings, including validation of option exclusivity.
  • Chores
    • Upgraded Spark dependency to version 3.5.2.
    • Added a new test dependency for logging.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jun 5, 2025

Walkthrough

A new configuration option, display_pic_always_string, was introduced to allow COBOL DISPLAY fields to be parsed as strings rather than numeric types, preserving formatting such as leading zeros. This involved adding a Boolean parameter throughout the parsing, decoding, and schema construction logic, updating documentation, enforcing option exclusivity, and introducing related tests and metadata handling.

Changes

File(s) Change Summary
README.md Documented the new .option("display_pic_always_string", "false") configuration option for data parsing.
pom.xml Updated Spark version from 3.4.4 to 3.5.2.
spark-cobol/pom.xml Added test-scoped dependency on org.slf4j:slf4j-log4j12.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/CopybookParser.scala Added isDisplayAlwaysString Boolean parameter to public parse methods and updated ScalaDoc.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ANTLRParser.scala Added isDisplayAlwaysString parameter to parse method and propagated it to ParserVisitor.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala Updated constructor and decoder selection to include isDisplayAlwaysString.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala Explicitly set isDisplayAlwaysString = false in decoder selection.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala Added isDisplayAlwaysString to getDecoder; introduced logic to decode integral DISPLAY fields as strings when flag is set.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala Added isDisplayAlwaysString parameter to CobolParameters case class and documentation.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala Added isDisplayAlwaysString parameter to ReaderParameters case class and documentation.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scala Added isDisplayAlwaysString parameter to CobolSchema and its factory method; updated ScalaDoc.
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/parameters/CobolParametersParser.scala Added parsing and validation for display_pic_always_string; enforced mutual exclusivity with strict_integral_precision.
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala Added isDisplayAlwaysString parameter; mapped integral DISPLAY fields to Spark StringType when enabled; added metadata for max length; updated companion object.
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/DataSizeSpec.scala Updated test to pass isDisplayAlwaysString = false to ParserVisitor.
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/PicValidationSpec.scala Updated test to pass isDisplayAlwaysString = false to ParserVisitor.
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaHierarchicalSpec.scala Updated constructor calls to CobolSchema to include new Boolean parameter.
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala Updated all CobolSchema constructor calls for new parameter; added test for string display types.
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/SparkSchemaSpec.scala Updated CobolSchema constructor call for new Boolean parameter.
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/impl/DummyCobolSchema.scala Updated superclass CobolSchema constructor call for new Boolean parameter.
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/text/Test02TextFilesOldSchool.scala Updated CobolSchema constructor call for new Boolean parameter.
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/Test01DisplayPicAsStrings.scala Added new test suite for DISPLAY PIC as strings, covering default, string, and error scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant SparkCobol
    participant CobolParametersParser
    participant CopybookParser
    participant DecoderSelector
    participant CobolSchema

    User->>SparkCobol: .option("display_pic_always_string", "true")
    SparkCobol->>CobolParametersParser: parse(params)
    CobolParametersParser->>CopybookParser: parseTree(..., isDisplayAlwaysString)
    CopybookParser->>DecoderSelector: getDecoder(..., isDisplayAlwaysString)
    DecoderSelector-->>CopybookParser: Decoder (string or numeric)
    CopybookParser-->>CobolSchema: Parsed Copybook
    CobolSchema-->>SparkCobol: Schema with StringType for DISPLAY fields (if enabled)
Loading

Poem

In fields of DISPLAY, numbers once reigned,
Now strings can appear, with zeros retained.
A toggle, a flag, and schemas anew,
Spark and Cobol, together they grew.
🐇 With options so clever, our parsing’s a breeze—
Hopping through bytes with the greatest of ease!


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1e513e and 0d7e167.

📒 Files selected for processing (1)
  • README.md (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • README.md
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Spark 3.5.5 on Scala 2.13.16
  • GitHub Check: Spark 3.4.4 on Scala 2.12.20
  • GitHub Check: Spark 3.5.5 on Scala 2.12.20
  • GitHub Check: Spark 2.4.8 on Scala 2.11.12
  • GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary or {PR Summary} to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions
Copy link

github-actions bot commented Jun 5, 2025

JaCoCo code coverage report - 'cobol-parser'

Overall Project 91.3% -0.08% 🍏
Files changed 68.97% 🍏

File Coverage
NonTerminalsAdder.scala 100% 🍏
ParserVisitor.scala 89.48% 🍏
ANTLRParser.scala 85.86% 🍏
CopybookParser.scala 83.96% 🍏
CobolSchema.scala 81.16% -1.03% 🍏
DecoderSelector.scala 68.92% -3.73% 🍏

@github-actions
Copy link

github-actions bot commented Jun 5, 2025

JaCoCo code coverage report - 'spark-cobol'

File Coverage [87.66%] 🍏
CobolSchema.scala 94.54% 🍏
CobolParametersParser.scala 85.71% 🍏
Total Project Coverage 82.15% 🍏

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
README.md (1)

1535-1535: Document option exclusivity.

The description of .option("display_pic_always_string", ...) should mention that this flag is mutually exclusive with strict_integral_precision. Please add a note indicating that enabling display_pic_always_string cannot be combined with strict_integral_precision.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala (1)

450-471: Consider simplifying the test to focus on core functionality.

The test correctly validates that integral DISPLAY fields become strings when isDisplayAlwaysString = true. However, the inclusion of segment ID fields (segmentIdFieldsCnt = 2) adds complexity unrelated to the feature being tested.

Consider simplifying the test by removing the segment ID fields:

-    val cobolSchema = new CobolSchema(parsedSchema, SchemaRetentionPolicy.CollapseRoot, true, false, "", false, false, 2)
+    val cobolSchema = new CobolSchema(parsedSchema, SchemaRetentionPolicy.CollapseRoot, true, false, "", false, false)

And update the expected schema accordingly:

-      """root
-        | |-- Seg_Id0: string (nullable = true)
-        | |-- Seg_Id1: string (nullable = true)
-        | |-- STR1: string (nullable = true)
+      """root
+        | |-- STR1: string (nullable = true)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b21b21 and 1074bbc.

📒 Files selected for processing (21)
  • README.md (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/CopybookParser.scala (8 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ANTLRParser.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala (3 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scala (5 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/DataSizeSpec.scala (1 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/PicValidationSpec.scala (1 hunks)
  • pom.xml (1 hunks)
  • spark-cobol/pom.xml (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/parameters/CobolParametersParser.scala (4 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (4 hunks)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaHierarchicalSpec.scala (1 hunks)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala (21 hunks)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/Test01DisplayPicAsStrings.scala (1 hunks)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/SparkSchemaSpec.scala (1 hunks)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/impl/DummyCobolSchema.scala (1 hunks)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/text/Test02TextFilesOldSchool.scala (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (8)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/PicValidationSpec.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/Encoding.scala (1)
  • ASCII (25-25)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/policies/StringTrimmingPolicy.scala (1)
  • StringTrimmingPolicy (19-46)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala (2)
  • DecoderSelector (30-352)
  • getDecoder (57-75)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/impl/DummyCobolSchema.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/policies/SchemaRetentionPolicy.scala (1)
  • SchemaRetentionPolicy (19-40)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/SparkSchemaSpec.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/policies/SchemaRetentionPolicy.scala (1)
  • SchemaRetentionPolicy (19-40)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/text/Test02TextFilesOldSchool.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/policies/SchemaRetentionPolicy.scala (1)
  • SchemaRetentionPolicy (19-40)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaHierarchicalSpec.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/policies/SchemaRetentionPolicy.scala (1)
  • SchemaRetentionPolicy (19-40)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala (2)
  • DecoderSelector (30-352)
  • getDecoder (57-75)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/DataSizeSpec.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/Encoding.scala (1)
  • ASCII (25-25)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/policies/StringTrimmingPolicy.scala (1)
  • StringTrimmingPolicy (19-46)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Spark 3.5.5 on Scala 2.13.16
  • GitHub Check: Spark 3.5.5 on Scala 2.12.20
  • GitHub Check: Spark 3.4.4 on Scala 2.12.20
  • GitHub Check: Spark 2.4.8 on Scala 2.11.12
  • GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
🔇 Additional comments (37)
pom.xml (1)

113-113: Verify Spark version bump for compatibility
You’ve updated <spark.version> to 3.5.2. Ensure compatibility with Spark 3.5 migration notes and APIs, and confirm all modules compile and tests pass with the new Spark version.

spark-cobol/pom.xml (1)

61-65: Approve test-scoped SLF4J binding
The slf4j-log4j12 dependency is correctly scoped to test and inherits its version from the parent POM’s dependencyManagement section. No conflicting SLF4J bindings are introduced.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/PicValidationSpec.scala (1)

36-38: LGTM! Proper test adjustment for API compatibility.

The explicit addition of isDisplayAlwaysString = false correctly updates the test to match the new ParserVisitor constructor signature while maintaining the existing validation behavior.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala (1)

76-76: LGTM! Appropriate hardcoded value for synthetic fields.

The explicit isDisplayAlwaysString = false parameter is correctly set for non-terminal alphanumeric fields, as these synthetic group representations shouldn't use the display-always-string behavior.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala (2)

52-52: LGTM! Clear and accurate documentation.

The documentation clearly explains that the parameter controls whether DISPLAY format fields remain as strings without numeric conversion.


91-91: LGTM! Proper parameter addition to configuration class.

The isDisplayAlwaysString parameter is correctly added to the case class with appropriate naming and positioning.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala (2)

62-62: LGTM! Consistent documentation across parameter classes.

The documentation aligns with the corresponding parameter in CobolParameters.scala and clearly describes the functionality.


112-112: LGTM! Proper default value for backward compatibility.

The isDisplayAlwaysString = false default value ensures backward compatibility while enabling the new string preservation feature when explicitly requested.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/SparkSchemaSpec.scala (1)

40-40: LGTM! Constructor updated correctly for new parameter.

The addition of the isDisplayAlwaysString parameter (set to false) correctly maintains the existing test behavior while aligning with the updated CobolSchema constructor signature.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaHierarchicalSpec.scala (1)

105-105: LGTM! Helper method updated correctly.

The parseSchema helper method correctly adds the isDisplayAlwaysString parameter (set to false) to maintain existing test behavior while supporting the new constructor signature.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2)

44-44: LGTM! Constructor parameter added correctly.

The isDisplayAlwaysString parameter is properly positioned in the constructor signature and aligns with the broader feature implementation.


858-858: LGTM! Parameter correctly propagated to decoder selection.

The isDisplayAlwaysString parameter is properly passed to DecoderSelector.getDecoder, completing the chain from parameter parsing to decoding logic for the DISPLAY format string feature.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/DataSizeSpec.scala (1)

36-38: LGTM! Correct compatibility update for new parameter.

The addition of isDisplayAlwaysString = false properly maintains backward compatibility for this test while adapting to the updated ParserVisitor constructor signature.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/impl/DummyCobolSchema.scala (1)

27-27: LGTM! Proper constructor parameter update.

The addition of false for the isDisplayAlwaysString parameter correctly updates the CobolSchema constructor call to match the new signature.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala (1)

58-58: LGTM! Consistent constructor updates.

All CobolSchema constructor calls have been correctly updated to include isDisplayAlwaysString = false, maintaining backward compatibility for existing tests.

Also applies to: 76-76, 97-97, 116-116, 138-138, 166-166, 182-182, 199-199, 214-214, 232-232, 246-246, 275-275, 293-293, 314-314, 335-335, 352-352, 370-370, 385-385, 404-404, 433-433

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scala (4)

39-39: LGTM! Clear and accurate parameter documentation.

The ScalaDoc clearly explains the purpose of the isDisplayAlwaysString parameter.


50-50: LGTM! Proper parameter placement in constructor.

The isDisplayAlwaysString parameter is correctly positioned in the constructor signature with appropriate type and naming.


98-98: LGTM! Correct parameter propagation to parser.

The isDisplayAlwaysString parameter is properly passed to both CopybookParser.parseTree calls, ensuring the feature works for both single and multiple copybook scenarios.

Also applies to: 122-122


139-149: LGTM! Proper constructor parameter ordering.

The updated constructor call correctly passes all parameters in the right order, including the new isDisplayAlwaysString parameter from readerParameters.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ANTLRParser.scala (2)

57-57: LGTM: Parameter addition follows existing patterns.

The new isDisplayAlwaysString parameter is correctly positioned and follows the established naming conventions.


68-68: LGTM: Parameter correctly propagated to ParserVisitor.

The parameter is properly passed to the ParserVisitor constructor, maintaining the correct parameter order.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/parameters/CobolParametersParser.scala (4)

75-75: LGTM: Parameter constant follows naming conventions.

The new parameter constant is appropriately named and positioned with other schema transformation parameters.


274-274: LGTM: Parameter parsing is consistent.

The parameter is parsed using the same pattern as other boolean options with appropriate default value.


414-414: LGTM: Parameter propagated to ReaderParameters.

The parameter is correctly included in the ReaderParameters structure construction.


928-931: LGTM: Validation logic is well-implemented.

The mutual exclusivity check between display_pic_always_string and strict_integral_precision is correctly implemented with a clear error message. This prevents conflicting parsing behaviors.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/Test01DisplayPicAsStrings.scala (4)

30-38: LGTM: Well-defined test copybook and data.

The copybook definition covers various numeric types (signed/unsigned integers, decimals) and the binary test data appears comprehensive for testing the parsing behavior.


40-82: LGTM: Comprehensive test for default behavior.

This test properly validates that numeric DISPLAY fields are parsed as their appropriate numeric types by default. The expected schema and data correctly reflect the default parsing behavior.


84-130: LGTM: Thorough test for string parsing behavior.

This test comprehensively validates the new display_pic_always_string option:

  • Correctly shows that integral fields become strings while decimals remain decimals
  • Validates that string representations preserve formatting (leading zeros, signs)
  • Includes metadata validation for maximum length
  • Expected data properly reflects the string parsing behavior

132-148: LGTM: Proper validation of incompatible options.

This test correctly validates that the mutual exclusivity between display_pic_always_string and strict_integral_precision options is enforced with the expected error message.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (4)

54-71: LGTM: Constructor parameter addition is well-integrated.

The new isDisplayAlwaysString parameter is properly added with appropriate default value and correctly passed to the superclass constructor.


194-197: LGTM: Correct handling of integral types as strings.

The new case correctly identifies integral types when the isDisplayAlwaysString flag is true and appropriately returns StringType while preserving metadata generation.


237-240: LGTM: Accurate metadata calculation for integral strings.

The addIntegralStringMetadata helper method correctly calculates the maximum length by accounting for the sign position (adding 1 if sign is present) and uses the integral precision as the base length.


307-307: LGTM: Parameter properly propagated in companion object.

The fromBaseReader method correctly includes the new isDisplayAlwaysString parameter from the base schema.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala (2)

59-59: LGTM - Parameter addition is well-designed.

The new isDisplayAlwaysString parameter with a default value of false maintains backward compatibility while enabling the new feature.


256-277: LGTM - String decoder implementation is correct.

The method correctly:

  • Determines encoding (EBCDIC vs ASCII)
  • Handles signed vs unsigned fields appropriately
  • Calculates sign overpunch allowance
  • Returns string representations instead of numeric types

This implementation achieves the goal of preserving string format for numeric DISPLAY fields.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/CopybookParser.scala (2)

118-118: LGTM - Parameter propagation is consistent.

The isDisplayAlwaysString parameter is correctly added to all method signatures with appropriate default values and consistent naming.

Also applies to: 138-138, 160-160, 186-186, 205-205, 227-227, 274-274


289-289: LGTM - Parameter passed to ANTLRParser correctly.

The isDisplayAlwaysString parameter is properly propagated to the ANTLRParser.parse method call.


val parsedCopybook = CopybookParser.parse(copybook, dataEncoding = ASCII, stringTrimmingPolicy = StringTrimmingPolicy.TrimNone)
val cobolSchema = new CobolSchema(parsedCopybook, SchemaRetentionPolicy.CollapseRoot, false, "", false)
val cobolSchema = new CobolSchema(parsedCopybook, SchemaRetentionPolicy.CollapseRoot, false, false, "", false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify constructor parameter count.

This CobolSchema constructor call has 6 parameters, while other test files in this PR show 7 parameters. Please verify if this constructor call is missing the final boolean parameter to match the updated signature.


🏁 Script executed:

#!/bin/bash
# Description: Check CobolSchema constructor signatures to verify parameter count
# Expected: All constructor calls should have consistent parameter counts

# Search for CobolSchema constructor calls
rg -A 2 "new CobolSchema\(" --type scala

Length of output: 10950


Add missing boolean parameter to CobolSchema constructor

The CobolSchema signature requires seven arguments, but this call only provides six, causing it to diverge from other tests.

• File: spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/text/Test02TextFilesOldSchool.scala
Line: 55

Suggested fix:

- val cobolSchema = new CobolSchema(parsedCopybook, SchemaRetentionPolicy.CollapseRoot, false, false, "", false)
+ val cobolSchema = new CobolSchema(parsedCopybook, SchemaRetentionPolicy.CollapseRoot, false, false, "", false, false)

Ensure the added boolean matches the intended behavior (e.g., strict parsing).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
val cobolSchema = new CobolSchema(parsedCopybook, SchemaRetentionPolicy.CollapseRoot, false, false, "", false)
val cobolSchema = new CobolSchema(parsedCopybook, SchemaRetentionPolicy.CollapseRoot, false, false, "", false, false)
🤖 Prompt for AI Agents
In
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/text/Test02TextFilesOldSchool.scala
at line 55, the CobolSchema constructor call is missing the seventh boolean
parameter required by the updated constructor signature. Add the missing boolean
argument at the end of the parameter list, ensuring it aligns with the intended
behavior such as strict parsing or other relevant flag used in other test files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants