-
Notifications
You must be signed in to change notification settings - Fork 86
#25 Add an option to retain string format for numbers in DISPLAY format #765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughA new configuration option, Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant SparkCobol
participant CobolParametersParser
participant CopybookParser
participant DecoderSelector
participant CobolSchema
User->>SparkCobol: .option("display_pic_always_string", "true")
SparkCobol->>CobolParametersParser: parse(params)
CobolParametersParser->>CopybookParser: parseTree(..., isDisplayAlwaysString)
CopybookParser->>DecoderSelector: getDecoder(..., isDisplayAlwaysString)
DecoderSelector-->>CopybookParser: Decoder (string or numeric)
CopybookParser-->>CobolSchema: Parsed Copybook
CobolSchema-->>SparkCobol: Schema with StringType for DISPLAY fields (if enabled)
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (5)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
JaCoCo code coverage report - 'cobol-parser'
|
JaCoCo code coverage report - 'spark-cobol'
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
README.md (1)
1535-1535: Document option exclusivity.The description of
.option("display_pic_always_string", ...)should mention that this flag is mutually exclusive withstrict_integral_precision. Please add a note indicating that enablingdisplay_pic_always_stringcannot be combined withstrict_integral_precision.spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala (1)
450-471: Consider simplifying the test to focus on core functionality.The test correctly validates that integral DISPLAY fields become strings when
isDisplayAlwaysString = true. However, the inclusion of segment ID fields (segmentIdFieldsCnt = 2) adds complexity unrelated to the feature being tested.Consider simplifying the test by removing the segment ID fields:
- val cobolSchema = new CobolSchema(parsedSchema, SchemaRetentionPolicy.CollapseRoot, true, false, "", false, false, 2) + val cobolSchema = new CobolSchema(parsedSchema, SchemaRetentionPolicy.CollapseRoot, true, false, "", false, false)And update the expected schema accordingly:
- """root - | |-- Seg_Id0: string (nullable = true) - | |-- Seg_Id1: string (nullable = true) - | |-- STR1: string (nullable = true) + """root + | |-- STR1: string (nullable = true)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
README.md(1 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/CopybookParser.scala(8 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ANTLRParser.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala(1 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala(3 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scala(5 hunks)cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/DataSizeSpec.scala(1 hunks)cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/PicValidationSpec.scala(1 hunks)pom.xml(1 hunks)spark-cobol/pom.xml(1 hunks)spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/parameters/CobolParametersParser.scala(4 hunks)spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala(4 hunks)spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaHierarchicalSpec.scala(1 hunks)spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala(21 hunks)spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/Test01DisplayPicAsStrings.scala(1 hunks)spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/SparkSchemaSpec.scala(1 hunks)spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/impl/DummyCobolSchema.scala(1 hunks)spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/text/Test02TextFilesOldSchool.scala(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (8)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/PicValidationSpec.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/Encoding.scala (1)
ASCII(25-25)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/policies/StringTrimmingPolicy.scala (1)
StringTrimmingPolicy(19-46)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala (2)
DecoderSelector(30-352)getDecoder(57-75)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/impl/DummyCobolSchema.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/policies/SchemaRetentionPolicy.scala (1)
SchemaRetentionPolicy(19-40)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/SparkSchemaSpec.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/policies/SchemaRetentionPolicy.scala (1)
SchemaRetentionPolicy(19-40)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/text/Test02TextFilesOldSchool.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/policies/SchemaRetentionPolicy.scala (1)
SchemaRetentionPolicy(19-40)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaHierarchicalSpec.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/policies/SchemaRetentionPolicy.scala (1)
SchemaRetentionPolicy(19-40)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala (2)
DecoderSelector(30-352)getDecoder(57-75)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/DataSizeSpec.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/Encoding.scala (1)
ASCII(25-25)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/policies/StringTrimmingPolicy.scala (1)
StringTrimmingPolicy(19-46)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Spark 3.5.5 on Scala 2.13.16
- GitHub Check: Spark 3.5.5 on Scala 2.12.20
- GitHub Check: Spark 3.4.4 on Scala 2.12.20
- GitHub Check: Spark 2.4.8 on Scala 2.11.12
- GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
🔇 Additional comments (37)
pom.xml (1)
113-113: Verify Spark version bump for compatibility
You’ve updated<spark.version>to3.5.2. Ensure compatibility with Spark 3.5 migration notes and APIs, and confirm all modules compile and tests pass with the new Spark version.spark-cobol/pom.xml (1)
61-65: Approve test-scoped SLF4J binding
Theslf4j-log4j12dependency is correctly scoped totestand inherits its version from the parent POM’s dependencyManagement section. No conflicting SLF4J bindings are introduced.cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/PicValidationSpec.scala (1)
36-38: LGTM! Proper test adjustment for API compatibility.The explicit addition of
isDisplayAlwaysString = falsecorrectly updates the test to match the newParserVisitorconstructor signature while maintaining the existing validation behavior.cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala (1)
76-76: LGTM! Appropriate hardcoded value for synthetic fields.The explicit
isDisplayAlwaysString = falseparameter is correctly set for non-terminal alphanumeric fields, as these synthetic group representations shouldn't use the display-always-string behavior.cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala (2)
52-52: LGTM! Clear and accurate documentation.The documentation clearly explains that the parameter controls whether DISPLAY format fields remain as strings without numeric conversion.
91-91: LGTM! Proper parameter addition to configuration class.The
isDisplayAlwaysStringparameter is correctly added to the case class with appropriate naming and positioning.cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala (2)
62-62: LGTM! Consistent documentation across parameter classes.The documentation aligns with the corresponding parameter in
CobolParameters.scalaand clearly describes the functionality.
112-112: LGTM! Proper default value for backward compatibility.The
isDisplayAlwaysString = falsedefault value ensures backward compatibility while enabling the new string preservation feature when explicitly requested.spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/SparkSchemaSpec.scala (1)
40-40: LGTM! Constructor updated correctly for new parameter.The addition of the
isDisplayAlwaysStringparameter (set tofalse) correctly maintains the existing test behavior while aligning with the updatedCobolSchemaconstructor signature.spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaHierarchicalSpec.scala (1)
105-105: LGTM! Helper method updated correctly.The
parseSchemahelper method correctly adds theisDisplayAlwaysStringparameter (set tofalse) to maintain existing test behavior while supporting the new constructor signature.cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2)
44-44: LGTM! Constructor parameter added correctly.The
isDisplayAlwaysStringparameter is properly positioned in the constructor signature and aligns with the broader feature implementation.
858-858: LGTM! Parameter correctly propagated to decoder selection.The
isDisplayAlwaysStringparameter is properly passed toDecoderSelector.getDecoder, completing the chain from parameter parsing to decoding logic for the DISPLAY format string feature.cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/DataSizeSpec.scala (1)
36-38: LGTM! Correct compatibility update for new parameter.The addition of
isDisplayAlwaysString = falseproperly maintains backward compatibility for this test while adapting to the updatedParserVisitorconstructor signature.spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/impl/DummyCobolSchema.scala (1)
27-27: LGTM! Proper constructor parameter update.The addition of
falsefor theisDisplayAlwaysStringparameter correctly updates theCobolSchemaconstructor call to match the new signature.spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala (1)
58-58: LGTM! Consistent constructor updates.All
CobolSchemaconstructor calls have been correctly updated to includeisDisplayAlwaysString = false, maintaining backward compatibility for existing tests.Also applies to: 76-76, 97-97, 116-116, 138-138, 166-166, 182-182, 199-199, 214-214, 232-232, 246-246, 275-275, 293-293, 314-314, 335-335, 352-352, 370-370, 385-385, 404-404, 433-433
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scala (4)
39-39: LGTM! Clear and accurate parameter documentation.The ScalaDoc clearly explains the purpose of the
isDisplayAlwaysStringparameter.
50-50: LGTM! Proper parameter placement in constructor.The
isDisplayAlwaysStringparameter is correctly positioned in the constructor signature with appropriate type and naming.
98-98: LGTM! Correct parameter propagation to parser.The
isDisplayAlwaysStringparameter is properly passed to bothCopybookParser.parseTreecalls, ensuring the feature works for both single and multiple copybook scenarios.Also applies to: 122-122
139-149: LGTM! Proper constructor parameter ordering.The updated constructor call correctly passes all parameters in the right order, including the new
isDisplayAlwaysStringparameter fromreaderParameters.cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ANTLRParser.scala (2)
57-57: LGTM: Parameter addition follows existing patterns.The new
isDisplayAlwaysStringparameter is correctly positioned and follows the established naming conventions.
68-68: LGTM: Parameter correctly propagated to ParserVisitor.The parameter is properly passed to the
ParserVisitorconstructor, maintaining the correct parameter order.spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/parameters/CobolParametersParser.scala (4)
75-75: LGTM: Parameter constant follows naming conventions.The new parameter constant is appropriately named and positioned with other schema transformation parameters.
274-274: LGTM: Parameter parsing is consistent.The parameter is parsed using the same pattern as other boolean options with appropriate default value.
414-414: LGTM: Parameter propagated to ReaderParameters.The parameter is correctly included in the ReaderParameters structure construction.
928-931: LGTM: Validation logic is well-implemented.The mutual exclusivity check between
display_pic_always_stringandstrict_integral_precisionis correctly implemented with a clear error message. This prevents conflicting parsing behaviors.spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/Test01DisplayPicAsStrings.scala (4)
30-38: LGTM: Well-defined test copybook and data.The copybook definition covers various numeric types (signed/unsigned integers, decimals) and the binary test data appears comprehensive for testing the parsing behavior.
40-82: LGTM: Comprehensive test for default behavior.This test properly validates that numeric DISPLAY fields are parsed as their appropriate numeric types by default. The expected schema and data correctly reflect the default parsing behavior.
84-130: LGTM: Thorough test for string parsing behavior.This test comprehensively validates the new
display_pic_always_stringoption:
- Correctly shows that integral fields become strings while decimals remain decimals
- Validates that string representations preserve formatting (leading zeros, signs)
- Includes metadata validation for maximum length
- Expected data properly reflects the string parsing behavior
132-148: LGTM: Proper validation of incompatible options.This test correctly validates that the mutual exclusivity between
display_pic_always_stringandstrict_integral_precisionoptions is enforced with the expected error message.spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (4)
54-71: LGTM: Constructor parameter addition is well-integrated.The new
isDisplayAlwaysStringparameter is properly added with appropriate default value and correctly passed to the superclass constructor.
194-197: LGTM: Correct handling of integral types as strings.The new case correctly identifies integral types when the
isDisplayAlwaysStringflag is true and appropriately returnsStringTypewhile preserving metadata generation.
237-240: LGTM: Accurate metadata calculation for integral strings.The
addIntegralStringMetadatahelper method correctly calculates the maximum length by accounting for the sign position (adding 1 if sign is present) and uses the integral precision as the base length.
307-307: LGTM: Parameter properly propagated in companion object.The
fromBaseReadermethod correctly includes the newisDisplayAlwaysStringparameter from the base schema.cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala (2)
59-59: LGTM - Parameter addition is well-designed.The new
isDisplayAlwaysStringparameter with a default value offalsemaintains backward compatibility while enabling the new feature.
256-277: LGTM - String decoder implementation is correct.The method correctly:
- Determines encoding (EBCDIC vs ASCII)
- Handles signed vs unsigned fields appropriately
- Calculates sign overpunch allowance
- Returns string representations instead of numeric types
This implementation achieves the goal of preserving string format for numeric DISPLAY fields.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/CopybookParser.scala (2)
118-118: LGTM - Parameter propagation is consistent.The
isDisplayAlwaysStringparameter is correctly added to all method signatures with appropriate default values and consistent naming.Also applies to: 138-138, 160-160, 186-186, 205-205, 227-227, 274-274
289-289: LGTM - Parameter passed to ANTLRParser correctly.The
isDisplayAlwaysStringparameter is properly propagated to theANTLRParser.parsemethod call.
|
|
||
| val parsedCopybook = CopybookParser.parse(copybook, dataEncoding = ASCII, stringTrimmingPolicy = StringTrimmingPolicy.TrimNone) | ||
| val cobolSchema = new CobolSchema(parsedCopybook, SchemaRetentionPolicy.CollapseRoot, false, "", false) | ||
| val cobolSchema = new CobolSchema(parsedCopybook, SchemaRetentionPolicy.CollapseRoot, false, false, "", false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify constructor parameter count.
This CobolSchema constructor call has 6 parameters, while other test files in this PR show 7 parameters. Please verify if this constructor call is missing the final boolean parameter to match the updated signature.
🏁 Script executed:
#!/bin/bash
# Description: Check CobolSchema constructor signatures to verify parameter count
# Expected: All constructor calls should have consistent parameter counts
# Search for CobolSchema constructor calls
rg -A 2 "new CobolSchema\(" --type scalaLength of output: 10950
Add missing boolean parameter to CobolSchema constructor
The CobolSchema signature requires seven arguments, but this call only provides six, causing it to diverge from other tests.
• File: spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/text/Test02TextFilesOldSchool.scala
Line: 55
Suggested fix:
- val cobolSchema = new CobolSchema(parsedCopybook, SchemaRetentionPolicy.CollapseRoot, false, false, "", false)
+ val cobolSchema = new CobolSchema(parsedCopybook, SchemaRetentionPolicy.CollapseRoot, false, false, "", false, false)Ensure the added boolean matches the intended behavior (e.g., strict parsing).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| val cobolSchema = new CobolSchema(parsedCopybook, SchemaRetentionPolicy.CollapseRoot, false, false, "", false) | |
| val cobolSchema = new CobolSchema(parsedCopybook, SchemaRetentionPolicy.CollapseRoot, false, false, "", false, false) |
🤖 Prompt for AI Agents
In
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/text/Test02TextFilesOldSchool.scala
at line 55, the CobolSchema constructor call is missing the seventh boolean
parameter required by the updated constructor signature. Add the missing boolean
argument at the end of the parameter list, ensuring it aligns with the intended
behavior such as strict parsing or other relevant flag used in other test files.
This is so leading zeros are retained.
Summary by CodeRabbit
display_pic_always_stringparameter.display_pic_always_stringandstrict_integral_precision) from being used together.