Skip to content

Conversation

@yruslan
Copy link
Collaborator

@yruslan yruslan commented Jan 6, 2026

Summary by CodeRabbit

  • New Features

    • Introduced a relaxed sign-overpunch mode for numeric COBOL parsing that permits more permissive interpretation of overpunched signs while preserving default strict behavior; applied consistently across integer/decimal types and both EBCDIC and ASCII inputs.
  • Tests

    • Added and expanded tests covering relaxed sign-overpunching for EBCDIC and ASCII data and multiple numeric formats to validate parsing and expected JSON output.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 6, 2026

Walkthrough

Adds a boolean flag relaxedOvepunch to numeric decoders, propagates it from DecoderSelector into StringDecoders, updates decoder call sites and unit tests, and adds integration tests exercising relaxed sign-overpunch behavior for ASCII and EBCDIC inputs.

Changes

Cohort / File(s) Summary
Decoder implementation
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala
Added relaxedOvepunch: Boolean parameter to numeric decoding methods (Number/Int/Long/BigNumber/BigDecimal for ASCII & EBCDIC); propagated the flag through callers; adjusted sign/overpunch logic and docs.
Decoder selection / invocations
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala
Updated decoder mappings to supply !strictSignOverpunch via lambdas to StringDecoders; replaced direct method refs with wrappers; minor import reordering.
Unit tests
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryDecoderSpec.scala, cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala
Updated test call sites to include relaxedOvepunch (generally false); added tests exercising relaxed sign-overpunching decoding.
Integration tests
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/regression/Test19SignOverpunching.scala
Added "Test relaxed sign overpunching" cases for EBCDIC and ASCII using strict_sign_overpunching = false and asserting expected numeric output.

Sequence Diagram(s)

(omitted — changes are internal flag propagation and do not introduce multi-component sequential flows warranting a diagram)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I nudged a sign, let punches play,

bytes now wink in a softer way.
Numbers hop, no rigid shove,
tests nod kindly — codes approve.
A merry nibble for parsing love.

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the main change: adding a relaxedOvepunch boolean parameter to decoder methods for sign overpunching compatibility.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce2d67e and 82080f9.

📒 Files selected for processing (4)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/regression/Test19SignOverpunching.scala
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/regression/Test19SignOverpunching.scala
🧰 Additional context used
🧬 Code graph analysis (1)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala (10)
  • decodeEbcdicNumber (169-230)
  • decodeAsciiNumber (242-298)
  • decodeEbcdicInt (310-316)
  • decodeAsciiInt (328-334)
  • decodeEbcdicLong (346-352)
  • decodeAsciiLong (364-370)
  • decodeEbcdicBigNumber (384-390)
  • decodeAsciiBigNumber (404-410)
  • decodeEbcdicBigDecimal (423-429)
  • decodeAsciiBigDecimal (442-448)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Spark 3.4.4 on Scala 2.12.20
  • GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
  • GitHub Check: Spark 2.4.8 on Scala 2.11.12
  • GitHub Check: Spark 3.5.7 on Scala 2.13.17
  • GitHub Check: Spark 3.5.7 on Scala 2.12.20
🔇 Additional comments (15)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala (2)

22-22: LGTM: Import statement correctly added.

The StandardCharsets import is properly placed and required for the test code.


633-916: Test updates maintain backward compatibility with correct parameter placement.

All 158 test calls in StringDecodersSpec.scala (lines 633-916) are correctly updated to pass relaxedOvepunch = false, preserving existing strict sign-overpunch behavior. Parameter ordering is consistent across all decoder methods (decodeEbcdicNumber, decodeAsciiNumber, decodeEbcdicInt, etc.).

This file does not include tests for the new relaxedOvepunch = true behavior. However, test coverage for the relaxed sign-overpunch functionality exists in BinaryDecoderSpec.scala, including tests for decodeEbcdicBigNumber and decodeAsciiNumber with relaxed overpunching enabled.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala (9)

22-22: LGTM: Import statement correctly added.

The StandardCharsets import is properly placed and required for UTF-16 string decoding in this file.


163-169: LGTM: Method signature and documentation properly updated.

The new relaxedOvepunch parameter is correctly positioned and documented. The parameter order is consistent with other decoder methods.


181-181: LGTM: Relaxed overpunch logic correctly implemented.

The logic correctly allows multiple overpunch characters when relaxedOvepunch = true:

  • Line 181: Continues processing after encountering a sign character
  • Lines 197, 201: Allows overpunch detection regardless of allowSignOverpunch setting

This ensures the last overpunch character determines the final sign, as documented.

Also applies to: 197-201


236-242: LGTM: Method signature and documentation properly updated.

The ASCII decoder signature matches the EBCDIC variant with consistent parameter ordering and clear documentation.


274-286: LGTM: ASCII relaxed overpunch logic correctly implemented.

The conditional logic properly differentiates between relaxed and strict modes:

  • Relaxed mode (lines 274-279): Accepts overpunch characters at any position
  • Strict mode (lines 281-285): Restricts overpunch to first or last position only (when allowSignOverpunch = true)

This implementation aligns with the documented behavior.


303-316: LGTM: Method signature and parameter propagation correct.

The decodeEbcdicInt method correctly propagates all parameters to decodeEbcdicNumber with consistent ordering.


318-370: LGTM: All integral decoder methods correctly updated.

The methods decodeAsciiInt, decodeEbcdicLong, and decodeAsciiLong all follow the consistent pattern:

  • Correct parameter ordering (relaxedOvepunch before improvedNullDetection)
  • Proper parameter propagation to underlying decoders

372-410: LGTM: BigNumber decoders correctly updated.

Both decodeEbcdicBigNumber and decodeAsciiBigNumber maintain consistent parameter ordering while preserving default values for scale and scaleFactor parameters.


412-448: LGTM: BigDecimal decoders correctly updated.

Both decodeEbcdicBigDecimal and decodeAsciiBigDecimal follow the consistent pattern with correct parameter ordering and propagation.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala (4)

27-27: LGTM: Import statement correctly updated.

The import statement properly includes both Charset and StandardCharsets with correct Scala syntax.


131-139: LGTM: Decimal decoder calls correctly updated.

The decoder calls properly pass !strictSignOverpunch as the relaxedOvepunch parameter:

  • When strictSignOverpunch = false, relaxed overpunch mode is enabled
  • Parameter ordering matches the updated StringDecoders signatures
  • Lambda wrappers correctly supply the additional parameter

210-228: LGTM: Integral decoder calls correctly updated.

All integral decoder invocations follow the consistent pattern with correct parameter ordering:

  • relaxedOvepunch parameter properly set to !strictSignOverpunch
  • Line 220 (decodeEbcdicLong) now has the correct parameter order (previously flagged issue was fixed)
  • Lambda wrappers appropriately handle the additional parameter

269-276: LGTM: Display decoder calls correctly updated.

The display-as-string decoder branches properly pass the relaxedOvepunch parameter:

  • EBCDIC and ASCII branches both correctly ordered
  • Parameter values consistent with the decimal and integral decoder patterns

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Jan 6, 2026

JaCoCo code coverage report - 'cobol-parser'

Overall Project 91.51% 🍏
Files changed 99.25% 🍏

File Coverage
StringDecoders.scala 94.02% -0.24% 🍏
DecoderSelector.scala 69.68% 🍏

@github-actions
Copy link

github-actions bot commented Jan 6, 2026

JaCoCo code coverage report - 'spark-cobol'

There is no coverage information present for the Files changed

Total Project Coverage 81.96% 🍏

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In
@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala:
- Line 220: The call to StringDecoders.decodeEbcdicLong uses the old parameter
order; swap the last two boolean arguments so relaxedOverpunch precedes
improvedNullDetection — replace the current argument sequence "..., isSigned ||
!strictSignOverpunch, improvedNullDetection, !strictSignOverpunch" with "...,
isSigned || !strictSignOverpunch, !strictSignOverpunch, improvedNullDetection"
so the call matches the corrected decodeEbcdicLong signature.

In
@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala:
- Line 346: The method signature for decodeEbcdicLong has the last two boolean
parameters in the wrong order (improvedNullDetection, relaxedOvepunch) causing
callers to pass values incorrectly; change the parameter order to match the
other decoders (relaxedOvepunch, improvedNullDetection) in the decodeEbcdicLong
signature and update any references inside the method body that use those
parameter names so their semantics remain correct (use
decodeEbcdicInt/decodeAsciiLong as the correct ordering reference).
🧹 Nitpick comments (4)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala (3)

22-22: Verify if this import is needed.

The StandardCharsets import was added but doesn't appear to be used in this file based on the changes shown.

🔎 Verification script
#!/bin/bash
# Check if StandardCharsets is actually used in this file
rg -n 'StandardCharsets\.' cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala

165-165: Fix typo in documentation.

"multiple published signs" should be "multiple punched signs".

🔎 Proposed fix
-    * @param relaxedOvepunch       If true, multiple published signs are allowed. The last is going to be effective
+    * @param relaxedOvepunch       If true, multiple punched signs are allowed. The last is going to be effective

165-165: Consider fixing typo in parameter name throughout the codebase.

The parameter is consistently named relaxedOvepunch but should likely be relaxedOverpunch (missing 'r'). While this is consistent throughout the PR, it's still a typo that might cause confusion.

This is a breaking API change, so it should be done carefully. If you decide to fix it, ensure all call sites are updated.

Also applies to: 238-238, 307-307, 324-324, 343-343, 360-360, 378-378, 398-398, 419-419, 438-438

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/regression/Test19SignOverpunching.scala (1)

171-171: Fix typo in test name.

"Test relaxed overpunchiung" should be "Test relaxed overpunching" (missing 'c').

🔎 Proposed fix
-    "Test relaxed overpunchiung" when {
+    "Test relaxed overpunching" when {
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2435219 and 4a63b30.

📒 Files selected for processing (5)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryDecoderSpec.scala
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/regression/Test19SignOverpunching.scala
🧰 Additional context used
🧬 Code graph analysis (4)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala (9)
  • StringDecoders (25-449)
  • decodeEbcdicBigNumber (384-390)
  • decodeAsciiBigNumber (404-410)
  • decodeEbcdicInt (310-316)
  • decodeAsciiInt (328-334)
  • decodeEbcdicLong (346-352)
  • decodeAsciiLong (364-370)
  • decodeEbcdicNumber (169-230)
  • decodeAsciiNumber (242-298)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/common/Constants.scala (1)
  • Constants (20-81)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryDecoderSpec.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala (1)
  • StringDecoders (25-449)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala (10)
  • decodeEbcdicNumber (169-230)
  • decodeAsciiNumber (242-298)
  • decodeEbcdicInt (310-316)
  • decodeAsciiInt (328-334)
  • decodeEbcdicLong (346-352)
  • decodeAsciiLong (364-370)
  • decodeEbcdicBigNumber (384-390)
  • decodeAsciiBigNumber (404-410)
  • decodeEbcdicBigDecimal (423-429)
  • decodeAsciiBigDecimal (442-448)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/EbcdicEncoder.scala (1)
  • toEbcdic (37-46)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/regression/Test19SignOverpunching.scala (1)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/fixtures/BinaryFileFixture.scala (2)
  • withTempBinFile (60-69)
  • withTempTextFile (42-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Spark 3.4.4 on Scala 2.12.20
  • GitHub Check: Spark 2.4.8 on Scala 2.11.12
  • GitHub Check: Spark 3.5.7 on Scala 2.13.17
  • GitHub Check: Spark 3.5.7 on Scala 2.12.20
  • GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
🔇 Additional comments (4)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala (1)

131-138: LGTM: Correct mapping of strictSignOverpunch to relaxedOvepunch.

The logic correctly maps !strictSignOverpunch to relaxedOvepunch across all decoder calls. When strict sign overpunching is disabled, relaxed overpunching is enabled, which is the intended behavior.

Also applies to: 210-227, 270-274

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/regression/Test19SignOverpunching.scala (1)

171-216: LGTM: Comprehensive test coverage for relaxed overpunching.

The tests correctly verify relaxed overpunching behavior for both EBCDIC and ASCII:

  • EBCDIC test: Bytes with multiple overpunched signs (0xD0 = -0, 0xC4 = +4) correctly decode to 1230.4, where the last sign (+) wins.
  • ASCII test: String "123}D " with multiple overpunched signs (} = -0, D = +4) also correctly decodes to 1230.4.

Both tests use the appropriate strict_sign_overpunching=false setting and validate the expected behavior where multiple overpunched signs are allowed and the last one determines the final sign.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala (1)

633-916: LGTM: Consistent test updates for new parameter.

All existing test calls have been correctly updated to include relaxedOvepunch = false, maintaining the existing test behavior while adapting to the new API.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryDecoderSpec.scala (1)

48-106: LGTM: Excellent test coverage for relaxed overpunching.

The new tests thoroughly validate relaxed overpunching behavior:

EBCDIC tests (lines 48-58):

  • Multiple overpunched signs in the middle of numbers
  • Correct handling where the last sign wins
  • Proper null return for unsigned negative numbers

ASCII tests (lines 101-106):

  • Multiple overpunched characters anywhere in the string
  • Correct sign determination based on the last overpunched character
  • Proper validation of unsigned constraints

All test logic is sound and expectations are correct.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In
@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala:
- Line 165: The ScalaDoc for the overpunch-related parameter(s) contains a typo:
replace the word "published" with the correct term "punched" wherever it appears
in StringDecoders.scala (e.g., the param description for relaxedOvepunch in the
StringDecoders object/class and the other listed comment occurrences). Update
the parameter descriptions (references like relaxedOvepunch and any Scaladoc
lines around it) to read "punched signs" (or "punched" as appropriate)
consistently for all occurrences (lines noted: 165, 238, 307, 324, 342, 360,
378, 398, 419, 438) so the documentation uses the established term "sign
overpunching."
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a63b30 and ce2d67e.

📒 Files selected for processing (4)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/regression/Test19SignOverpunching.scala
🚧 Files skipped from review as they are similar to previous changes (2)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/regression/Test19SignOverpunching.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala
🧰 Additional context used
🧬 Code graph analysis (1)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala (10)
  • decodeEbcdicNumber (169-230)
  • decodeAsciiNumber (242-298)
  • decodeEbcdicInt (310-316)
  • decodeAsciiInt (328-334)
  • decodeEbcdicLong (346-352)
  • decodeAsciiLong (364-370)
  • decodeEbcdicBigNumber (384-390)
  • decodeAsciiBigNumber (404-410)
  • decodeEbcdicBigDecimal (423-429)
  • decodeAsciiBigDecimal (442-448)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Spark 3.5.7 on Scala 2.12.20
  • GitHub Check: Spark 3.4.4 on Scala 2.12.20
  • GitHub Check: Spark 3.5.7 on Scala 2.13.17
  • GitHub Check: Spark 2.4.8 on Scala 2.11.12
  • GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
🔇 Additional comments (4)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala (3)

346-346: Confirmed: Parameter order issue resolved.

The critical parameter order inconsistency flagged in the previous review has been fixed. The signature now correctly matches all other decoder methods with relaxedOvepunch before improvedNullDetection.


169-230: LGTM: Relaxed overpunch logic for EBCDIC is sound.

The implementation correctly allows multiple overpunched signs when relaxedOvepunch is true by:

  • Bypassing the strict post-sign validation (line 181)
  • Accepting overpunched characters throughout the string (lines 197, 201)

The last encountered sign determines the final sign, as documented.


242-298: LGTM: ASCII relaxed overpunch logic is correct.

The implementation properly distinguishes between relaxed and strict modes:

  • Relaxed mode: Punched sign characters accepted anywhere in the string
  • Strict mode: Punched signs restricted to first or last position only

Both modes return null for unsupported characters, maintaining validation rigor.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala (1)

633-916: LGTM: Test updates maintain backward compatibility.

All decoder test calls have been systematically updated to include relaxedOvepunch = false, preserving existing behavior and expectations. The mechanical consistency of these updates reduces the risk of introducing test regressions.

@yruslan yruslan force-pushed the bugfix/813-fix-relaxed-sign-overpunch-compatibility branch from ce2d67e to 82080f9 Compare January 6, 2026 14:08
@yruslan yruslan merged commit f414705 into master Jan 6, 2026
7 checks passed
@yruslan yruslan deleted the bugfix/813-fix-relaxed-sign-overpunch-compatibility branch January 6, 2026 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants