
Conversation

@yruslan
Collaborator

@yruslan yruslan commented Sep 10, 2025

Closes #782

Summary by CodeRabbit

  • New Features
  • Added support for COBOL DISPLAY numeric encoding (sign-separated and overpunch, optional explicit decimal point); a byte-level sketch of both sign styles follows this list.
  • Bug Fixes
    • Excluded REDEFINES fields from record combination to avoid incorrect validation/encoding.
  • Documentation
    • README updated to list DISPLAY as an allowed PIC S9(n) numeric type.
  • Tests
    • Added comprehensive DISPLAY encoding tests, writer output/round-trip checks, null-input cases, and improved byte-array comparisons.
  • Chores
    • Updated Spark and test dependency versions and cleaned build properties.
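
For readers new to DISPLAY numerics, here is a minimal byte-level sketch of the two sign styles, using standard EBCDIC code points (digits 0xF0..0xF9, '+' 0x4E, '-' 0x60). The helper names are illustrative, not the project's actual API:

object DisplaySketch {
  // EBCDIC digits '0'..'9' occupy 0xF0..0xF9.
  private def ebcdicDigit(d: Int): Byte = (0xF0 | d).toByte

  // Sign-separate: an explicit sign byte (leading here) plus zero-padded digits.
  def signSeparateLeading(value: Int, digits: Int): Array[Byte] = {
    val sign: Byte = if (value < 0) 0x60.toByte else 0x4E.toByte
    val padded = math.abs(value).toString.reverse.padTo(digits, '0').reverse
    sign +: padded.map(c => ebcdicDigit(c - '0')).toArray
  }

  // Overpunch: the sign is folded into the zone nibble of the last digit,
  // zone 0xC for positive and 0xD for negative (so 0xC5 reads as "+5").
  def signOverpunched(value: Int, digits: Int): Array[Byte] = {
    val padded = math.abs(value).toString.reverse.padTo(digits, '0').reverse
    val bytes = padded.map(c => ebcdicDigit(c - '0')).toArray
    val zone = if (value < 0) 0xD0 else 0xC0
    bytes(bytes.length - 1) = (zone | (padded.last - '0')).toByte
    bytes
  }
}

For example, signOverpunched(-123, 4) yields 0xF0 0xF1 0xF2 0xD3, while signSeparateLeading(-123, 4) yields 0x60 0xF0 0xF1 0xF2 0xF3.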

@coderabbitai
Contributor

coderabbitai bot commented Sep 10, 2025

Walkthrough

Adds DISPLAY numeric encoding support to the EBCDIC writer via a new DisplayEncoders implementation and EncoderSelector wiring, plus test coverage (unit + writer integration), a REDEFINES filter in BasicRecordCombiner, a test-utils hex-formatting tweak, dependency version bumps, and a README docs update.

Changes

Cohort / File(s): Summary of changes

• Display encoders implementation
  cobol-parser/src/main/scala/.../encoding/DisplayEncoders.scala
  New DisplayEncoders object providing encoding entry points for DISPLAY numerics (sign-separate and sign-overpunched), input validation, scaling/rounding, and EBCDIC byte-population helpers.

• Encoder selection wiring
  cobol-parser/src/main/scala/.../encoding/EncoderSelector.scala
  Adds getDisplayEncoder(...) and routes DISPLAY numeric cases (integral/decimal with no compact usage, i.e. DISPLAY) to the new display encoders, handling the sign position and explicit-decimal-point flags.

• Parser / encoder tests
  cobol-parser/src/test/scala/.../encoding/DisplayEncodersSuite.scala, .../BCDNumberEncodersSuite.scala, .../BinaryEncodersSuite.scala
  New DisplayEncodersSuite; added null-input tests for the BCD and binary encoders; import adjustments. Tests validate various sign formats, scales, overflows, and null handling.

• Writer integration + test helpers
  spark-cobol/src/test/scala/.../writer/FixedLengthEbcdicWriterSuite.scala
  Adds a DISPLAY-focused writer test that writes DataFrame rows with DISPLAY fields and asserts the exact EBCDIC output; enables TextComparisonFixture; introduces an assertArraysEqual helper for byte comparisons. (A copybook sketch follows this table.)

• Record combiner logic
  spark-cobol/src/main/scala/.../writer/BasicRecordCombiner.scala
  Adds an early filter in combine() that drops AST children with REDEFINES before schema validation/encoding.

• Test-utils formatting
  cobol-parser/src/test/scala/.../testutils/ComparisonUtils.scala
  Changes assertion-failure hex formatting to 0x-prefixed uppercase two-digit hex values joined by ", " (presentation-only change).

• Build / dependencies
  pom.xml, project/Dependencies.scala
  Bumps the ScalaTest and Spark versions; removes explicit maven.compiler.source/target; adjusts dependency version variables/formatting.

• Docs
  README.md
  Documentation update in the EBCDIC Writer "Current Limitations" section: PIC S9(n) numeric types now list DISPLAY as allowed.
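
As a concrete reference for the writer-integration cohort above, a copybook exercising the newly supported DISPLAY shapes could look like the snippet below. This is a hedged sketch: the field names are hypothetical and the suite's actual copybook may differ.

val copybookContentsWithDisplayFields: String =
  """      01  RECORD.
    |        05  NUM-OVERPUNCH    PIC S9(4)      DISPLAY.
    |        05  NUM-LEAD-SIGN    PIC S9(4)      SIGN IS LEADING SEPARATE.
    |        05  NUM-TRAIL-SIGN   PIC S9(4)      SIGN IS TRAILING SEPARATE.
    |        05  NUM-EXPLICIT     PIC 9(3).9(2)  DISPLAY.
    |""".stripMargin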

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Writer as EBCDIC Writer
  participant Combiner as BasicRecordCombiner
  participant Selector as EncoderSelector
  participant Display as DisplayEncoders

  Writer->>Combiner: combine(copybookFields)
  note right of Combiner #E8F3F8: New early filter drops fields with REDEFINES
  Combiner-->>Writer: filteredFields

  loop per field
    Writer->>Selector: getEncoder(field metadata)
    alt DISPLAY numeric (compact empty)
      Selector->>Display: encodeDisplayNumber*(value, precision, scale, signPosition, explicitDecimal, isSignSeparate)
      Display-->>Selector: Array[Byte]
      Selector-->>Writer: Encoder wrapping byte output
    else other formats
      Selector-->>Writer: other Encoder
    end
    Writer->>Writer: serialize value using Encoder -> bytes
  end

  Writer-->>User: write part file (bytes)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks (4 passed, 1 warning)

❌ Failed checks (1 warning)
• Out of Scope Changes Check: ⚠️ Warning
  Explanation: While most additions implement DISPLAY numeric encoding, the PR also includes unrelated updates such as global version bumps in pom.xml and Dependencies.scala, assertion-formatting changes in ComparisonUtils, null-input tests in BCDNumberEncodersSuite and BinaryEncodersSuite, and the BasicRecordCombiner REDEFINES filter, all of which fall outside the linked issue's scope of adding DISPLAY support to the EBCDIC writer.
  Resolution: Move the non-DISPLAY changes (version bumps, test-utility formatting, null-input tests for other encoders, the REDEFINES filter) into separate PRs or issues so that this PR focuses solely on adding DISPLAY format support for the EBCDIC writer.
✅ Passed checks (4 passed)
• Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
• Title Check: ✅ Passed. The title concisely describes the primary change (adding DISPLAY type support to the Spark-COBOL writer), directly matches the linked issue's objective, and avoids unnecessary detail.
• Linked Issues Check: ✅ Passed. The changes introduce and wire the new display numeric encoder (DisplayEncoders and getDisplayEncoder) into both the parser and the Spark-COBOL writer, add end-to-end tests for DISPLAY fields in the writer suite, and update documentation to reflect DISPLAY support, fully satisfying the linked issue's objective.
• Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.

Poem

I nibble bytes with gentle paws,
Display digits learn the laws.
Overpunch hugs the leading place,
Separate signs find their space.
REDEFINES hop out — hooray! — and bytes fall into place. 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13445ff and e4b1633.

📒 Files selected for processing (1)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Spark 3.4.4 on Scala 2.12.20
  • GitHub Check: Spark 3.5.5 on Scala 2.13.16
  • GitHub Check: Spark 3.5.5 on Scala 2.12.20
  • GitHub Check: Spark 2.4.8 on Scala 2.11.12
  • GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)

@github-actions

github-actions bot commented Sep 10, 2025

JaCoCo code coverage report - 'cobol-parser'

Overall Project: 91.71% (-0.01%) 🍏
Files changed: 98.37% 🍏

File                    Coverage
DisplayEncoders.scala   98.16% (-1.84%) 🍏
EncoderSelector.scala   90.26% 🍏

@github-actions
Copy link

github-actions bot commented Sep 10, 2025

JaCoCo code coverage report - 'spark-cobol'

File                        Coverage
BasicRecordCombiner.scala   58.95%

Total Project Coverage: 79.79% 🍏

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pom.xml (1)

101-113: Restore missing compiler source/target properties

The Maven Compiler Plugin in pom.xml (lines 271–272) still references ${maven.compiler.source} and ${maven.compiler.target}, which are undefined and will break the build. Re-add them in the <properties> section, for example:

   <properties>
+    <maven.compiler.source>1.8</maven.compiler.source>
+    <maven.compiler.target>1.8</maven.compiler.target>
     <maven.scalatest.version>2.2.0</maven.scalatest.version>
     <!-- … -->
   </properties>
♻️ Duplicate comments (1)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/DisplayEncodersSuite.scala (1)

547-550: Verify the validity of another edge case test combination.

Similar to the previous comment, this test combines scale = 1 with scaleFactor = 2 and explicitDecimalPoint = true. The expected result is all zeros, which suggests this combination is invalid.

🧹 Nitpick comments (9)
README.md (1)

1689-1691: Clarify DISPLAY support modes (overpunched vs separate sign).

Since writer now handles DISPLAY, consider adding a brief note that both sign overpunching and separate leading/trailing sign are supported to set user expectations.

-  - `PIC S9(n)` numeric (integral and decimal) with `DISPLAY`, `COMP`/`COMP-4`/`COMP-5` (big-endian), `COMP-3`, and 
+  - `PIC S9(n)` numeric (integral and decimal) with `DISPLAY` (overpunched sign and leading/trailing separate sign), `COMP`/`COMP-4`/`COMP-5` (big-endian), `COMP-3`, and
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala (2)

34-37: Good: early drop of REDEFINES from the writer path.

This aligns with current writer limitations and unblocks copybooks containing REDEFINES.

To simplify types and avoid later casts, filter to primitives here:

-    val copybookFields = ast.children.filter {
-      case f if f.redefines.nonEmpty => false
-      case p: Primitive              => !p.isFiller
-      case g: Group                  => !g.isFiller
-      case _                         => true
-    }
+    val copybookFields = ast.children.collect {
+      case p: Primitive if p.redefines.isEmpty && !p.isFiller => p
+    }

…and adjust downstream uses to work with Seq[Primitive].


43-44: Locale-stable lowercasing.

Use Locale.ROOT to avoid surprises on non-English locales.

-    val sparkFields = df.schema.fields.map(_.name.toLowerCase)
+    val sparkFields = df.schema.fields.map(_.name.toLowerCase(java.util.Locale.ROOT))
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (2)

262-281: Nit: variable name doesn't match content.

copybookContentsWithBinFields contains DISPLAY definitions. Rename for clarity.

-        val copybookContentsWithBinFields =
+        val copybookContentsWithDisplayFields =

…and update the usage below.


407-415: Reuse shared array comparison to avoid duplication.

You already have ComparisonUtils in cobol-parser tests; if feasible, depend on that test util or move it to a shared test module.

  def assertArraysEqual(actual: Array[Byte], expected: Array[Byte]): Assertion = {
    if (!actual.sameElements(expected)) {
      val actualHex = actual.map(b => f"0x$b%02X").mkString(", ")
      val expectedHex = expected.map(b => f"0x$b%02X").mkString(", ")
      fail(s"Actual:   $actualHex\nExpected: $expectedHex")
    } else {
      succeed
    }
  }

Alternatively, import and use the existing ComparisonUtils.assertArraysEqual if test scope dependency boundaries allow.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/DisplayEncoders.scala (4)

48-54: Prefer toPlainString to avoid scientific notation surprises.

BigDecimal.toString may use exponent form in some edge cases. toPlainString is safer for fixed-field encoding.

-      val bigDecimalValue1 = bigDecimal.toString
+      val bigDecimalValue1 = bigDecimal.toPlainString

Also applies to: 104-110
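
For context, a quick illustration of where the two renderings diverge (plain java.math.BigDecimal; the values are chosen to trigger scientific notation):

val tiny = new java.math.BigDecimal("0.0000001")
tiny.toString       // "1E-7"  (adjusted exponent < -6 switches toString to exponent form)
tiny.toPlainString  // "0.0000001"

val big = new java.math.BigDecimal("1E+3")
big.toString        // "1E+3"  (negative scale also renders with an exponent)
big.toPlainString   // "1000"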


55-59: Consider enforcing precision against digit count, not just outputSize.

Currently only field width is checked. If precision < number of digits (ignoring sign/decimal separator), we should reject/zero. Add a digit-count check after computing bigDecimalValue/intValue.

If desired, I can draft a minimal patch and tests to enforce this.

Also applies to: 111-115, 126-130, 70-76
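
A minimal shape for such a guard (hypothetical names, assuming the normalized numeric string is in scope):

// Sign and separators don't consume PIC digit positions, so count digits only.
val digitCount = normalizedValue.count(_.isDigit)
if (digitCount > precision) {
  // Treat as overflow, mirroring the existing behavior: emit a zero-filled field,
  // e.g. new Array[Byte](outputSize)
}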


137-175: Minor: extract EBCDIC constants to improve readability.

Avoid magic numbers (0x40, 0xF0, 0x4B, 0x6B, 0x60, zone nibbles) with named vals at the top of the object.

I can provide a small refactor if you want this tidied up.

Also applies to: 197-215
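
Something along these lines would do (standard EBCDIC code points; the names are only a suggestion):

private val EbcdicSpace: Byte        = 0x40.toByte // ' '
private val EbcdicZeroDigit: Byte    = 0xF0.toByte // '0'; digits occupy 0xF0..0xF9
private val EbcdicDecimalPoint: Byte = 0x4B.toByte // '.'
private val EbcdicComma: Byte        = 0x6B.toByte // ','
private val EbcdicMinus: Byte        = 0x60.toByte // '-'
private val EbcdicPlus: Byte         = 0x4E.toByte // '+'
private val PositiveZone: Int        = 0xC0        // overpunch zone nibble for '+'
private val NegativeZone: Int        = 0xD0        // overpunch zone nibble for '-'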


40-76: Reduce duplication: factor number-normalization into a helper.

Both encoders duplicate the normalize/scale/length-check logic. A shared private method returning the normalized string (and a boolean overflow flag) would simplify maintenance.

Happy to sketch this helper if you’d like.

Also applies to: 96-133
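
One possible shape for that helper, hedged since the exact scaling/rounding rules live in the current encoders (names and rounding mode are assumptions):

// Returns the normalized digit string, or None on overflow, so both encoders
// share the scale/round/validate steps and differ only in byte layout.
private def normalizeForDisplay(value: java.math.BigDecimal,
                                scale: Int,
                                maxDigits: Int): Option[String] = {
  val scaled = value.setScale(scale, java.math.RoundingMode.HALF_UP)
  val digits = scaled.abs().toPlainString
  if (digits.count(_.isDigit) > maxDigits) None else Some(digits)
}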

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19f8600 and a328804.

📒 Files selected for processing (9)
  • README.md (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/DisplayEncoders.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/EncoderSelector.scala (3 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/DisplayEncodersSuite.scala (1 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (1 hunks)
  • pom.xml (2 hunks)
  • project/Dependencies.scala (2 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala (1 hunks)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/DisplayEncoders.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/position/Position.scala (1)
  • Right (26-26)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/DisplayEncodersSuite.scala (3)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (2)
  • ComparisonUtils (22-33)
  • assertArraysEqual (23-31)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/DisplayEncoders.scala (3)
  • DisplayEncoders (23-216)
  • encodeDisplayNumberSignOverpunched (81-135)
  • encodeDisplayNumberSignSeparate (24-79)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/position/Position.scala (2)
  • Left (23-23)
  • Right (26-26)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/EncoderSelector.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryUtils.scala (3)
  • cobol (165-209)
  • BinaryUtils (25-248)
  • getBytesCount (98-126)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/DisplayEncoders.scala (3)
  • DisplayEncoders (23-216)
  • encodeDisplayNumberSignSeparate (24-79)
  • encodeDisplayNumberSignOverpunched (81-135)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (1)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/fixtures/BinaryFileFixture.scala (1)
  • withTempDirectory (71-78)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Spark 3.4.4 on Scala 2.12.20
  • GitHub Check: Spark 3.5.5 on Scala 2.12.20
  • GitHub Check: Spark 2.4.8 on Scala 2.11.12
  • GitHub Check: Spark 3.5.5 on Scala 2.13.16
  • GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
🔇 Additional comments (16)
pom.xml (2)

111-113: Align Spark/ScalaTest versions across SBT and Maven.

Spark 3.5.6 and ScalaTest 3.2.19 match project/Dependencies.scala. No action needed beyond the compiler fix.


101-101: Approve ScalaTest Maven Plugin bump

project/Dependencies.scala (3)

26-26: ScalaTest 3.2.19 update is fine.

Matches the Maven property and new tests.


31-31: Spark 3.5.6 fallback for Scala 2.13 is consistent with pom.xml.

Good alignment.


90-90: Formatting-only change acknowledged.

No semantic impact.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (3)

21-21: Import of Assertion is correct.

Enables helper to return Assertion cleanly.


283-291: Deterministic write path is OK.

coalesce(1) + orderBy("A") ensures stable byte order for assertions.


305-331: Nice: explicit byte-for-byte assertions for DISPLAY.

Covers signed/unsigned, explicit decimal point, and separate signs.

Consider adding one case for overpunched positive and negative to guard regressions in overpunch tables.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (1)

25-27: LGTM! Improved hex formatting for better readability.

The updated formatting with 0x prefix and uppercase hex values makes the assertion failure messages more professional and easier to parse during debugging.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/DisplayEncodersSuite.scala (3)

19-23: LGTM! Well-structured test suite setup.

The test suite properly imports the necessary dependencies and follows Scala testing best practices using AnyWordSpec.


24-109: Comprehensive test coverage for sign-overpunched integral encoding.

The test cases thoroughly cover various scenarios including positive/negative numbers, signed/unsigned variants, edge cases (too big numbers, negative scale), and error conditions. The expected byte arrays are correctly defined for EBCDIC encoding.


260-263: Document or handle the scaleFactor > 0 && scale > 0 edge case
The code explicitly returns all zeros when both scaleFactor and scale are positive, but there's no Scaladoc or comment explaining that combination. Please either add documentation for this behavior in encodeDisplayNumberSignOverpunched or adjust the test to a fully supported scenario.
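
A possible wording for that note (phrasing is only a suggestion):

/** Encodes a number as a DISPLAY field with an overpunched sign.
  *
  * Note: `scaleFactor > 0` combined with `scale > 0` is not a supported
  * combination; in that case the encoder returns a zero-filled array.
  */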

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/EncoderSelector.scala (4)

22-22: LGTM! Clean import addition.

The Position import is properly added to support the new display encoder functionality.


52-55: LGTM! Well-integrated display encoder support.

The new pattern matching cases for integral and decimal display types are cleanly integrated into the existing encoder selection logic.


148-172: Well-structured display encoder implementation.

The getDisplayEncoder method follows the existing pattern of other encoder methods in the file, properly handles input normalization, and correctly delegates to the appropriate DisplayEncoders method based on the sign separation flag.


158-159: No action required: nulls forwarded to DisplayEncoders produce zero-filled arrays by design
Passing null here is consistent—DisplayEncoders return zero-filled arrays for nulls, per Cobrix’s design choice to encode nulls as 0x00 for round-trip fidelity.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (3)

275-284: Rename variable to reflect DISPLAY, not BIN

The copybook here is for DISPLAY fields. Rename for clarity.

-        val copybookContentsWithBinFields =
+        val copybookContentsWithDisplayFields =
@@
-          .option("copybook_contents", copybookContentsWithBinFields)
+          .option("copybook_contents", copybookContentsWithDisplayFields)
@@
-          .option("copybook_contents", copybookContentsWithBinFields)
+          .option("copybook_contents", copybookContentsWithDisplayFields)

Also applies to: 291-292, 342-343


303-307: Close stream on failure (use try/finally or Using)

Avoid potential leaks if readFully throws.

-        val data = fs.open(partFile)
-        val bytes = new Array[Byte](files.head.getLen.toInt)
-        data.readFully(bytes)
-        data.close()
+        val bytes = {
+          val in = fs.open(partFile)
+          try {
+            val arr = new Array[Byte](files.head.getLen.toInt)
+            in.readFully(arr)
+            arr
+          } finally {
+            in.close()
+          }
+        }

454-462: Hex dump prints signed bytes incorrectly; include index/length in failure

For bytes ≥ 0x80, %02X on a signed Byte widens to Int and renders the sign-extended value (e.g. 0xFFFFFFAB instead of 0xAB). Mask with & 0xFF, and surface the first mismatch index and the array lengths.

-  def assertArraysEqual(actual: Array[Byte], expected: Array[Byte]): Assertion = {
-    if (!actual.sameElements(expected)) {
-      val actualHex = actual.map(b => f"0x$b%02X").mkString(", ")
-      val expectedHex = expected.map(b => f"0x$b%02X").mkString(", ")
-      fail(s"Actual:   $actualHex\nExpected: $expectedHex")
-    } else {
-      succeed
-    }
-  }
+  def assertArraysEqual(actual: Array[Byte], expected: Array[Byte]): Assertion = {
+    if (!actual.sameElements(expected)) {
+      val mismatchIdx =
+        actual.zipAll(expected, 0.toByte, 0.toByte).indexWhere { case (a, b) => a != b }
+      val actualHex   = actual.map(b => f"0x${b & 0xFF}%02X").mkString(", ")
+      val expectedHex = expected.map(b => f"0x${b & 0xFF}%02X").mkString(", ")
+      val lenMsg = s"lengths: actual=${actual.length}, expected=${expected.length}"
+      val idxMsg = if (mismatchIdx >= 0) s"first mismatch at index $mismatchIdx" else "length mismatch"
+      fail(s"$idxMsg; $lenMsg\nActual:   $actualHex\nExpected: $expectedHex")
+    } else {
+      succeed
+    }
+  }
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (1)

20-20: Remove unused import

The Left import from za.co.absa.cobrix.cobol.parser.position.Left is not used in this test file.

-import za.co.absa.cobrix.cobol.parser.position.Left
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 236b555 and 13445ff.

📒 Files selected for processing (4)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (2 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala (1 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/DisplayEncodersSuite.scala (1 hunks)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-19T09:12:31.865Z
Learnt from: yruslan
PR: AbsaOSS/cobrix#775
File: spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala:0-0
Timestamp: 2025-08-19T09:12:31.865Z
Learning: In Cobrix COBOL writer, null values are encoded as zero bytes (0x00) for all field types, including PIC X fields, rather than using traditional COBOL padding (EBCDIC spaces 0x40 for PIC X). This design decision prioritizes consistency with the reader side, which already treats zero bytes as nulls, ensuring round-trip fidelity between reading and writing operations.

Applied to files:

  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala
🧬 Code graph analysis (4)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncoders.scala (2)
  • BinaryEncoders (21-71)
  • encodeBinaryNumber (22-70)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (1)
  • assertArraysEqual (23-31)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (3)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/position/Position.scala (1)
  • Left (23-23)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (2)
  • ComparisonUtils (22-33)
  • assertArraysEqual (23-31)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncoders.scala (2)
  • BCDNumberEncoders (21-108)
  • encodeBCDNumber (38-106)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/DisplayEncodersSuite.scala (3)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (2)
  • ComparisonUtils (22-33)
  • assertArraysEqual (23-31)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/DisplayEncoders.scala (3)
  • DisplayEncoders (23-212)
  • encodeDisplayNumberSignOverpunched (82-136)
  • encodeDisplayNumberSignSeparate (24-80)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/position/Position.scala (2)
  • Left (23-23)
  • Right (26-26)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (3)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/SparkUtils.scala (2)
  • SparkUtils (37-554)
  • convertDataFrameToPrettyJSON (432-442)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/fixtures/BinaryFileFixture.scala (1)
  • withTempDirectory (71-78)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/fixtures/TextComparisonFixture.scala (1)
  • compareText (24-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Spark 3.5.5 on Scala 2.12.20
  • GitHub Check: Spark 2.4.8 on Scala 2.11.12
  • GitHub Check: Spark 3.4.4 on Scala 2.12.20
  • GitHub Check: Spark 3.5.5 on Scala 2.13.16
  • GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
🔇 Additional comments (16)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (3)

21-27: LGTM: Test mixins and imports are appropriate

Bringing in Assertion and TextComparisonFixture is correct, and the updated suite signature aligns with the new JSON comparison flow.


309-337: DISPLAY expectations look correct (overpunch + separate signs)

Encodings for S9 overpunch (e.g., C/J/{/}) and separate signs (0x4E '+', 0x60 '-') match EBCDIC conventions, including decimal point 0x4B.
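
For reviewers decoding the expected bytes by eye, the standard zone mapping is: '{' = 0xC0 and 'A'..'I' = 0xC1..0xC9 carry '+', while '}' = 0xD0 and 'J'..'R' = 0xD1..0xD9 carry '-'. As an illustrative helper (not project code):

def expectedOverpunchByte(lastDigit: Int, negative: Boolean): Byte =
  ((if (negative) 0xD0 else 0xC0) | lastDigit).toByte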


339-377: Round-trip JSON check is sound

Ordering by A before write/read ensures determinism for the comparison. Using TextComparisonFixture keeps this robust across newline differences.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (2)

26-31: Good test coverage for null input handling

The addition of null input test cases for both integral and decimal numbers ensures robust handling of edge cases, consistent with the broader null-input coverage pattern across encoder test suites.


144-149: LGTM! Consistent null handling for decimal numbers

The null test case for decimal numbers follows the same pattern as integral numbers, maintaining consistency in the test suite.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala (1)

24-29: Well-structured null input test

Excellent addition of the null input test case. It follows the established pattern of returning zero bytes for null inputs, which aligns with the Cobrix design decision to encode null values as zero bytes for consistency between reader and writer operations.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/DisplayEncodersSuite.scala (10)

1-22: LGTM! Proper test file structure and imports

The test suite is well-structured with appropriate imports for testing the new DisplayEncoders functionality.


26-31: Good null handling test coverage

The null input test correctly validates that null values return zero-filled byte arrays for the sign overpunched encoding method.


33-38: Correct EBCDIC encoding for positive signed numbers

The test correctly validates the encoding of 12345 with overpunched positive sign (0xC5 = positive sign overpunched on '5').


54-59: Proper unsigned number encoding

The unsigned number test correctly uses signPosition = None and produces standard EBCDIC digits without sign overpunching.


61-66: Correct negative sign overpunching

The test properly validates negative number encoding with 0xD5 (negative sign overpunched on '5').


141-144: Proper explicit decimal point encoding

The test correctly validates that 0x4B represents the EBCDIC decimal point character.


341-346: Correct sign-separate positive encoding

The test properly validates the EBCDIC positive sign character (0x4E) in sign-separate format.


362-367: Well-tested unsigned number in sign-separate format

The unsigned number test correctly uses signPosition = None and produces standard EBCDIC digits without any sign character.


369-374: Proper negative sign-separate encoding

The test correctly validates the EBCDIC negative sign character (0x60) in sign-separate format.


23-641: Comprehensive test coverage for DISPLAY encoders

The test suite provides excellent coverage for both encodeDisplayNumberSignOverpunched and encodeDisplayNumberSignSeparate methods, including:

  • Null input handling
  • Various numeric values (positive, negative, zero, small, large)
  • Sign positions (Left, Right, None)
  • Explicit decimal point handling
  • Scale factors (positive and negative)
  • Edge cases (too big, too precise, incorrect precision)
  • Both integral and decimal numbers

The test expectations align correctly with EBCDIC encoding standards.

@yruslan yruslan merged commit 0731745 into master Sep 10, 2025
6 of 7 checks passed
@yruslan yruslan deleted the feature/782-add-display-type-for-writer branch September 10, 2025 10:58

Development

Successfully merging this pull request may close these issues:

  • Add support for numeric types having DISPLAY format in the EBCDIC writer
