Skip to content

Conversation

@zlatanovic-nebojsa
Copy link
Contributor

Fix BIFF8 compatibility issues in XLS writer

Summary

This PR fixes 6 BIFF8 (Excel 5 Binary File Format) compatibility issues in the XLS writer to restore compatibility with old PHPExcel Excel5 writer behavior.

Each fix is in a separate commit for easy cherry-picking.


Commits (Each Can Be Cherry-Picked Independently)

1. Fix DIMENSIONS Record - 0-based Column Indices ⚠️ CRITICAL

Commit: 0c18ab184

Problem: DIMENSIONS record incorrectly used 1-based column indices instead of 0-based as required by BIFF8 specification.

Impact: Extra empty column appears when converting XLS to CSV.

Example: For columns A-G (7 columns):

  • Before: colMic=1, colMac=8
  • After: colMic=0, colMac=7

Test Result:

✓ colMic = 0 (CORRECT - 0-based)
✓ colMac = 7 (CORRECT - 0-based, 7 columns A-G)
✓✓✓ ALL TESTS PASSED!

2. Fix WINDOW2 Record - Zoom Factors

Commit: df12805bb

Problem: PhpSpreadsheet wrote non-zero zoom factors. Old PHPExcel always wrote 0x0000.

Fix: Set both zoom_factor_page_break and zoom_factor_normal to 0x0000.


3. Remove PageLayoutView Record

Commit: 9ec328ff6

Problem: PhpSpreadsheet writes PageLayoutView record (0x088B). Old PHPExcel did not.

Fix: Removed writePageLayoutView() call to match Excel 5 format.

Benefit: Reduces file size by 20 bytes per sheet.


4. Fix SHEETPROTECTION Options Bitmask

Commit: eadade52d

Problem: PhpSpreadsheet uses dynamic calculation (typically 0x4403). Old PHPExcel used inverted logic (0x7FFF).

Fix: Set options = 0x7FFF to match old PHPExcel behavior.


5. Remove ConditionalFormatting Records

Commit: 82f272868

Problem: PhpSpreadsheet writes CFHEADER/CFRULE records. Old PHPExcel did not.

Fix: Removed writeConditionalFormatting() call to match Excel 5 format.

Note: 4 tests fail because they expect CF records. Old PHPExcel didn't write these. Tests may need updating or skipping if this commit is accepted.


6. Fix COLINFO Records - Generate 256 Entries

Commit: 6f7f5e406

Problem: PhpSpreadsheet generates COLINFO only for used columns. Old PHPExcel generated 256 (0-255) always.

Fix: Changed COLINFO loop to always generate 256 records.

Impact: Structural compatibility with old PHPExcel files.


Test Results

✅ DIMENSIONS Fix Verified

Custom test confirms DIMENSIONS record now correctly uses 0-based indices.

⚠️ XLS Writer Tests: 47/51 Passed

Tests: 51, Assertions: 208, Failures: 4

4 Failing Tests (All ConditionalFormatting-related):

  1. ConditionalFontColorTest::testConditionalFontColor
  2. ConditionalLimitsTest::testLimits
  3. ConditionalUnionTest::testConditionalUnion
  4. ConditionalUnionTest::testIntersectionRange

Reason: These tests expect ConditionalFormatting records, but old PHPExcel didn't write them. If commit #5 is accepted, these tests should be updated to not expect CF records in Excel 5 format.


BIFF8 Specification References

  • Microsoft Excel Binary File Format Specification (BIFF8)
  • DIMENSIONS record requires 0-based row and column indices
  • colMic: First column index (0-based)
  • colMac: Last column index + 1 (0-based)

Backwards Compatibility

These changes restore Excel 5 (BIFF8) format compatibility. Applications expecting the current behavior may need adjustments, but this aligns with:

  • Microsoft BIFF8 specification
  • Old PHPExcel Excel5 writer behavior
  • Legacy Excel 5 file readers

Future Work - Achieving 100% Backwards Compatibility

This PR addresses all Worksheet-level BIFF8 compatibility issues. However, for 100% byte-identical compatibility with old PHPExcel Excel5 files, two additional components require fixes:

1. Shared String Table (SST) - Workbook Level

The writeSharedStringsTable() method in Writer/Xls/Workbook.php needs to match old PHPExcel's exact SST encoding and chunking logic. This affects string storage and CONTINUE record handling.

2. OLE Container & Timestamps - File Level

The OLE Root structure needs to match old PHPExcel's timestamp handling and container organization for deterministic file generation.

These changes are more invasive (affecting private methods and file structure) and will be proposed in a separate PR if this one is accepted. Together, all changes enable 100%
byte-identical output
with old PHPExcel, which is critical for:

  • Legacy API compatibility
  • Regression testing against historical files
  • Deterministic file generation

A working reference implementation demonstrating 100% compatibility is available and has been tested with files up to 377 KB / 20,000+ BIFF records.

Fixes #4682

The XLS writer incorrectly used 1-based column indices in the BIFF8
DIMENSIONS record, violating the Microsoft Excel Binary File Format
specification which requires 0-based indices.

This bug caused an extra empty column to appear when converting XLS
files to other formats (e.g., CSV).

Changes:
- Modified column index initialization to subtract 1 from the result
  of Coordinate::columnIndexFromString() to convert from 1-based to
  0-based indexing
- Updated COLINFO loop to use the corrected 0-based lastColumnIndex

Per BIFF8 specification:
- colMic (first column) must be 0-based
- colMac (column after last column) must be 0-based

Example: For columns A-G (7 columns):
- Before: colMic=1, colMac=8 (incorrect)
- After:  colMic=0, colMac=7 (correct)

Fixes PHPOffice#4682
Old PHPExcel always wrote 0x0000 for both zoom_factor_page_break and
zoom_factor_normal in the WINDOW2 record. PhpSpreadsheet changed this
to write actual zoom values, breaking byte-identical compatibility.

This restores the Excel 5 (BIFF8) behavior by setting both zoom
factors to 0x0000, matching old PHPExcel output exactly.

Related to PHPOffice#4682
PhpSpreadsheet writes the PageLayoutView record (0x088B), but old
PHPExcel did not include this record in Excel 5 (BIFF8) format.

This removes the writePageLayoutView() call to match old PHPExcel
behavior and reduce file size by 20 bytes per sheet.

Related to PHPOffice#4682
Old PHPExcel used inverted protection logic and always wrote 0x7FFF
for the options bitmask (all protection features disabled except sheet
protection itself).

PhpSpreadsheet changed this to dynamic calculation resulting in
different values (typically 0x4403), breaking compatibility.

This restores the Excel 5 (BIFF8) behavior by setting options to
0x7FFF to match old PHPExcel output exactly.

Related to PHPOffice#4682
PhpSpreadsheet writes CFHEADER and CFRULE records for conditional
formatting. Old PHPExcel did not write these records in Excel 5
(BIFF8) format.

This removes the writeConditionalFormatting() call to match old
PHPExcel behavior and avoid file size inflation.

Related to PHPOffice#4682
PhpSpreadsheet generates COLINFO records only for used columns, but
old PHPExcel always generated 256 COLINFO records (columns 0-255)
regardless of actual usage.

This causes structural incompatibility where files have different
numbers of COLINFO records, breaking byte-identical compatibility
and potentially causing issues with legacy readers.

This changes the COLINFO generation to always output 256 records
(0-255) to match old PHPExcel behavior exactly.

Related to PHPOffice#4682
@oleibman
Copy link
Collaborator

Thank you for your effort thus far. Here are my initial impressions. Up front, I will tell you that compatibility with PHPExcel is not a priority for us. It is more than 6 years behind our current code base, and is therefore that far behind on major and minor fixes.

  1. 0-based column indices. This probably merits a PR on its own.

  2. You will need to make a better case for this change. Does writing out the zoom factors make for an invalid spreadsheet when it is opened in Excel, or when PhpSpreadsheet attempts to load it?

  3. PageLayoutView. 20 bytes per sheet isn't nothing, but it isn't much. Does writing out the record make for an invalid spreadsheet when it is opened in Excel, or when PhpSpreadsheet attempts to load it?

  4. SheetProtection Options Bitmask. I don't understand what you're trying to accomplish. Same question as (2) and (3).

  5. Remove Conditional Formatting Records. Complete non-starter. PHPExcel did not properly support conditional formatting for Xls, and neither did PhpSpreadsheet for a long time. Getting conditional formatting working for Xls was a major effort (see individual PRs listed in issue Xls Glitches #3403). There is no need to undo it.

  6. COLINFO records. Same questions as above. Do the missing COLINFO records adversely affect Excel or PhpSpreadsheet when they try to load the spreadsheet?

Future 1. Same question as always.

Future 2. OLE Container. I don't know much about OLE, so I'm willing to hear more about this, but subject to the usual questions.

In summary, I would suggest creating a separate PR for (1). We can continue discussing the other items in this PR if you wish.

@zlatanovic-nebojsa
Copy link
Contributor Author

Hi @oleibman,

Thank you for the feedback! I've created a separate PR for the DIMENSIONS fix as you suggested:
(#4687)

This PR focuses solely on the 0-based column indices issue and includes the real-world context about legacy Excel parser compatibility.

Regarding the other commits, I understand your concerns about PHPExcel compatibility not being a priority. Unfortunately, I don't have the bandwidth to provide additional
evidence or test cases for those changes at this time. I'll maintain those fixes in my local fork where they're working well for our specific use case with legacy parsers.

I really appreciate your time reviewing this and the guidance on splitting out the DIMENSIONS fix!

Thanks again!

@oleibman
Copy link
Collaborator

PR #4687, which addresses your main problem, is now merged. I do not think it is likely that we will address any of the other issues, so I am closing this PR now. Feel free to open new issues, bearing in mind that we will address cases where PhpSpreadsheet is generating incorrect results, but we will not address cases just because they don't happen to match PHPExcel.

@oleibman oleibman closed this Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Xls Writer uses 1-based column indices in BIFF DIMENSIONS record instead of required 0-based indices

2 participants