Xml Reader Rich Text #4007

Merged
merged 3 commits into from
May 5, 2024

Conversation

oleibman (Collaborator) commented May 1, 2024

Fix #4001. Thanks to @SlowFox71, who reported the problem and developed most of the solution. This PR adds Rich Text support to the XML reader. The Xml Spreadsheet format stores Rich Text as Html tags, which are children of the `ss:Data` tag and use a specific namespace. These can be parsed into a RichText object using the existing method `Helper/Html::toRichTextObject`. Two items need special attention.

First, for attributes like bold or italic, Excel uses the appropriate Html tag (e.g. `<B>`). However, for an attribute like color, Excel uses `<Font html:Color="#FF0000">`, with a prefix on the `Color` attribute. PhpSpreadsheet's Html parser cannot cope with the prefix, so the parser is changed to strip `html:` from attribute names for the Font tag.
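To illustrate the prefix stripping described above, here is a minimal standalone sketch. It is not the code changed in this PR; the function name `stripHtmlPrefixFromFontAttributes` and the regular expressions are assumptions made up for the example, and it uses only PHP's built-in `preg_*` functions.

```php
<?php
// Hypothetical helper (not PhpSpreadsheet's actual code): removes the
// "html:" prefix from attribute names, but only inside <Font ...> opening tags.
function stripHtmlPrefixFromFontAttributes(string $markup): string
{
    $result = preg_replace_callback(
        '/<font\b[^>]*>/i',
        static function (array $match): string {
            // Drop "html:" only where it directly precedes an attribute name.
            return preg_replace('/(\s)html:(?=[\w-]+\s*=)/i', '$1', $match[0]) ?? $match[0];
        },
        $markup
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $markup;
}

$input = '<B>Bold</B> <Font html:Color="#FF0000">red</Font>';
echo stripHtmlPrefixFromFontAttributes($input), PHP_EOL;
// prints: <B>Bold</B> <Font Color="#FF0000">red</Font>
```

Restricting the rewrite to the Font tag mirrors the scope stated above and leaves text that merely contains `html:` untouched.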

Second, the example cited by the user used a `<BR />` to indicate a line break in the data. However, it appears that, at least some of the time, Excel will instead use `&#10;` to indicate a line break. The existing parser reduces one or more whitespace characters in the text to a single space, so the line break represented by `&#10;` winds up disappearing. I am not sure why the existing code does this, but I do know that I am not willing to break it. Instead, I've added an optional boolean parameter `$preserveWhiteSpace` to `toRichTextObject`. If false (the default), the existing logic is used; if true, whitespace characters in the text are not replaced.
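The effect of the new flag can be sketched in isolation. The function `normalizeCellText` below is a hypothetical stand-in, not the library's implementation; it only assumes the behavior described above (runs of whitespace collapse to one space unless preservation is requested).

```php
<?php
// Hypothetical stand-in for the parser's text handling; the real logic
// lives inside PhpSpreadsheet's Helper\Html class and may differ in detail.
function normalizeCellText(string $text, bool $preserveWhiteSpace = false): string
{
    if ($preserveWhiteSpace) {
        // Leave line feeds, tabs and repeated spaces exactly as they are.
        return $text;
    }

    // Default behavior: collapse each run of whitespace into a single space.
    return preg_replace('/\s+/u', ' ', $text) ?? $text;
}

// "&#10;" is the XML character reference for a line feed, so after XML
// parsing the cell text contains a literal "\n".
$text = "Line 1\nLine 2";

echo json_encode(normalizeCellText($text)), PHP_EOL;       // prints: "Line 1 Line 2"
echo json_encode(normalizeCellText($text, true)), PHP_EOL; // prints: "Line 1\nLine 2"
```

With the library itself, the equivalent call would presumably pass `true` for `$preserveWhiteSpace` when invoking `toRichTextObject`; the exact position of that parameter is not stated here, so check the method signature.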

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

oleibman added this pull request to the merge queue on May 5, 2024
Merged via the queue into PHPOffice:master with commit 4a7fa14 on May 5, 2024
14 checks passed
oleibman deleted the issue4001 branch on May 5, 2024 04:31
Development

Successfully merging this pull request may close these issues.

XML-Reader: support rich text
1 participant