Add HTML content transformer middleware (#5338)
* Add HTML content transformer

* Add entry

* Fix alt text

* Apply to fenced code blocks only

* Add breaking changes

* Update entry

* Add PR

* Fix tests

* Fix test

* Add xmlns and remove HTML content provider related attributes

---------

Co-authored-by: Eugene <EOlonov@gmail.com>
compulim and OEvgeny authored Oct 29, 2024
1 parent e5145f3 commit 06bf029
Showing 25 changed files with 378 additions and 207 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -30,6 +30,8 @@ Notes: web developers are advised to use [`~` (tilde range)](https://github.com/
 - `styleOptions.bubbleMaxWidth`/`bubbleMinWidth` is being deprecated in favor of `styleOptions.bubbleAttachmentMaxWidth`/`bubbleAttachmentMinWidth` and `styleOptions.bubbleMessageMaxWidth`/`bubbleMessageMinWidth`. The option will be removed on or after 2026-10-08
 - Moved to `micromark` for rendering Markdown, instead of `markdown-it`
    - Please refer to PR [#5330](https://github.com/microsoft/BotFramework-WebChat/pull/5330) for details
+- HTML sanitizer is moved from `renderMarkdown` to HTML content transformer middleware, please refer to PR [#5338](https://github.com/microsoft/BotFramework-WebChat/pull/5338)
+   - If you customized `renderMarkdown` with a custom HTML sanitizer, please move the HTML sanitizer to the new HTML content transformer middleware

 ### Added

@@ -64,6 +66,10 @@ Notes: web developers are advised to use [`~` (tilde range)](https://github.com/
 - Added code viewer dialog with syntax highlighting, in PR [#5335](https://github.com/microsoft/BotFramework-WebChat/pull/5335), by [@OEvgeny](https://github.com/OEvgeny)
 - Added copy button to code blocks, in PR [#5334](https://github.com/microsoft/BotFramework-WebChat/pull/5334), by [@compulim](https://github.com/compulim)
 - Added copy button to view code dialog, in PR [#5336](https://github.com/microsoft/BotFramework-WebChat/pull/5336), by [@compulim](https://github.com/compulim)
+- Added HTML content transformer middleware, in PR [#5338](https://github.com/microsoft/BotFramework-WebChat/pull/5338), by [@compulim](https://github.com/compulim)
+   - HTML content transformer is used by `useRenderMarkdown` to transform the result from `renderMarkdown`
+   - HTML sanitizer is moved from `renderMarkdown` into HTML content transformer for better coverage
+   - Copy button is added to fenced code blocks (`<pre><code>`)

 ### Changed
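
For integrators following the breaking-change note in the CHANGELOG diff above, the migration might look roughly like the sketch below. It is not taken from the PR. `HtmlFragment` (a plain string standing in for the DOM `DocumentFragment` that the real request carries), `TransformRequest`, `createCustomSanitizeMiddleware`, `applyMiddleware`, and `escapeAngleBrackets` are hypothetical names chosen so the example runs without a browser or Web Chat installed. Only the curried `() => next => request => result` shape is borrowed from the middleware files added in this commit, and the composition order is an assumption.

```typescript
// Hypothetical stand-in: the real request carries a DOM DocumentFragment.
type HtmlFragment = string;
type TransformRequest = Readonly<{ documentFragment: HtmlFragment }>;
type TransformFunction = (request: TransformRequest) => HtmlFragment;
// Same curried shape as the middleware added in this commit.
type TransformMiddleware = () => (next: TransformFunction) => TransformFunction;

// Wraps a custom sanitizer (previously called inside a customized `renderMarkdown`)
// as a terminal middleware: like the built-in sanitizer, it does not call `next`.
function createCustomSanitizeMiddleware(sanitize: (html: string) => string): TransformMiddleware {
  return () => () => request => sanitize(request.documentFragment);
}

// Assumed composition (not shown in this commit): the first array entry runs first.
function applyMiddleware(middleware: readonly TransformMiddleware[]): TransformFunction {
  return middleware.reduceRight<TransformFunction>(
    (next, create) => create()(next),
    request => request.documentFragment
  );
}

// Toy sanitizer for demonstration only: it escapes every angle bracket.
const escapeAngleBrackets = (html: string): string => html.replace(/</g, '&lt;').replace(/>/g, '&gt;');

const transform = applyMiddleware([createCustomSanitizeMiddleware(escapeAngleBrackets)]);

console.log(transform({ documentFragment: '<b>Hi</b>' })); // logs "&lt;b&gt;Hi&lt;/b&gt;"
```

In this commit, `AddFullBundle` accepts such an array through its new `htmlContentTransformMiddleware` prop; how a custom entry interacts with the default middleware is not visible in this excerpt.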

@@ -75,11 +75,10 @@
 Ea sint elit anim enim voluptate aliquip aliqua nulla veniam.
-<pre>
-Ea et pariatur sint Lorem ex veniam adipisicing.
+<pre><code>Ea et pariatur sint Lorem ex veniam adipisicing.
 Aliqua magna aliquip nisi quis.
-</pre>
+</code></pre>
 Cupidatat nulla duis dolor nulla ut pariatur minim incididunt quis adipisicing velit id Lorem.`,
 wrap: true
@@ -56,11 +56,10 @@
 Ea sint elit anim enim voluptate aliquip aliqua nulla veniam.
-<pre>
-Ea et pariatur sint Lorem ex veniam adipisicing.
+<pre><code>Ea et pariatur sint Lorem ex veniam adipisicing.
 Aliqua magna aliquip nisi quis.
-</pre>
+</code></pre>
 Cupidatat nulla duis dolor nulla ut pariatur minim incididunt quis adipisicing velit id Lorem.`,
 wrap: true
@@ -61,11 +61,10 @@
 Ea sint elit anim enim voluptate aliquip aliqua nulla veniam.
-<pre>
-Ea et pariatur sint Lorem ex veniam adipisicing.
+<pre><code>Ea et pariatur sint Lorem ex veniam adipisicing.
 Aliqua magna aliquip nisi quis.
-</pre>
+</code></pre>
 Cupidatat nulla duis dolor nulla ut pariatur minim incididunt quis adipisicing velit id Lorem.`,
 type: 'message'
5 changes: 2 additions & 3 deletions __tests__/html2/markdown/codeBlockCopyButton/behavior.html
@@ -61,11 +61,10 @@
 Ea sint elit anim enim voluptate aliquip aliqua nulla veniam.
-<pre>
-Ea et pariatur sint Lorem ex veniam adipisicing.
+<pre><code>Ea et pariatur sint Lorem ex veniam adipisicing.
 Aliqua magna aliquip nisi quis.
-</pre>
+</code></pre>
 Cupidatat nulla duis dolor nulla ut pariatur minim incididunt quis adipisicing velit id Lorem.`,
 type: 'message'
@@ -36,11 +36,10 @@
 Ea sint elit anim enim voluptate aliquip aliqua nulla veniam.
-<pre>
-Ea et pariatur sint Lorem ex veniam adipisicing.
+<pre><code>Ea et pariatur sint Lorem ex veniam adipisicing.
 Aliqua magna aliquip nisi quis.
-</pre>
+</code></pre>
 Cupidatat nulla duis dolor nulla ut pariatur minim incididunt quis adipisicing velit id Lorem.`,
 type: 'message'
@@ -54,11 +54,10 @@
 Ea sint elit anim enim voluptate aliquip aliqua nulla veniam.
-<pre>
-Ea et pariatur sint Lorem ex veniam adipisicing.
+<pre><code>Ea et pariatur sint Lorem ex veniam adipisicing.
 Aliqua magna aliquip nisi quis.
-</pre>
+</code></pre>
 Cupidatat nulla duis dolor nulla ut pariatur minim incididunt quis adipisicing velit id Lorem.`,
 type: 'message'
5 changes: 2 additions & 3 deletions __tests__/html2/markdown/codeBlockCopyButton/layout.html
@@ -42,11 +42,10 @@
 Ea sint elit anim enim voluptate aliquip aliqua nulla veniam.
-<pre>
-Ea et pariatur sint Lorem ex veniam adipisicing.
+<pre><code>Ea et pariatur sint Lorem ex veniam adipisicing.
 Aliqua magna aliquip nisi quis.
-</pre>
+</code></pre>
 Cupidatat nulla duis dolor nulla ut pariatur minim incididunt quis adipisicing velit id Lorem.`,
 type: 'message'
4 changes: 4 additions & 0 deletions packages/bundle/src/AddFullBundle.tsx
@@ -3,6 +3,7 @@ import {
   type AttachmentMiddleware,
   type StyleOptions
 } from 'botframework-webchat-api';
+import { type HTMLContentTransformMiddleware } from 'botframework-webchat-component';
 import { singleToArray, warnOnce, type OneOrMany } from 'botframework-webchat-core';
 import React, { type ReactNode } from 'react';

@@ -18,6 +19,7 @@ type AddFullBundleProps = Readonly<{
   attachmentForScreenReaderMiddleware?: OneOrMany<AttachmentForScreenReaderMiddleware>;
   attachmentMiddleware?: OneOrMany<AttachmentMiddleware>;
   children: ({ extraStyleSet }: { extraStyleSet: any }) => ReactNode;
+  htmlContentTransformMiddleware?: HTMLContentTransformMiddleware[];
   renderMarkdown?: (
     markdown: string,
     newLineOptions: { markdownRespectCRLF: boolean },
@@ -41,6 +43,7 @@ const AddFullBundle = ({
   attachmentForScreenReaderMiddleware,
   attachmentMiddleware,
   children,
+  htmlContentTransformMiddleware,
   renderMarkdown,
   styleOptions,
   styleSet
@@ -50,6 +53,7 @@ const AddFullBundle = ({
   const patchedProps = useComposerProps({
     attachmentForScreenReaderMiddleware: singleToArray(attachmentForScreenReaderMiddleware),
     attachmentMiddleware: singleToArray(attachmentMiddleware),
+    htmlContentTransformMiddleware,
     renderMarkdown,
     styleOptions,
     styleSet
26 changes: 12 additions & 14 deletions packages/bundle/src/__tests__/renderMarkdown.spec.js
@@ -23,23 +23,23 @@ describe('renderMarkdown', () => {
     const styleOptions = { markdownRespectCRLF: true };

     expect(renderMarkdown('Same line.\nSame line. \n2nd line.', styleOptions, renderMarkdownOptions)).toBe(
-      '<p>Same line.\nSame line.<br />\n2nd line.</p>'
+      '<p xmlns="http://www.w3.org/1999/xhtml">Same line.\nSame line.<br />\n2nd line.</p>'
     );
   });

   it('should respect CRLF', () => {
     const styleOptions = { markdownRespectCRLF: true };

     expect(renderMarkdown('Same Line.\n\rSame Line.\r\n2nd line.', styleOptions, renderMarkdownOptions)).toBe(
-      '<p>Same Line.\nSame Line.</p>\n<p>2nd line.</p>'
+      '<p xmlns="http://www.w3.org/1999/xhtml">Same Line.\nSame Line.</p>\n<p xmlns="http://www.w3.org/1999/xhtml">2nd line.</p>'
     );
   });

   it('should respect LFCR', () => {
     const styleOptions = { markdownRespectCRLF: false };

     expect(renderMarkdown('Same Line.\r\nSame Line.\n\r2nd line.', styleOptions, renderMarkdownOptions)).toBe(
-      '<p>Same Line.\nSame Line.</p>\n<p>2nd line.</p>'
+      '<p xmlns="http://www.w3.org/1999/xhtml">Same Line.\nSame Line.</p>\n<p xmlns="http://www.w3.org/1999/xhtml">2nd line.</p>'
     );
   });

@@ -48,7 +48,9 @@ describe('renderMarkdown', () => {

     expect(
       renderMarkdown('**Message with Markdown**\r\nShould see bold text.', styleOptions, renderMarkdownOptions)
-    ).toBe('<p><strong>Message with Markdown</strong></p>\n<p>Should see bold text.</p>');
+    ).toBe(
+      '<p xmlns="http://www.w3.org/1999/xhtml"><strong>Message with Markdown</strong></p>\n<p xmlns="http://www.w3.org/1999/xhtml">Should see bold text.</p>'
+    );
   });

   it('should render code correctly', () => {
@@ -60,11 +62,7 @@ describe('renderMarkdown', () => {
         styleOptions,
         renderMarkdownOptions
       )
-    )
-      .toBe(`<pre class="webchat__render-markdown__code-block"><webchat--code-block-copy-button class="webchat__code-block-copy-button" data-alt-copied="Copied" data-alt-copy="Copy" data-value="{
-&quot;hello&quot;: &quot;World!&quot;
-}
-"></webchat--code-block-copy-button><code>{
+    ).toBe(`<pre xmlns="http://www.w3.org/1999/xhtml"><code>{
 "hello": "World!"
 }
 </code></pre>`);
@@ -74,7 +72,7 @@ describe('renderMarkdown', () => {
     const styleOptions = { markdownRespectCRLF: true };

     expect(renderMarkdown('[example](https://sample.com)', styleOptions, renderMarkdownOptions)).toBe(
-      `<p>\u200B<a href="https://sample.com" aria-label="example " rel="noopener noreferrer" target="_blank">example<img src="" alt="" class="webchat__render-markdown__external-link-icon" /></a>\u200B</p>`
+      `<p xmlns="http://www.w3.org/1999/xhtml">\u200B<a href="https://sample.com" aria-label="example " rel="noopener noreferrer" target="_blank">example<img src="" alt="" class="webchat__render-markdown__external-link-icon" /></a>\u200B</p>`
     );
   });

@@ -83,31 +81,31 @@ describe('renderMarkdown', () => {
     const options = { externalLinkAlt: 'Opens in a new window, external.' };

     expect(renderMarkdown('[example](https://sample.com)', styleOptions, options)).toBe(
-      `<p>\u200B<a href="https://sample.com" aria-label="example Opens in a new window, external." rel="noopener noreferrer" target="_blank">example<img src="" alt="" class="webchat__render-markdown__external-link-icon" title="Opens in a new window, external." /></a>\u200B</p>`
+      `<p xmlns="http://www.w3.org/1999/xhtml">\u200B<a href="https://sample.com" aria-label="example Opens in a new window, external." rel="noopener noreferrer" target="_blank">example<img src="" alt="" class="webchat__render-markdown__external-link-icon" title="Opens in a new window, external." /></a>\u200B</p>`
     );
   });

   it('should render sip protocol links correctly', () => {
     const styleOptions = { markdownRespectCRLF: true };

     expect(renderMarkdown(`[example@test.com](sip:example@test.com)`, styleOptions, renderMarkdownOptions)).toBe(
-      '<p>\u200B<a href="sip:example@test.com" rel="noopener noreferrer" target="_blank">example@test.com</a>\u200B</p>'
+      '<p xmlns="http://www.w3.org/1999/xhtml">\u200B<a href="sip:example@test.com" rel="noopener noreferrer" target="_blank">example@test.com</a>\u200B</p>'
     );
   });

   it('should render tel protocol links correctly', () => {
     const styleOptions = { markdownRespectCRLF: true };

     expect(renderMarkdown(`[(505)503-4455](tel:505-503-4455)`, styleOptions, renderMarkdownOptions)).toBe(
-      '<p>\u200B<a href="tel:505-503-4455" rel="noopener noreferrer" target="_blank">(505)503-4455</a>\u200B</p>'
+      '<p xmlns="http://www.w3.org/1999/xhtml">\u200B<a href="tel:505-503-4455" rel="noopener noreferrer" target="_blank">(505)503-4455</a>\u200B</p>'
     );
   });

   it('should render strikethrough text correctly', () => {
     const styleOptions = { markdownRespectCRLF: true };

     expect(renderMarkdown(`~~strike text~~`, styleOptions, renderMarkdownOptions)).toBe(
-      '<p><del>strike text</del></p>'
+      '<p xmlns="http://www.w3.org/1999/xhtml"><del>strike text</del></p>'
     );
   });
 });
@@ -0,0 +1,8 @@
+import { type HTMLContentTransformMiddleware } from 'botframework-webchat-component';
+
+import createCodeBlockCopyButtonMiddleware from './middleware/createCodeBlockCopyButtonMiddleware';
+import createSanitizeMiddleware from './middleware/createSanitizeMiddleware';
+
+export default function createHTMLContentTransformMiddleware(): readonly HTMLContentTransformMiddleware[] {
+  return Object.freeze([createCodeBlockCopyButtonMiddleware(), createSanitizeMiddleware()]);
+}
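
The order of the array returned by `createHTMLContentTransformMiddleware` appears to matter: the copy-button middleware forwards a modified request to `next`, while the sanitizer never calls `next` and therefore ends the chain. The sketch below illustrates that ordering under the assumption that the array is composed so its first entry runs first; the actual composer is not part of this excerpt. `compose`, `decorate`, `finalize`, and `trace` are hypothetical, and a string stands in for the real `DocumentFragment`.

```typescript
// Hypothetical stand-ins so the sketch runs without a DOM.
type HtmlFragment = string;
type TransformRequest = Readonly<{ documentFragment: HtmlFragment }>;
type TransformFunction = (request: TransformRequest) => HtmlFragment;
type TransformMiddleware = () => (next: TransformFunction) => TransformFunction;

const trace: string[] = [];

// Pass-through middleware: rewrites the request, then delegates to `next`
// (the role played by the copy-button middleware).
const decorate: TransformMiddleware = () => next => request => {
  trace.push('decorate');

  return next(Object.freeze({ ...request, documentFragment: `${request.documentFragment}<copy-button/>` }));
};

// Terminal middleware: ignores `next` and returns the final result
// (the role played by the sanitizer).
const finalize: TransformMiddleware = () => () => request => {
  trace.push('finalize');

  return request.documentFragment.toLowerCase();
};

// Assumed composition: fold from the right so the first array entry is outermost.
function compose(middleware: readonly TransformMiddleware[]): TransformFunction {
  return middleware.reduceRight<TransformFunction>(
    (next, create) => create()(next),
    request => request.documentFragment
  );
}

const run = compose(Object.freeze([decorate, finalize]));
const result = run({ documentFragment: '<PRE>x</PRE>' });

console.log(trace, result);
```

With this ordering, whatever the first middleware injects still passes through the terminal step, which matches the CHANGELOG's stated goal of moving the sanitizer "for better coverage".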
@@ -0,0 +1,18 @@
+import { type HTMLContentTransformMiddleware } from 'botframework-webchat-component';
+
+import codeBlockCopyButtonDocumentMod from '../private/codeBlockCopyButtonDocumentMod';
+
+export default function createCodeBlockCopyButtonMiddleware(): HTMLContentTransformMiddleware {
+  return () => next => request =>
+    next(
+      Object.freeze({
+        ...request,
+        documentFragment: codeBlockCopyButtonDocumentMod(request.documentFragment, {
+          codeBlockCopyButtonAltCopied: request.codeBlockCopyButtonAltCopied,
+          codeBlockCopyButtonAltCopy: request.codeBlockCopyButtonAltCopy,
+          codeBlockCopyButtonClassName: request.codeBlockCopyButtonClassName,
+          codeBlockCopyButtonTagName: request.codeBlockCopyButtonTagName
+        })
+      })
+    );
+}
109 changes: 109 additions & 0 deletions packages/bundle/src/markdown/middleware/createSanitizeMiddleware.ts
@@ -0,0 +1,109 @@
+import {
+  parseDocumentFragmentFromString,
+  serializeDocumentFragmentIntoString
+} from 'botframework-webchat-component/internal';
+import sanitizeHTML from 'sanitize-html';
+
+const BASE_SANITIZE_HTML_OPTIONS = Object.freeze({
+  allowedAttributes: {
+    a: ['aria-label', 'class', 'href', 'name', 'rel', 'target'],
+    button: ['aria-label', 'class', 'type', 'value'],
+    img: ['alt', 'aria-label', 'class', 'src', 'title'],
+    pre: ['class'],
+    span: ['aria-label']
+  },
+  allowedSchemes: ['data', 'http', 'https', 'ftp', 'mailto', 'sip', 'tel'],
+  allowedTags: [
+    'a',
+    'b',
+    'blockquote',
+    'br',
+    'button',
+    'caption',
+    'code',
+    'del',
+    'div',
+    'em',
+    'h1',
+    'h2',
+    'h3',
+    'h4',
+    'h5',
+    'h6',
+    'hr',
+    'i',
+    'img',
+    'ins',
+    'li',
+    'nl',
+    'ol',
+    'p',
+    'pre',
+    's',
+    'span',
+    'strike',
+    'strong',
+    'table',
+    'tbody',
+    'td',
+    'tfoot',
+    'th',
+    'thead',
+    'tr',
+    'ul',
+
+    // Followings are for MathML elements, from https://developer.mozilla.org/en-US/docs/Web/MathML.
+    'annotation-xml',
+    'annotation',
+    'math',
+    'merror',
+    'mfrac',
+    'mi',
+    'mmultiscripts',
+    'mn',
+    'mo',
+    'mover',
+    'mpadded',
+    'mphantom',
+    'mprescripts',
+    'mroot',
+    'mrow',
+    'ms',
+    'mspace',
+    'msqrt',
+    'mstyle',
+    'msub',
+    'msubsup',
+    'msup',
+    'mtable',
+    'mtd',
+    'mtext',
+    'mtr',
+    'munder',
+    'munderover',
+    'semantics'
+  ],
+  // Bug of https://github.com/apostrophecms/sanitize-html/issues/633.
+  // They should not remove `alt=""` even though it is empty.
+  nonBooleanAttributes: []
+});
+
+export default function createSanitizeMiddleware() {
+  return () => () => request => {
+    const { codeBlockCopyButtonTagName, documentFragment } = request;
+    const sanitizeHTMLOptions = {
+      ...BASE_SANITIZE_HTML_OPTIONS,
+      allowedAttributes: {
+        ...BASE_SANITIZE_HTML_OPTIONS.allowedAttributes,
+        [codeBlockCopyButtonTagName]: ['class', 'data-alt-copy', 'data-alt-copied', 'data-testid', 'data-value']
+      },
+      allowedTags: [...BASE_SANITIZE_HTML_OPTIONS.allowedTags, codeBlockCopyButtonTagName]
+    };
+
+    const htmlAfterBetterLink = serializeDocumentFragmentIntoString(documentFragment);
+
+    const htmlAfterSanitization = sanitizeHTML(htmlAfterBetterLink, sanitizeHTMLOptions);
+
+    return parseDocumentFragmentFromString(htmlAfterSanitization);
+  };
+}
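
The per-request option merge in `createSanitizeMiddleware` relies on object spread plus a computed property key, so the frozen base options are copied rather than mutated when the copy-button tag name is added to the allowlist. The sketch below isolates that pattern with a trimmed-down allowlist. `SanitizeOptions`, `BASE_OPTIONS`, `buildOptions`, and the tag name `x-copy-button` are illustrative only, and `sanitize-html` itself is not invoked.

```typescript
type SanitizeOptions = Readonly<{
  allowedAttributes: Readonly<Record<string, readonly string[]>>;
  allowedTags: readonly string[];
}>;

// Trimmed-down, hypothetical base allowlist; the real one is much longer.
const BASE_OPTIONS: SanitizeOptions = Object.freeze({
  allowedAttributes: { pre: ['class'] },
  allowedTags: ['code', 'pre']
});

// Builds a new options object that additionally allows one custom element.
// Spreading copies the base entries, so BASE_OPTIONS itself is left untouched.
function buildOptions(customTagName: string): SanitizeOptions {
  return {
    ...BASE_OPTIONS,
    allowedAttributes: {
      ...BASE_OPTIONS.allowedAttributes,
      // Computed key: the attribute allowlist is keyed by the runtime tag name.
      [customTagName]: ['class', 'data-value']
    },
    allowedTags: [...BASE_OPTIONS.allowedTags, customTagName]
  };
}

const extendedOptions = buildOptions('x-copy-button');

console.log(extendedOptions.allowedTags, BASE_OPTIONS.allowedTags);
```

Building the options inside the request handler, as the middleware does, lets the allowed tag follow whatever `codeBlockCopyButtonTagName` the request carries instead of hard-coding it in the base allowlist.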