Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: renderToPipeableStream() emit mysterious mojibake whitespace chars in the result #24985

Closed
tetsuharuohzeki opened this issue Jul 25, 2022 · 7 comments · Fixed by #26228
Closed
Labels
Status: Unconfirmed A potential issue that we haven't yet confirmed as a bug

Comments

@tetsuharuohzeki
Copy link

tetsuharuohzeki commented Jul 25, 2022

renderToPipeableStream() emit mojibake whitespace chars in its result.
This would sometimes breaks the final generated html.

React version: 18.2.0

Steps To Reproduce

Run this test case with jest.

import { PassThrough } from 'node:stream';
import { renderToPipeableStream } from 'react-dom/server';

const element = (
    <html lang="ja">
        <head>
            <meta charSet="utf-8" />
            <title>吾輩は猫である</title>
        </head>
        <body>
            <p>
                どこで生れたかとんと見当がつかぬ。何でも薄暗いじめじめした所でニャーニャー泣いていた事だけは記憶している。吾輩はここで始めて人間というものを見た。しかもあとで聞くとそれは書生という人間中で一番獰悪な種族であったそうだ。この書生というのは時々我々を捕えて煮て食うという話である。しかしその当時は何という考もなかったから別段恐しいとも思わなかった。ただ彼の掌に載せられてスーと持ち上げられた時何だかフワフワした感じがあったばかりである。掌の上で少し落ちついて書生の顔を見たのがいわゆる人間というものの見始であろう。この時妙なものだと思った感じが今でも残っている。第一毛をもって装飾されべきはずの顔がつるつるしてまるで薬缶だ。その後猫にもだいぶ逢ったがこんな片輪には一度も出会わした事がない。のみならず顔の真中があまりに突起している。そうしてその穴の中から時々ぷうぷうと煙を吹く。どうも咽せぽくて実に弱った。これが人間の飲む煙草というものである事はようやくこの頃知った。
            </p>
            <p>
                この書生の掌の裏でしばらくはよい心持に坐っておったが、しばらくすると非常な速力で運転し始めた。書生が動くのか自分だけが動くのか分らないが無暗に眼が廻る。胸が悪くなる。到底助からないと思っていると、どさりと音がして眼から火が出た。それまでは記憶しているがあとは何の事やらいくら考え出そうとしても分らない。
            </p>
            <p>
                ふと気が付いて見ると書生はいない。たくさんおった兄弟が一疋も見えぬ。肝心の母親さえ姿を隠してしまった。その上今までの所とは違って無暗に明るい。眼を明いていられぬくらいだ。はてな何でも容子がおかしいと、のそのそ這い出して見ると非常に痛い。吾輩は藁の上から急に笹原の中へ棄てられたのである。
            </p>
        </body>
    </html>
);

test('renderToPipeableStream', async () => {
    const onStreamReady = new Promise((resolve, reject) => {
        const stream = renderToPipeableStream(element, {
            onAllReady() {
                resolve(stream);
            },
            onError(err) {
                reject(err);
            },
        });
    });

    const renderingStream = await onStreamReady;

    const sink = new PassThrough();
    const connected = renderingStream.pipe(sink);

    const result = [];
    for await (const chunk of connected) {
        if (typeof chunk === 'string') {
            result.push(chunk);
        } else {
            const txt = String(chunk);
            result.push(txt);
        }
    }

    const actual = result.join('');
    expect(actual).toMatchSnapshot();
});

The current behavior

The above testcase generate this result:

"<!DOCTYPE html><html lang="ja"><head><meta charSet="utf-8"/><title>吾輩は猫である</title></head><body><p>どこで生れたかとんと見当がつかぬ。何でも薄暗いじめじめした所でニャーニャー泣いていた事だけは記憶している。吾輩はここで始めて人間というものを見た。しかもあとで聞くとそれは書生という人間中で一番獰悪な種族であったそうだ。この書生というのは時々我々を捕えて煮て食うという話である。しかしその当時は何という考もなかったから別段恐しいとも思わなかった。ただ彼の掌に載せられてスーと持ち上げられた時何だかフワフワした感じがあったばかりである。掌の上で少し落ちついて書生の顔を見たのがいわゆる人間というものの見始であろう。この時妙なものだと思った感じが今でも残っている。第一毛をもって装飾されべきはずの顔がつるつるしてまるで薬缶だ。その後猫にもだいぶ逢ったがこんな片輪には一度も出会わした事がない。のみならず顔の真中があまりに突起している。そうしてその穴の中から時々ぷうぷうと煙を吹く。どうも咽せぽくて実に弱った。これが人間の飲む煙草というものである事はようやくこの頃知った。</p><p>この書生の掌の裏でしばらくはよい心持に坐っておったが、しばらくすると非常な速力で運転し始めた。書生が動くのか自分だけが動くのか分らないが無暗に眼が廻る。胸が悪くなる。到底助からないと思っていると、どさりと音がして眼から火が出た。それまでは記憶しているがあとは何の事やらいくら考え出そうとしても分らない。</p><p>ふと気が付いて見ると書生はいない。たくさんおった兄弟が一疋も見えぬ。肝心の母親さえ姿を隠してしまった。その上今まで��の所とは違って無暗に明るい。眼を明いていられぬくらいだ。はてな何でも容子がおかしいと、のそのそ這い出して見ると非常に痛い。吾輩は藁の上から急に笹原の中へ棄てられたのである。</p></body></html>"

The expected behavior

The current behavior's result contains その上今まで��の所とは違って無暗に明るい.
But we expect this sentence should be その上今までの所とは違って無暗に明るい.

So renderToPipeableStream() should not emit mysterious whitespace chars.

@tetsuharuohzeki tetsuharuohzeki added the Status: Unconfirmed A potential issue that we haven't yet confirmed as a bug label Jul 25, 2022
@tetsuharuohzeki
Copy link
Author

I confirmed:

  • This bug is not reproducible with renderToStaticNodeStream().
  • This bug is not reproducible in v18.0.0.
    • But this bug is reproducble from v18.1.0.

So I guess some regressions happens in this range.
v18.0.0...v18.1.0

@tetsuharuohzeki
Copy link
Author

This bug may be related to #24592

@sophiebits
Copy link
Collaborator

Thanks for the clear repro case!

sophiebits added a commit to sophiebits/react that referenced this issue Feb 24, 2023
We encode strings 2048 UTF-8 bytes at a time. If the string we are encoding crosses to the next chunk but the current chunk doesn't fit an integral number of characters, we need to make sure not to send the whole buffer, only the bytes that are actually meaningful.

Fixes facebook#24985. I was able to verify that this fixes the repro shared in the issue (be careful when testing because the null bytes do not show when printed to my terminal, at least). However, I don't see a clear way to add a test for this that will be resilient to small changes in how we encode the markup (since it depends on where specific multibyte characters fall against the 2048-byte boundaries).
@sophiebits
Copy link
Collaborator

(I am guessing #24592 is not related, since this issue seems to be with non-ASCII characters and I don't believe that one is.)

sophiebits added a commit to sophiebits/react that referenced this issue Feb 24, 2023
We encode strings 2048 UTF-8 bytes at a time. If the string we are encoding crosses to the next chunk but the current chunk doesn't fit an integral number of characters, we need to make sure not to send the whole buffer, only the bytes that are actually meaningful.

Fixes facebook#24985. I was able to verify that this fixes the repro shared in the issue (be careful when testing because the null bytes do not show when printed to my terminal, at least). However, I don't see a clear way to add a test for this that will be resilient to small changes in how we encode the markup (since it depends on where specific multibyte characters fall against the 2048-byte boundaries).
sophiebits added a commit that referenced this issue Feb 24, 2023
We encode strings 2048 UTF-8 bytes at a time. If the string we are
encoding crosses to the next chunk but the current chunk doesn't fit an
integral number of characters, we need to make sure not to send the
whole buffer, only the bytes that are actually meaningful.

Fixes #24985. I was able to verify that this fixes the repro shared in
the issue (be careful when testing because the null bytes do not show
when printed to my terminal, at least). However, I don't see a clear way
to add a test for this that will be resilient to small changes in how we
encode the markup (since it depends on where specific multibyte
characters fall against the 2048-byte boundaries).
github-actions bot pushed a commit that referenced this issue Feb 24, 2023
We encode strings 2048 UTF-8 bytes at a time. If the string we are
encoding crosses to the next chunk but the current chunk doesn't fit an
integral number of characters, we need to make sure not to send the
whole buffer, only the bytes that are actually meaningful.

Fixes #24985. I was able to verify that this fixes the repro shared in
the issue (be careful when testing because the null bytes do not show
when printed to my terminal, at least). However, I don't see a clear way
to add a test for this that will be resilient to small changes in how we
encode the markup (since it depends on where specific multibyte
characters fall against the 2048-byte boundaries).

DiffTrain build for [96cdeaf](96cdeaf)
@tetsuharuohzeki
Copy link
Author

@sophiebits Thank you!

@hanayashiki
Copy link

For those who have this problem in remix, I can provide a fix in entry.server.tsx:

For quick code:

function handleBrowserRequest(
  request: Request,
  responseStatusCode: number,
  responseHeaders: Headers,
  remixContext: EntryContext,
) {
  return new Promise((resolve, reject) => {
    let shellRendered = false;
    const { pipe, abort } = renderToPipeableStream(
      <RemixServer
        context={remixContext}
        url={request.url}
        abortDelay={ABORT_DELAY}
      />,
      {
        onShellReady() {
          shellRendered = true;
          
          /**
           * https://github.com/facebook/react/pull/26228
           * Null bytes are possible at the end of the chunk, if we are using non-ascii characters.
           * The fix is not released to 18.2.0 so all we can do is transforming the chunks.
           */
          const body = new Transform({
            transform(chunk: Buffer, encoding, callback) {
              let endingZeroes =
                Number(chunk.at(-1) === 0) + Number(chunk.at(-2) === 0);

              callback(
                null,
                endingZeroes > 0
                  ? chunk.subarray(0, chunk.length - endingZeroes)
                  : chunk,
              );
            },
          });

          responseHeaders.set("Content-Type", "text/html");

          resolve(
            new Response(createReadableStreamFromReadable(body), {
              headers: responseHeaders,
              status: responseStatusCode,
            }),
          );

          pipe(body);
        },
        onShellError(error: unknown) {
          reject(error);
        },
        onError(error: unknown) {
          responseStatusCode = 500;
          // Log streaming rendering errors from inside the shell.  Don't log
          // errors encountered during initial shell rendering since they'll
          // reject and get logged in handleDocumentRequest.
          if (shellRendered) {
            console.error(error);
          }
        },
      },
    );

    setTimeout(abort, ABORT_DELAY);
  });
}

Details:

To fix the bug while not hurt the performance, we only need to check if the last two bytes are zeroes.

Because utf-8 is 3-byte at most and the bug is only triggered if they are filling a fixed length array which doesn't fit a 3-byte character at the end. For example, given they are encoding into a 8-byte array, AAの will be [0x41 0x41 0x41 0xe3 0x81 0xae 0x00 0x00], leaving the last two bytes empty. They will pass the entire 8-byte array downstream to the response instead of only the valid bytes. So we need to trim at most 2 bytes for this specific bug.

@tats-u
Copy link

tats-u commented Oct 20, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Unconfirmed A potential issue that we haven't yet confirmed as a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants