WritableStream: example, fixed block character in the output #14371


Merged
9 commits merged on Mar 31, 2022

Conversation

OnkarRuikar
Contributor

@OnkarRuikar OnkarRuikar commented Mar 27, 2022

Summary

In the example at https://developer.mozilla.org/en-US/docs/Web/API/WritableStream#examples,
there are unwanted block characters in the output:
[screenshot: block characters in the output]

In JavaScript, strings use UTF-16 encoding.
Using utf-16 encoding for the decoder fixes the issue:

const decoder = new TextDecoder("utf-16");

After:
[screenshot: fixed output]

Tested in Chrome on Windows 10.

Note: I've created a PR for the same in mdn/dom-examples repo: mdn/dom-examples#97
I have no idea who the reviewer is there.

Metadata

  • Fixes a typo, bug, or other error

@OnkarRuikar OnkarRuikar requested a review from a team as a code owner March 27, 2022 13:10
@OnkarRuikar OnkarRuikar requested review from sideshowbarker and removed request for a team March 27, 2022 13:10
@github-actions github-actions bot added the Content:WebAPI Web API docs label Mar 27, 2022

@sideshowbarker
Member

In the example, there are unwanted block characters in the output
Tested in Chrome on Windows 10.

I can’t reproduce that in Chrome in my macOS environment — though I can reproduce it in Safari.

But regardless, it seems to me the cause may actually be browser bugs, not a problem with the code.

In JavaScript, strings use UTF-16 encoding.

While that’s true about how JavaScript encodes strings internally, that’s not true at the API layer for TextDecoder.

With utf-16 encoding for the decoder, the issue is fixed:

const decoder = new TextDecoder("utf-16");

The code here is doing const encoder = new TextEncoder(). Per https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder and the Encoding spec, that means the encoder encodes the stream in UTF-8. (There is in fact no other way to use TextEncoder() to encode in anything other than UTF-8.)

So the const decoder = new TextDecoder("utf-8") part of the existing code causes the stream to be decoded using the UTF-8 decoder — which, since it was encoded in UTF-8, seems like it’s right. And so, doing const decoder = new TextDecoder("utf-16") would be wrong — because the stream was not encoded in UTF-16; it was encoded in UTF-8. (And in fact, just new TextDecoder() should work — because the UTF-8 decoder is the default.)
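The point above can be sketched as a minimal round trip (an illustrative snippet, not code from the PR; the variable names are my own):

```javascript
// TextEncoder always produces UTF-8 — there is no constructor argument
// to choose another encoding — so the matching decoder is the default
// (UTF-8) TextDecoder.
const encoder = new TextEncoder();
const bytes = encoder.encode("Hello, world!"); // Uint8Array of UTF-8 bytes

const decoder = new TextDecoder(); // "utf-8" is the default label
const roundTripped = decoder.decode(bytes);

console.log(encoder.encoding); // "utf-8"
console.log(roundTripped);     // "Hello, world!"
```

Decoding those same bytes with a "utf-16" decoder would instead pair up the UTF-8 bytes into 16-bit code units, producing garbage.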

@OnkarRuikar
Contributor Author

OnkarRuikar commented Mar 28, 2022

@sideshowbarker
To be on the same page, let's look at a common example: https://jsfiddle.net/vx5ea9z1/2/
Here I've used the same JavaScript code as provided in the examples.
In addition, at the end of the list I print the received message's length, and I also print extended ASCII characters, which include non-printable characters.

I can’t reproduce that in Chrome in my macOS environment — though I can reproduce it in Safari.

Some browsers hide non-printable characters. The above JSFiddle, in Chrome and Edge on Windows, shows all extended ASCII characters; non-printable characters appear as empty boxes. The same JSFiddle in Safari and Chrome on iPad doesn't show the non-printable characters.

But regardless, it seems to me the cause may actually be browser bugs, not a problem with the code.

We can easily verify whether it's a browser bug. In the above JSFiddle, the received message length in the output is 26 on both Windows and iPadOS.
Chrome and Edge on Windows:
[screenshot: output in Chrome/Edge on Windows]

Safari and Chrome on iPadOs:
[screenshot: output in Safari/Chrome on iPadOS]

Can you check the same in Chrome on macOS?
If the length is 26 in all browsers, then it's a code bug: the original message has only 13 characters, but the received message has 26. How did the message size double? There must be something wrong in the decoding logic.

So the const decoder = new TextDecoder("utf-8") part of the existing code causes the stream to be decoded using the UTF-8 decoder — which, since it was encoded in UTF-8, seems like it’s right.

You are right: a utf-8 encoded string should be decoded using a utf-8 decoder. After debugging further, it looks like the following lines of the decoding logic are doubling the string's size:

var buffer = new ArrayBuffer(2);
var view = new Uint16Array(buffer);
view[0] = chunk;
var decoded = decoder.decode(view, { stream: true });

By using Uint16Array, view[1] always remains 0, which doubles the output size.
We can use Uint8Array to decode the utf-8 encoded string:

var buffer = new ArrayBuffer(1);
var view = new Uint8Array(buffer);
view[0] = chunk;
var decoded = decoder.decode(view, { stream: true });

This solved the issue on Windows and iPadOS. Let me know if it works OK on your end: https://jsfiddle.net/onqp6xvt/
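The difference between the two views can be sketched outside the stream machinery (an illustrative snippet, not code from the PR; the loop stands in for the WritableStream's per-chunk write() calls):

```javascript
// Feed UTF-8 bytes to a streaming TextDecoder one chunk at a time,
// comparing a Uint16Array view with a Uint8Array view.
const message = "Hello, world!";
const bytes = new TextEncoder().encode(message); // 13 UTF-8 bytes

// Buggy variant: each byte is widened to 16 bits, so the UTF-8
// decoder sees a stray 0x00 byte alongside every real byte.
const buggyDecoder = new TextDecoder("utf-8");
let buggy = "";
for (const chunk of bytes) {
  const view = new Uint16Array(1);
  view[0] = chunk;
  buggy += buggyDecoder.decode(view, { stream: true });
}

// Fixed variant: a Uint8Array view passes exactly one byte per chunk.
const fixedDecoder = new TextDecoder("utf-8");
let fixed = "";
for (const chunk of bytes) {
  const view = new Uint8Array(1);
  view[0] = chunk;
  fixed += fixedDecoder.decode(view, { stream: true });
}

console.log(buggy.length); // 26 — a NUL is interleaved with each character
console.log(fixed.length); // 13
```

The NUL characters are what render as the empty boxes: fonts that draw a glyph for U+0000 show a box, while other browsers simply hide it, which is why the symptom appeared only on some platforms.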

@sideshowbarker sideshowbarker merged commit 88ecdef into mdn:main Mar 31, 2022
@OnkarRuikar OnkarRuikar deleted the patch-2 branch March 31, 2022 01:45