Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfaults when writing to Buffer using UCS2 encoding #2457

Closed
adalinesimonian opened this issue Aug 20, 2015 · 29 comments
Closed

Segfaults when writing to Buffer using UCS2 encoding #2457

adalinesimonian opened this issue Aug 20, 2015 · 29 comments
Labels
buffer Issues and PRs related to the buffer subsystem.

Comments

@adalinesimonian
Copy link

The following code causes a segfault when write is called:

var text = '1234567890123456789012345678901234567890123456789';
var len = Buffer.byteLength(text, 'ucs2');
var buf = new Buffer(len);
var bytesWritten = buf.write(text, 0, 'ucs2');

The odd thing about this bug is that it is intermittent, but consistent. Meaning, the code above always causes a segfault in my tests. However, when I change text to have an extra 0 at the end (12345678901234567890123456789012345678901234567890), it stops causing segfaults.

Even more confusing is that if I take that same piece of text with a 0 at the end, the variant that doesn't cause a segfault, and use it in the following code, it also always causes a segfault.

This code is also the code I've been using to test this bug. Basically, I've been calling this script with a bunch of sample text files, and the only encoding that ever causes a segfault is ucs2.

Test script:

var text = '12345678901234567890123456789012345678901234567890';

if (process.argv.length > 2) {
  var lastArg = process.argv[process.argv.length - 1];
  if (lastArg.indexOf('-') !== 0) {
    var fs = require('fs');
    text = fs.readFileSync(lastArg).toString();
  }
}

console.log('Text to write:', text);
console.log();
console.log(text.length + ' characters');

function write(encoding) {
  console.log();
  console.log('=== Writing using ' + encoding + ' encoding');
  console.log();

  try {

    var len = Buffer.byteLength(text, encoding);

    console.log('Required buffer length: ' + len);

    var buf = new Buffer(len);
    var bytesWritten = buf.write(text, 0, encoding);

    console.log('Wrote ' + bytesWritten + ' bytes');
    console.log(buf);

  } catch (err) {
    console.log('Error during write:', err);
  }
};

write('base64');
write('binary');
write('utf8');
write('hex');
write('ascii');
write('ucs2');

Example output:

Text to write: 12345678901234567890123456789012345678901234567890

50 characters

=== Writing using base64 encoding

Required buffer length: 37
Wrote 37 bytes
<Buffer d7 6d f8 e7 ae fc f7 4d 76 df 8e 7a ef cf 74 d7 6d f8 e7 ae fc f7 4d 76 df 8e 7a ef cf 74 d7 6d f8 e7 ae fc f7>

=== Writing using binary encoding

Required buffer length: 50
Wrote 50 bytes
<Buffer 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30>

=== Writing using utf8 encoding

Required buffer length: 50
Wrote 50 bytes
<Buffer 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30>

=== Writing using hex encoding

Required buffer length: 25
Wrote 25 bytes
<Buffer 12 34 56 78 90 12 34 56 78 90 12 34 56 78 90 12 34 56 78 90 12 34 56 78 90>

=== Writing using ascii encoding

Required buffer length: 50
Wrote 50 bytes
<Buffer 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30>

=== Writing using ucs2 encoding

Required buffer length: 100
[1]    3798 segmentation fault (core dumped)  node main.js

Backtrace:

Full backtrace

#0  0x0000000000b6c595 in void v8::internal::String::WriteToFlat<unsigned short>(v8::internal::String*, unsigned short*, int, int) ()
#1  0x000000000085124a in v8::String::Write(unsigned short*, int, int, int) const ()
#2  0x0000000000d9980a in node::StringBytes::Write(v8::Isolate*, char*, unsigned long, v8::Local<v8::Value>, node::encoding, int*) ()
#3  0x0000000000d74de8 in void node::Buffer::StringWrite<(node::encoding)3>(v8::FunctionCallbackInfo<v8::Value> const&) ()
#4  0x00000000008653f7 in v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) ()

This bug has acted very oddly, but yet I've managed to replicate it both on my current machine and on a fresh install of io.js on a separate computer. It affects versions 3.0.0 and 3.1.0, but I haven't tested any older versions.

@ChALkeR ChALkeR added the buffer Issues and PRs related to the buffer subsystem. label Aug 20, 2015
@ChALkeR
Copy link
Member

ChALkeR commented Aug 20, 2015

I can't reproduce it using either of those two supplied scripts for some reason.

@alexlamsl
Copy link

Couldn't reproduce with iojs 2.5.0 on Windows Server 2012 R2

However, on an unrelated note, when I made a typo with encoding, Buffer.byteLength() does not err:

Buffer.byteLength(text, 'ucs2');
// 98
Buffer.byteLength(text, 'usc2');
// 48

Whereas buf.write() would throw TypeError: Unknown encoding: usc2

@skomski
Copy link
Contributor

skomski commented Aug 20, 2015

Seems like a compiler optimization bug: I can only reproduce it on master with gcc (5.2.0) if I compile with -O3 not -O2 and with clang it works fine on -O3.

@ChALkeR
Copy link
Member

ChALkeR commented Aug 20, 2015

@vsimonian Did you build the binaries (using which you could reproduce the bug) yourself?
If yes, what was the compiler version and what were the options?

@bnoordhuis
Copy link
Member

Can someone post a stack trace (from gdb or lldb) of the crash?

@adalinesimonian
Copy link
Author

@ChALkeR I'm using binaries from the Nodesource Debian/Ubuntu repository. Should I give compiling io.js a try?

@skomski
Copy link
Contributor

skomski commented Aug 20, 2015

Only crashes with gcc -O3 hence this all can be fixed by specifying -fno-tree-loop-vectorize.
The problem is maybe undefined unaligned access which results with the vectorizer into a crash like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58039

* thread #1: tid = 19762, 0x0000000000b2e497 iojs`void v8::internal::String::WriteToFlat<unsigned short>(v8::internal::String*, unsigned short*, int, int) + 338 at utils.h:1405, name = 'iojs', stop reason = invalid address (fault address: 0x0)
    frame #0: 0x0000000000b2e497 iojs`void v8::internal::String::WriteToFlat<unsigned short>(v8::internal::String*, unsigned short*, int, int) + 338 at utils.h:1405
   1402       (chars >= static_cast<int>(kMinComplexMemCopy / sizeof(*dest)))) {
   1403     MemCopy(dest, src, chars * sizeof(*dest));
   1404   } else {
-> 1405     while (dest < limit) *dest++ = static_cast<sinkchar>(*src++);
   1406   }
   1407 }
   1408 
(lldb) bt
* thread #1: tid = 19762, 0x0000000000b2e497 iojs`void v8::internal::String::WriteToFlat<unsigned short>(v8::internal::String*, unsigned short*, int, int) + 338 at utils.h:1405, name = 'iojs', stop reason = invalid address (fault address: 0x0)
  * frame #0: 0x0000000000b2e497 iojs`void v8::internal::String::WriteToFlat<unsigned short>(v8::internal::String*, unsigned short*, int, int) + 338 at utils.h:1405
    frame #1: 0x0000000000b2e345 iojs`void v8::internal::String::WriteToFlat<unsigned short>(v8::internal::String*, unsigned short*, int, int) [inlined] void v8::internal::CopyChars<unsigned char, unsigned short>(chars=<unavailable>, src=<unavailable>, dest=<unavailable>) at utils.h:1387
    frame #2: 0x0000000000b2e345 iojs`void v8::internal::String::WriteToFlat<unsigned short>(src=0x00003b639ecb8da3, sink=<unavailable>, f=0, t=18667829) + 69 at objects.cc:9082
    frame #3: 0x0000000000843f37 iojs`v8::String::Write(unsigned short*, int, int, int) const + 169 at api.cc:5087
    frame #4: 0x0000000000843e8e iojs`v8::String::Write(this=<unavailable>, buffer=0x00000000011cd90b, start=0, length=40, options=11) const + 46 at api.cc:5108
    frame #5: 0x0000000000d38121 iojs`node::StringBytes::Write(isolate=<unavailable>, buf=0x00000000011cd90b, buflen=80, val=(val_ = <parent has invalid value.>), encoding=UCS2, chars_written=0x0000000000000000) + 321 at string_bytes.cc:350
    frame #6: 0x0000000000d10ece iojs`node::Buffer::Ucs2Write(v8::FunctionCallbackInfo<v8::Value> const&) + 622 at node_buffer.cc:681
    frame #7: 0x0000000000d10c60 iojs`node::Buffer::Ucs2Write(args=0x00007fffffffd800) + 160 at node_buffer.cc:707
    frame #8: 0x000000000085979b iojs`v8::internal::FunctionCallbackArguments::Call(this=0x00007fffffffd890, f=0x0000000000d10bc0)(v8::FunctionCallbackInfo<v8::Value> const&)) + 155 at arguments.cc:33

@indutny
Copy link
Member

indutny commented Aug 20, 2015

@bnoordhuis could be alignment problem?

@kzc
Copy link

kzc commented Aug 20, 2015

char* buf arg is not aligned: buf=0x00000000011cd90b

    frame #5: 0x0000000000d38121 iojs`node::StringBytes::Write(isolate=<unavailable>, buf=0x00000000011cd90b, buflen=80, val=(val_ = <parent has invalid value.>), encoding=UCS2, chars_written=0x0000000000000000) + 321 at string_bytes.cc:350

src/string_bytes.cc:

case UCS2: {
  uint16_t* const dst = reinterpret_cast<uint16_t*>(buf);  // can't safely do this

https://github.com/nodejs/node/blob/master/src/string_bytes.cc#L337

@indutny
Copy link
Member

indutny commented Aug 20, 2015

Yay, bullseye!

@kzc may I ask you to give a try to this patch?

diff --git a/src/string_bytes.cc b/src/string_bytes.cc
index 0abdbf8..03cfd96 100644
--- a/src/string_bytes.cc
+++ b/src/string_bytes.cc
@@ -340,8 +340,20 @@ size_t StringBytes::Write(Isolate* isolate,
         memcpy(buf, data, nbytes);
         nchars = nbytes / sizeof(*dst);
       } else {
-        nchars = buflen / sizeof(*dst);
-        nchars = str->Write(dst, 0, nchars, flags);
+        // Unaligned `dst`
+        if (reinterpret_cast<intptr_t>(dst) & 1) {
+          uint16_t tmp;
+          nchars = str->Write(&tmp, 0, 1, flags);
+
+          if (nchars != 0) {
+            nchars = (buflen / sizeof(*dst)) - nchars;
+            dst[0] = tmp;
+            nchars = str->Write(dst + 1, 1, nchars, flags) + 1;
+          }
+        } else {
+          nchars = buflen / sizeof(*dst);
+          nchars = str->Write(dst, 0, nchars, flags);
+        }
         nbytes = nchars * sizeof(*dst);
       }
       if (IsBigEndian()) {

@kzc
Copy link

kzc commented Aug 20, 2015

@indutny - I didn't run it. Just inspected the source code after looking at the stack trace.

@indutny
Copy link
Member

indutny commented Aug 20, 2015

Oh, right! cc @skomski

@kzc
Copy link

kzc commented Aug 20, 2015

regarding:

+        if (reinterpret_cast<intptr_t>(dst) & 2) {

did you mean:

+        if (reinterpret_cast<intptr_t>(dst) & 1) {

@indutny
Copy link
Member

indutny commented Aug 20, 2015

Yikes, sure! Thanks for pointing out.

@kzc
Copy link

kzc commented Aug 20, 2015

+            nchars = str->Write(dst + 1, 1, nchars, flags) + 1;

wouldn't dst + 1 also be unaligned? Perhaps you want buf + 1?

The nchars logic may need some review.

@indutny
Copy link
Member

indutny commented Aug 20, 2015

Arrgh... You are right. Let me think about the best way to do it...

@indutny
Copy link
Member

indutny commented Aug 20, 2015

Ok, here is a revised patch:

diff --git a/src/string_bytes.cc b/src/string_bytes.cc
index 0abdbf8..138b302 100644
--- a/src/string_bytes.cc
+++ b/src/string_bytes.cc
@@ -340,8 +340,27 @@ size_t StringBytes::Write(Isolate* isolate,
         memcpy(buf, data, nbytes);
         nchars = nbytes / sizeof(*dst);
       } else {
-        nchars = buflen / sizeof(*dst);
-        nchars = str->Write(dst, 0, nchars, flags);
+        // Unaligned `dst`
+        if (reinterpret_cast<intptr_t>(dst) & 1) {
+          uint16_t tmp;
+          nchars = str->Write(&tmp, 0, 1, flags);
+
+          if (nchars != 0) {
+            nchars = (buflen / sizeof(*dst)) - nchars;
+            uint8_t* aligned_dst = reinterpret_cast<uint8_t*>(dst) + 1;
+            nchars = str->Write(reinterpret_cast<uint16_t*>(aligned_dst),
+                                1,
+                                nchars,
+                                flags);
+            memmove(aligned_dst + 1, aligned_dst, nchars * 2);
+
+            dst[0] = tmp;
+            nchars++;
+          }
+        } else {
+          nchars = buflen / sizeof(*dst);
+          nchars = str->Write(dst, 0, nchars, flags);
+        }
         nbytes = nchars * sizeof(*dst);
       }
       if (IsBigEndian()) {

@skomski please give it a try.

@skomski
Copy link
Contributor

skomski commented Aug 20, 2015

@indutny New patch works for both and original test cases:

text = '1 2 3 4 5 6 7 8 9 1 2'
Buffer(42).write(text, 0, 'ucs2')
text = '12345678901234567890123';
Buffer(44).write(text, 0, 'ucs2');

@indutny
Copy link
Member

indutny commented Aug 20, 2015

Hooray! ;) @bnoordhuis any comments, suggestions? If none - I'm going to fix the style and send a PR.

@kzc
Copy link

kzc commented Aug 20, 2015

+            dst[0] = tmp;

isn't this still unaligned? It could work on x86 architecture, but not necessarily others.

@indutny
Copy link
Member

indutny commented Aug 20, 2015

@kzc this is correct... I guess I should do a memcpy instead.

@bnoordhuis
Copy link
Member

@indutny I fixed a similar bug a while ago in commits 535fec8 and 56fde66, but for StringBytes::Encode(). There's probably a way to factor out the common code. You may have to take endianness into account.

@kzc
Copy link

kzc commented Aug 20, 2015

@indutny - the logic regarding nchars and the memmove seems not quite right to me, but I may be mistaken. I think you'll need a number of test cases for unaligned buf UCS2 string conversions of varying lengths to prove it is correct.

@indutny
Copy link
Member

indutny commented Aug 21, 2015

Here is the fix: #2480 cc @kzc @bnoordhuis

indutny added a commit to indutny/io.js that referenced this issue Aug 21, 2015
Support unaligned output buffer when writing out UCS2 in
`StringBytes::Write`.

Fix: nodejs#2457
@adalinesimonian
Copy link
Author

Not sure how much help this may be, but building io.js 3.1.0 with

'-DNODE_ARCH="x64"' '-DNODE_PLATFORM="linux"' '-DNODE_V8_OPTIONS=""' '-DNODE_WANT_INTERNALS=1' '-DHAVE_OPENSSL=1' '-D__POSIX__' '-DHTTP_PARSER_STRICT=0' '-D_LARGEFILE_SOURCE' '-D_FILE_OFFSET_BITS=64' '-D_POSIX_C_SOURCE=200112' -pthread -Wall -Wextra -Wno-unused-parameter -m64 -O3 -ffunction-sections -fdata-sections -fno-omit-frame-pointer -fno-rtti -fno-exceptions -std=gnu++0x

results in the code triggering a segfault, but building with @indutny's latest version of patch #2480 with the same options fixes the segfault bug.

@indutny
Copy link
Member

indutny commented Aug 21, 2015

Thank you @vsimonian !

@shernshiou
Copy link

+1

indutny added a commit that referenced this issue Aug 21, 2015
Support unaligned output buffer when writing out UCS2 in
`StringBytes::Write`.

Fix: #2457
PR-URL: #2480
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
@indutny
Copy link
Member

indutny commented Aug 21, 2015

Fixed!

@indutny indutny closed this as completed Aug 21, 2015
@adalinesimonian
Copy link
Author

Fantastic work, @indutny! Thank you! 👍

indutny added a commit that referenced this issue Aug 22, 2015
Support unaligned output buffer when writing out UCS2 in
`StringBytes::Write`.

Fix: #2457
PR-URL: #2480
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem.
Projects
None yet
Development

No branches or pull requests

8 participants