Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffers: speed up swap16/32, add swap64 #7157

Closed
wants to merge 1 commit into from

Conversation

zbjornson
Copy link
Contributor

Checklist
  • tests and code linting passes
  • a test and/or benchmark is included
  • documentation is changed or added
  • the commit message follows commit guidelines
Affected core subsystem(s)

Buffer (swap16 and swap32 methods)

Description of change
  • Speed up buffer.swap16 and swap32 by using builtins. Up to ~6x gain. Drop transition point between JS and C++ implementations accordingly.
  • Fix tests: the existing C++ impl tests were testing 0-filled buffers so were always passing.
  • Add similar buffer.swap64 method. (Use case: working with Float64Arrays, which I do routinely.)

Tested with gcc/g++ 4.8.3, clang 3.4 and MSVC 2015 (i.e. something like #4284 shouldn't happen here). Also tested the macro definitions for the #else and __linux__ cases on Ubuntu. I don't know if there's a non-gcc, non-clang linux compiler we need to support; if not, then the __linux__ case could be removed.

If/when this goes in, I plan to try using the new macros to speed up the buffer.read/write* methods.

Benchmark Results

16 bit types

buf len New Native JS Old Native New:Old Native
4 5,353,146 9,898,898 5,141,171 1.04
64 5,426,600 6,910,656 4,822,427 1.13
128 5,315,212 5,492,487 3,941,947 1.35
256 5,520,805 3,756,889 3,841,683 1.44
512 5,017,195 2,386,928 3,006,634 1.67
768 4,766,404 1,721,131 2,483,660 1.92
1024 4,603,591 1,352,982 2,103,924 2.19
1536 4,061,022 955,204 1,648,238 2.46
2056 3,968,697 728,929 1,331,039 2.98
4096 3,338,060 378,810 765,637 4.36
8192 2,476,686 195,892 413,136 5.99

32 bit types

buf len New Native JS Old Native New:Old Native
4 5,464,417 9,933,574 5,348,619 1.02
64 5,427,788 7,693,327 4,972,264 1.09
128 5,408,238 5,794,946 4,546,107 1.19
256 5,188,293 4,276,777 3,777,760 1.37
512 5,022,584 2,792,160 3,068,184 1.64
768 4,872,416 2,096,890 2,518,356 1.93
1024 4,694,551 1,668,602 2,125,038 2.21
1536 4,262,433 1,177,335 1,647,717 2.59
2056 4,116,525 914,801 1,338,082 3.08
4096 3,283,537 483,214 770,439 4.26
8192 2,533,520 248,052 412,817 6.14

64 bit types

buf len New Native JS
8 5,640,873 10,097,787
64 5,486,161 7,891,069
128 5,403,555 6,619,267
256 5,296,904 4,673,973
512 5,124,202 3,098,105
768 4,522,887 2,334,265
1024 4,605,491 1,867,182
1536 4,290,413 1,356,988
2056 3,972,674 1,054,184
4096 3,370,964 562,251
8192 2,478,062 288,762

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. labels Jun 5, 2016
method: ['swap16', 'swap32', 'htons', 'htonl'],
len: [4, 64, 512, 768, 1024, 1536, 2056, 4096, 8192],
n: [1e6]
method: ['swap16', 'swap32', 'swap64'/*, 'htons', 'htonl', 'htonll'*/],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is commenting these out intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, see comment a few lines down. The commented-out ones are only for picking the crossover point between the JS and C++ implementations, and make the benchmarks super slow.

@mscdex
Copy link
Contributor

mscdex commented Jun 5, 2016

/cc @nodejs/buffer

((x & 0x00000000FF000000ull) << 8) | \
((x & 0x0000000000FF0000ull) << 24) | \
((x & 0x000000000000FF00ull) << 40) | \
((x & 0x00000000000000FFull) << 56)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you wrap the xes in the macros in parentheses, like (x)? I don’t think it makes a difference right now, it’s just a good practice to do that.

@ChALkeR ChALkeR added the performance Issues and PRs related to the performance of Node.js. label Jun 5, 2016
```

### buf.swap64()
<!-- YAML
added: v6.x.x
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using added: REPLACEME will lead the person doing the release that contains this feature to fill in the correct version.

@zbjornson
Copy link
Contributor Author

Made all changes suggested so far, thanks. (@trevnorris I also added CHECK_EQ(ts_obj_length % 2) for the Swap16 method -- these should be redundant with what's in lib/buffer.js but no harm having both.)

});

// The htons and htonl methods below are used to benchmark the
// performance difference between doing the byteswap in pure
// javascript regardless of Buffer size as opposed to dropping
// down to the native layer for larger Buffer sizes.
// down to the native layer for larger Buffer sizes. Commented
// out by default because they are slow for big buffers. If
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the TODO would indicate. The JS methods are copy/pastes of what's in lib/buffer.js and are only for evaluating where to set the crossover point between JS and native implementations. LMK if you had something else in mind...

@jasnell
Copy link
Member

jasnell commented Jun 6, 2016

LGTM with a couple nits. Would like to make sure @trevnorris, @addaleax and @bnoordhuis are happy with it also

(((x) & 0xFF00) << 8) | \
(((x) >> 8) & 0xFF00) | \
(((x) >> 24) & 0xFF)
#define BSWAP_INTRINSIC_8(x) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: check alignment of \ at the end.

@zbjornson zbjornson force-pushed the buffer-swap-perf branch 2 times, most recently from ad6f206 to adb07aa Compare June 15, 2016 03:57
@zbjornson
Copy link
Contributor Author

(Safe impls and tests for unaligned buffers added. Pending new benchmarks for aligned v. unaligned, and benchmarking std::swap vs the macro that was there previously.)

swap(this, i, i + 1);
return this;
}
return swap16n.apply(this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed this on the original swap pr. no need to use .apply(). just pass this as an argument and access it via args[0] in c++. rest of the code does this.

@trevnorris
Copy link
Contributor

Mind applying this patch to the PR?

diff --git a/benchmark/buffers/buffer-swap.js b/benchmark/buffers/buffer-swap.js
index 0be334d..dfc2158 100644
--- a/benchmark/buffers/buffer-swap.js
+++ b/benchmark/buffers/buffer-swap.js
@@ -1,8 +1,10 @@
 'use strict';

 const common = require('../common.js');
+const v8 = require('v8');

 const bench = common.createBenchmark(main, {
+  aligned: ['true'],
   method: ['swap16', 'swap32', 'swap64'/*, 'htons', 'htonl', 'htonll'*/],
   len: [8, 64, 128, 256, 512, 768, 1024, 1536, 2056, 4096, 8192],
   n: [5e7]
@@ -54,24 +56,35 @@ Buffer.prototype.htonll = function htonl() {
   return this;
 };

-function createBuffer(len) {
+function createBuffer(len, aligned) {
+  len += aligned ? 0 : 1;
   const buf = Buffer.allocUnsafe(len);
   for (var i = 1; i <= len; i++)
     buf[i - 1] = i;
-  return buf;
+  return aligned ? buf : buf.slice(1);
 }

-function bufferSwap(n, buf, method) {
-  for (var i = 1; i <= n; i++)
-    buf[method]();
+function genMethod(method) {
+  const fnString =
+      'return function ' + method + '(n, buf) {' +
+      '  for (var i = 0; i <= n; i++)' +
+      '    buf.' + method + '();' +
+      '}';
+  return (new Function(fnString))();
 }

 function main(conf) {
   const method = conf.method;
   const len = conf.len | 0;
   const n = conf.n | 0;
-  const buf = createBuffer(len);
+  const aligned = conf.aligned | 'true';
+  const buf = createBuffer(len, aligned === 'true');
+  const bufferSwap = genMethod(method);
+
+  v8.setFlagsFromString('--allow_natives_syntax');
+  eval('%OptimizeFunctionOnNextCall(bufferSwap)');
+
   bench.start();
-  bufferSwap(n, buf, method);
+  bufferSwap(n, buf);
   bench.end(n);
 }

First thing is the optimization of the actual function call. Here's the before and after:

buffers/buffer-swap.js method="swap16" len="8" n="50000000":  6450503.12454
buffers/buffer-swap.js method="swap16" len="8" n="50000000": 86301409.68934

Second thing is adding "aligned" option. Here's the difference when set to true vs false:

buffers/buffer-swap.js aligned="true" method="swap32" len="4096" n="50000000": 1335738.09986
buffers/buffer-swap.js aligned="false" method="swap32" len="4096" n="50000000": 477541.25475

@zbjornson
Copy link
Contributor Author

Done, thanks.

Somehow I'm no longer seeing the performance gain for aligned buffers... 6.2.0 and this are comparable. 😕 Need to dig to see what's going on.

@trevnorris
Copy link
Contributor

I get the following lint errors:

node/benchmark/buffers/buffer-swap.js
  29:19  error  'n' is defined but never used  no-unused-vars
  38:19  error  'n' is defined but never used  no-unused-vars
  48:19  error  'n' is defined but never used  no-unused-vars

✖ 3 problems (3 errors, 0 warnings)

[00:03|% 58|+   394|-     1]: test     
node/test/parallel/test-buffer-swap.js
  11:1  error  Line 11 exceeds the maximum line length of 80  max-len
  12:1  error  Line 12 exceeds the maximum line length of 80  max-len
  17:1  error  Line 17 exceeds the maximum line length of 80  max-len
  18:1  error  Line 18 exceeds the maximum line length of 80  max-len
  23:1  error  Line 23 exceeds the maximum line length of 80  max-len
  24:1  error  Line 24 exceeds the maximum line length of 80  max-len
  35:1  error  Line 35 exceeds the maximum line length of 80  max-len
  36:1  error  Line 36 exceeds the maximum line length of 80  max-len
  37:1  error  Line 37 exceeds the maximum line length of 80  max-len

✖ 9 problems (9 errors, 0 warnings)

Looking at the performance part now.

@zbjornson
Copy link
Contributor Author

<algorithm> vs <utility>?

@trevnorris
Copy link
Contributor

@zbjornson std::swap() used to be in <algorithm> but is now in <utility> (http://en.cppreference.com/w/cpp/algorithm/swap). though our cpplint.py is ancient and doesn't know this, so gives a warning.

trevnorris pushed a commit that referenced this pull request Jun 27, 2016
* Speed up buffer.swap16 and swap32 by using builtins. Up to ~6x gain.
  Drop transition point between JS and C++ implementations accordingly.
  Amount of performance improvement not only depends on buffer size but
  also memory alignment.
* Fix tests: C++ impl tests were testing 0-filled buffers so were
  always passing.
* Add similar buffer.swap64 method.
* Make buffer-swap benchmark mirror JS impl.

doc/api/buffer.markdown has an entry of "added: REPLACEME" that should
be changed to the correct release number before tagged.

Because node is currently using a very old version of cpplint.py it
doesn't know that std::swap() has moved from <algorithm> to <utility> in
c++11. So until cpplint.py is updated simply NOLINT the line.
Technically it should be NOLINT(build/include_what_you_use), but that
puts the line over 80 characters causing another lint error.

PR-URL: #7157
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@trevnorris
Copy link
Contributor

Thanks much for the hard work. Landed in a1059af with minor adjustments and lint fixes.

@trevnorris trevnorris closed this Jun 27, 2016
@addaleax
Copy link
Member

This is semver-minor, right?

@addaleax addaleax added the semver-minor PRs that contain new features and should be released in the next minor version. label Jun 27, 2016
@jasnell
Copy link
Member

jasnell commented Jun 27, 2016

Yep. Semver minor
On Jun 27, 2016 1:42 PM, "Anna Henningsen" notifications@github.com wrote:

This is semver-minor, right?


You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub
#7157 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAa2eYpl6HcI2gwk12FhnkvsiKehJ4Oyks5qQDWigaJpZM4IuQW6
.

@Fishrock123
Copy link
Contributor

Fishrock123 commented Jul 5, 2016

Does not cleanly apply on v6.

I think this depends on #7082? (or at least I think the conflict comes from there)

@addaleax
Copy link
Member

addaleax commented Jul 5, 2016

I think this depends on #7082? (or at least I think the conflict comes from there)

The conflict comes from there, although it’s only because of neighbouring lines being changed. There should be no logical dependency here; if it helps, I can do a quick backport PR.

@Fishrock123
Copy link
Contributor

Fishrock123 commented Jul 5, 2016

@addaleax If you can post what the patch should look like on v6 I can probably do the resolution.

@Fishrock123
Copy link
Contributor

Or a PR may be easier, idk. Either or.

addaleax pushed a commit to addaleax/node that referenced this pull request Jul 5, 2016
* Speed up buffer.swap16 and swap32 by using builtins. Up to ~6x gain.
  Drop transition point between JS and C++ implementations accordingly.
  Amount of performance improvement not only depends on buffer size but
  also memory alignment.
* Fix tests: C++ impl tests were testing 0-filled buffers so were
  always passing.
* Add similar buffer.swap64 method.
* Make buffer-swap benchmark mirror JS impl.

doc/api/buffer.markdown has an entry of "added: REPLACEME" that should
be changed to the correct release number before tagged.

Because node is currently using a very old version of cpplint.py it
doesn't know that std::swap() has moved from <algorithm> to <utility> in
c++11. So until cpplint.py is updated simply NOLINT the line.
Technically it should be NOLINT(build/include_what_you_use), but that
puts the line over 80 characters causing another lint error.

PR-URL: nodejs#7157
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Fishrock123 pushed a commit that referenced this pull request Jul 5, 2016
* Speed up buffer.swap16 and swap32 by using builtins. Up to ~6x gain.
  Drop transition point between JS and C++ implementations accordingly.
  Amount of performance improvement not only depends on buffer size but
  also memory alignment.
* Fix tests: C++ impl tests were testing 0-filled buffers so were
  always passing.
* Add similar buffer.swap64 method.
* Make buffer-swap benchmark mirror JS impl.

doc/api/buffer.markdown has an entry of "added: REPLACEME" that should
be changed to the correct release number before tagged.

Because node is currently using a very old version of cpplint.py it
doesn't know that std::swap() has moved from <algorithm> to <utility> in
c++11. So until cpplint.py is updated simply NOLINT the line.
Technically it should be NOLINT(build/include_what_you_use), but that
puts the line over 80 characters causing another lint error.

PR-URL: #7157
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>

Backport-URL: #7546
@Fishrock123 Fishrock123 mentioned this pull request Jul 5, 2016
Fishrock123 added a commit that referenced this pull request Jul 6, 2016
Notable changes:

* buffer: Added `buffer.swap64()` to compliment `swap16()` &
`swap32()`. (Zach Bjornson) #7157
* build: New `configure` options have been added for building Node.js
as a shared library. (Stefan Budeanu)
#6994
  - The options are: `--shared`, `--without-v8-platform` &
`--without-bundled-v8`.
* crypto: Root certificates have been updated. (Ben Noordhuis)
#7363
* debugger: The server address is now configurable via
`--debug=<address>:<port>`. (Ben Noordhuis)
#3316
* npm: Upgraded npm to v3.10.3 (Kat Marchán)
#7515 & (Rebecca Turner)
#7410
* readline: Added the `prompt` option to the readline constructor.
(Evan Lucas) #7125
* repl / vm: `sigint`/`ctrl+c` will now break out of infinite loops
without stopping the Node.js instance. (Anna Henningsen)
#6635
* src:
  - Added a `node::FreeEnvironment` public C++ API. (Cheng Zhao)
#3098
  - Refactored `require('constants')`, constants are now available
directly from their respective modules. (James M Snell)
#6534
* stream: Improved `readable.read()` performance by up to 70%. (Brian
White) #7077
* timers: `setImmediate()` is now up to 150% faster in some situations.
(Andras) #6436
* util: Added a `breakLength` option to `util.inspect()` to control how
objects are formatted across lines. (cjihrig)
#7499
* v8-inspector: Experimental support has been added for debugging
Node.js over the inspector protocol. (Ali Ijaz Sheikh)
#6792
  - *Note: This feature is experimental, and it could be altered or
removed.*
  - You can try this feature by running Node.js with the `--inspect`
flag.

Refs: #7441
PR-URL: #7550
Fishrock123 added a commit that referenced this pull request Jul 6, 2016
Notable changes:

* buffer: Added `buffer.swap64()` to compliment `swap16()` &
`swap32()`. (Zach Bjornson) #7157
* build: New `configure` options have been added for building Node.js
as a shared library. (Stefan Budeanu)
#6994
  - The options are: `--shared`, `--without-v8-platform` &
`--without-bundled-v8`.
* crypto: Root certificates have been updated. (Ben Noordhuis)
#7363
* debugger: The server address is now configurable via
`--debug=<address>:<port>`. (Ben Noordhuis)
#3316
* npm: Upgraded npm to v3.10.3 (Kat Marchán)
#7515 & (Rebecca Turner)
#7410
* readline: Added the `prompt` option to the readline constructor.
(Evan Lucas) #7125
* repl / vm: `sigint`/`ctrl+c` will now break out of infinite loops
without stopping the Node.js instance. (Anna Henningsen)
#6635
* src:
  - Added a `node::FreeEnvironment` public C++ API. (Cheng Zhao)
#3098
  - Refactored `require('constants')`, constants are now available
directly from their respective modules. (James M Snell)
#6534
* stream: Improved `readable.read()` performance by up to 70%. (Brian
White) #7077
* timers: `setImmediate()` is now up to 150% faster in some situations.
(Andras) #6436
* util: Added a `breakLength` option to `util.inspect()` to control how
objects are formatted across lines. (cjihrig)
#7499
* v8-inspector: Experimental support has been added for debugging
Node.js over the inspector protocol. (Ali Ijaz Sheikh)
#6792
  - *Note: This feature is experimental, and it could be altered or
removed.*
  - You can try this feature by running Node.js with the `--inspect`
flag.

Refs: #7441
PR-URL: #7550
Fishrock123 added a commit that referenced this pull request Jul 6, 2016
Notable changes:

* buffer: Added `buffer.swap64()` to compliment `swap16()` &
`swap32()`. (Zach Bjornson) #7157
* build: New `configure` options have been added for building Node.js
as a shared library. (Stefan Budeanu)
#6994
  - The options are: `--shared`, `--without-v8-platform` &
`--without-bundled-v8`.
* crypto: Root certificates have been updated. (Ben Noordhuis)
#7363
* debugger: The server address is now configurable via
`--debug=<address>:<port>`. (Ben Noordhuis)
#3316
* npm: Upgraded npm to v3.10.3 (Kat Marchán)
#7515 & (Rebecca Turner)
#7410
* readline: Added the `prompt` option to the readline constructor.
(Evan Lucas) #7125
* repl / vm: `sigint`/`ctrl+c` will now break out of infinite loops
without stopping the Node.js instance. (Anna Henningsen)
#6635
* src:
  - Added a `node::FreeEnvironment` public C++ API. (Cheng Zhao)
#3098
  - Refactored `require('constants')`, constants are now available
directly from their respective modules. (James M Snell)
#6534
* stream: Improved `readable.read()` performance by up to 70%. (Brian
White) #7077
* timers: `setImmediate()` is now up to 150% faster in some situations.
(Andras) #6436
* util: Added a `breakLength` option to `util.inspect()` to control how
objects are formatted across lines. (cjihrig)
#7499
* v8-inspector: Experimental support has been added for debugging
Node.js over the inspector protocol. (Ali Ijaz Sheikh)
#6792
  - *Note: This feature is experimental, and it could be altered or
removed.*
  - You can try this feature by running Node.js with the `--inspect`
flag.

Refs: #7441
PR-URL: #7550
lukesampson pushed a commit to ScoopInstaller/Scoop that referenced this pull request Jul 7, 2016
### Notable changes

* **buffer**: Added `buffer.swap64()` to compliment `swap16()` & `swap32()`. (Zach Bjornson) [#7157](nodejs/node#7157)
* **build**: New `configure` options have been added for building Node.js as a shared library. (Stefan Budeanu) [#6994](nodejs/node#6994)
  - The options are: `--shared`, `--without-v8-platform` & `--without-bundled-v8`.
* **crypto**: Root certificates have been updated. (Ben Noordhuis) [#7363](nodejs/node#7363)
* **debugger**: The server address is now configurable via `--debug=<address>:<port>`. (Ben Noordhuis) [#3316](nodejs/node#3316)
* **npm**: Upgraded npm to v3.10.3 (Kat Marchán) [#7515](nodejs/node#7515) & (Rebecca Turner) [#7410](nodejs/node#7410)
* **readline**: Added the `prompt` option to the readline constructor. (Evan Lucas) [#7125](nodejs/node#7125)
* **repl / vm**: `sigint`/`ctrl+c` will now break out of infinite loops without stopping the Node.js instance. (Anna Henningsen) [#6635](nodejs/node#6635)
* **src**:
  - Added a `node::FreeEnvironment` public C++ API. (Cheng Zhao) [#3098](nodejs/node#3098)
  - Refactored `require('constants')`, constants are now available directly from their respective modules. (James M Snell) [#6534](nodejs/node#6534)
* **stream**: Improved `readable.read()` performance by up to 70%. (Brian White) [#7077](nodejs/node#7077)
* **timers**: `setImmediate()` is now up to 150% faster in some situations. (Andras) [#6436](nodejs/node#6436)
* **util**: Added a `breakLength` option to `util.inspect()` to control how objects are formatted across lines. (cjihrig) [#7499](nodejs/node#7499)
* **v8-inspector**: Experimental support has been added for debugging Node.js over the inspector protocol. (Ali Ijaz Sheikh) [#6792](nodejs/node#6792)
  - **Note: This feature is _experimental_, and it could be altered or removed.**
  - You can try this feature by running Node.js with the `--inspect` flag.
bnoordhuis added a commit to bnoordhuis/io.js that referenced this pull request Jul 11, 2016
Said builtins are not supported by older versions of apple-gcc, breaking
the build on OS X 10.8.

Fixes: nodejs#7618
Refs: nodejs#4290
Refs: nodejs#7157
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. performance Issues and PRs related to the performance of Node.js. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants