From 0939edd4ed447ce55232df6d25dad60caa7d73c4 Mon Sep 17 00:00:00 2001
From: James M Snell
Date: Tue, 11 Oct 2016 14:12:31 -0700
Subject: [PATCH] buffer: add buffer.transcode

Add buffer.transcode(source, from, to) method. Primarily uses ICU
to transcode a buffer's content from one of Node.js' supported
encodings to another.

Originally part of a proposal to add a new unicode module. Decided
to refactor the approach towards individual PRs without a new module.

Refs: https://github.com/nodejs/node/pull/8075
PR-URL: https://github.com/nodejs/node/pull/9038
Reviewed-By: Anna Henningsen
---
 doc/api/buffer.md                   |  27 +++
 lib/buffer.js                       |   4 +
 lib/internal/buffer.js              |  30 +++
 node.gyp                            |   1 +
 src/node_buffer.cc                  |  55 ++----
 src/node_i18n.cc                    | 280 ++++++++++++++++++++++++++++
 src/util.h                          |  27 +++
 test/parallel/test-icu-transcode.js |  48 +++++
 tools/icu/icu-generic.gyp           |   4 +-
 9 files changed, 437 insertions(+), 39 deletions(-)
 create mode 100644 lib/internal/buffer.js
 create mode 100644 test/parallel/test-icu-transcode.js

diff --git a/doc/api/buffer.md b/doc/api/buffer.md
index 6d06ae9ddd21c9..3877cc55699077 100644
--- a/doc/api/buffer.md
+++ b/doc/api/buffer.md
@@ -2302,6 +2302,33 @@ added: v3.0.0
 On 32-bit architectures, this value is `(2^30)-1` (~1GB).
 On 64-bit architectures, this value is `(2^31)-1` (~2GB).
 
+## buffer.transcode(source, fromEnc, toEnc)
+
+
+* `source` {Buffer} A `Buffer` instance
+* `fromEnc` {String} The current encoding
+* `toEnc` {String} The target encoding
+
+Re-encodes the given `Buffer` instance from one character encoding to another.
+Returns a new `Buffer` instance.
+
+Throws if the `fromEnc` or `toEnc` specify invalid character encodings or if
+conversion from `fromEnc` to `toEnc` is not permitted.
+
+The transcoding process will use substitution characters if a given byte
+sequence cannot be adequately represented in the target encoding. For instance:
+
+```js
+const newBuf = buffer.transcode(Buffer.from('€'), 'utf8', 'ascii');
+console.log(newBuf.toString('ascii'));
+  // prints '?'
+```
+
+Because the Euro (`€`) sign is not representable in US-ASCII, it is replaced
+with `?` in the transcoded `Buffer`.
+
 ## Class: SlowBuffer