var bintag = require('bintag');
A Node.js module for creating buffers using tagged template strings.
This module uses the tagged template string syntax to fill buffers with structured binary data. This is primarily useful in unit tests for code that deals with binary data formats or network protocols.
The simplest example:
bintag`i4: 1 2 -10 0xaabbccdd`
// = <Buffer 01 00 00 00 02 00 00 00 f6 ff ff ff dd cc bb aa>
The expression evaluates to a new Buffer instance.
The i4: is a format specifier which means that the numbers following it should be formatted as 4-byte integers.
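For comparison, the same 16 bytes can be assembled by hand with Node's built-in Buffer API. This is only a sketch of what the one-line template stands for, assuming little-endian byte order; the names values and manual are illustrative and not part of bintag.

```javascript
// Hand-written equivalent of the i4: example, assuming little-endian byte order.
const values = [1, 2, -10, 0xaabbccdd];
const manual = Buffer.alloc(values.length * 4); // zero-filled, 4 bytes per value

values.forEach((value, index) => {
  // `| 0` reinterprets the value as a signed 32-bit integer, so the unsigned
  // literal 0xaabbccdd is accepted by writeInt32LE with the same bit pattern.
  manual.writeInt32LE(value | 0, index * 4);
});

console.log(manual);
// <Buffer 01 00 00 00 02 00 00 00 f6 ff ff ff dd cc bb aa>
```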
Note that template strings can be multi-line:
bintag`i1:
0x12 0x34
0x56 0x78`
// = <Buffer 12 34 56 78>
- Substitutions
- Format specifiers
- Endianness
- Shortcut tags
- Groups
- Repeat count
- Alignment
- Padding
- Offset expressions
- Length expressions
- Base for offset calculations
- Compiled templates
The template string syntax allows substitutions: ${expression}. With bintag, you can use substitutions as values to be formatted as well as in other syntactic constructs.
A substitution expression can evaluate to a single value:
let n = 10;
bintag`i4: ${n}`
// = <Buffer 0a 00 00 00>
It can evaluate to an array of values:
let array = [1, 2, 3];
bintag`i1: ${array}`
// = <Buffer 01 02 03>
Nested arrays will be flattened:
let array = [1, [2], 3, [[4]], [5, 6], 7];
bintag`i1: ${array}`
// = <Buffer 01 02 03 04 05 06 07>
Anything iterable, such as a Set or a generator, will be treated the same way as an array:
function* gen() {
  yield 1;
  yield 2;
  yield [3, 4];
}
bintag`i1: ${gen()}`
// = <Buffer 01 02 03 04>
Unless the iterable is a generator, it must produce stable results on two subsequent traversals, or the result will be undefined.
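The flattening rules described above can be sketched as a small recursive generator. The helper flattenValues is hypothetical and is not bintag's implementation; it assumes that strings and buffers count as single values rather than as containers.

```javascript
// Hypothetical helper (not part of bintag): recursively flattens nested
// iterables into a flat sequence of values.
function* flattenValues(value) {
  const isContainer =
    value != null &&
    typeof value !== 'string' &&   // assumption: strings are values
    !Buffer.isBuffer(value) &&     // assumption: buffers are values
    typeof value[Symbol.iterator] === 'function';
  if (isContainer) {
    for (const item of value) {
      yield* flattenValues(item);
    }
  } else {
    yield value;
  }
}

function* gen() {
  yield 1;
  yield 2;
  yield [3, 4];
}

const fromArray = [...flattenValues([1, [2], 3, [[4]], [5, 6], 7])];
const fromSet = [...flattenValues(new Set([1, [2, 3]]))];
const fromGenerator = [...flattenValues(gen())];

console.log(fromArray);     // [ 1, 2, 3, 4, 5, 6, 7 ]
console.log(fromSet);       // [ 1, 2, 3 ]
console.log(fromGenerator); // [ 1, 2, 3, 4 ]
```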
A substitution expression can also evaluate to a Buffer:
let buf = bintag`i1: 0xaa 0xbb`;
bintag`${buf} i2: 2 ${buf}`
// = <Buffer aa bb 02 00 aa bb>
Arrays and nested arrays of buffers are also supported. You can even mix buffers with immediate values in the same array. An empty array produces no buffer content.
Finally, a substitution expression can evaluate to a compiled bintag template (see below).
A substitution is generally allowed wherever the syntax expects an integer, such as in a format specifier:
let n=2;
bintag`i${n}: 1`
// = <Buffer 01 00>
When used in this way, the substitution expression must evaluate to an integer, or to something that can be converted to an integer. (1.6 will happily become 1, but {} becomes NaN and will be rejected.)
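One way to picture this conversion is the sketch below. The helper toInteger is hypothetical, and truncation toward zero is an assumption; bintag's actual rules may differ in detail.

```javascript
// Hypothetical conversion helper; bintag's own rules may differ in detail.
function toInteger(value) {
  const n = Math.trunc(Number(value)); // assumption: fractions are truncated
  if (!Number.isFinite(n)) {
    throw new TypeError('not convertible to an integer: ' + String(value));
  }
  return n;
}

console.log(toInteger(1.6)); // 1

try {
  toInteger({}); // Number({}) is NaN, so this is rejected
} catch (err) {
  console.log(err.name); // TypeError
}
```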
A substitution cannot be used in place of parts of the syntax that are not numbers, such as the letter of a format specifier.
All format specifiers end with a colon. Whitespace after the colon is optional.
A format specifier itself does not produce any buffer content, but it specifies the format in which the values following it are to be formatted. The format remains active until another format specifier is encountered:
bintag`i1: 1 2 i4: 7 i1: 8`
// = <Buffer 01 02 07 00 00 00 08>
Note: format specifiers are case-sensitive.
The integer format specifier is the letter i followed by a number between 1 and 6. Each formatted integer will take the specified number of bytes in the buffer.
This format specifier accepts decimal integers with an optional sign, as well as unsigned hexadecimal numbers:
bintag`i1: 1 +1 -1 0x80 128 -128`
// = <Buffer 01 01 ff 80 80 80>
Both positive and negative values can be used to represent certain bit patterns, such as 80 in the example above, which is produced by 0x80, 128, and -128 alike.
A value that is out of range for both signed and unsigned integers of the specified width will trigger an exception.
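The accepted range can be sketched with Node's variable-width Buffer writers. The helper encodeInt is hypothetical and assumes little-endian output; it accepts anything from the lowest signed value to the highest unsigned value of the given width and rejects everything else.

```javascript
// Hypothetical helper (not part of bintag): encodes one integer in `width`
// bytes (1 to 6), little-endian, accepting signed and unsigned ranges alike.
function encodeInt(value, width) {
  const min = -(2 ** (8 * width - 1)); // lowest signed value
  const max = 2 ** (8 * width) - 1;    // highest unsigned value
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new RangeError(`${value} does not fit in ${width} byte(s)`);
  }
  const buf = Buffer.alloc(width);
  if (value < 0) {
    buf.writeIntLE(value, 0, width);
  } else {
    buf.writeUIntLE(value, 0, width);
  }
  return buf;
}

console.log(encodeInt(128, 1));  // <Buffer 80>
console.log(encodeInt(-128, 1)); // <Buffer 80>
console.log(encodeInt(-1, 1));   // <Buffer ff>
```

Calling encodeInt(256, 1) or encodeInt(-129, 1) throws a RangeError, because neither value fits in one byte under either interpretation.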
For i2 and upwards, endianness matters. See below.
The hexadecimal format specifier is just x. It expects an even number of hexadecimal digits:
bintag`x: 12 34 abCDef`
// = <Buffer 12 34 ab cd ef>
Whitespace between pairs is optional. The only case when this whitespace matters is when a repeat count (see below) is used: it will apply to a whole “word” of hexadecimal data.
The x format specifier handles substitution expressions the same way i1 does.
This format is not affected by endianness.
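For reference, Node's Buffer API can decode the same hexadecimal text once the whitespace is removed. This is a plain-Node sketch for comparison, not how bintag parses its templates; the names text and bytes are illustrative.

```javascript
// Whitespace between digit pairs carries no data, so it is stripped first.
// Hex digits are case-insensitive.
const text = '12 34 abCDef';
const bytes = Buffer.from(text.replace(/\s+/g, ''), 'hex');

console.log(bytes); // <Buffer 12 34 ab cd ef>
```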
The floating-point format specifiers are f for “float” (32-bit) and d for “double” (64-bit) types.
bintag`f: -1.1 d: .5e-10`
// = <Buffer cd cc 8c bf bb bd d7 d9 df 7c cb 3d>
The standard JS syntax for floating-point literals is supported, including Infinity with an optional sign, and NaN. Note that -0 produces binary data distinct from 0.
The format is affected by endianness.
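The effect of endianness and the difference between 0 and -0 can be checked with Node's own float writers. This sketch uses only the built-in Buffer API; the variable names are illustrative.

```javascript
// 32-bit float, little-endian: matches the first four bytes of the example.
const single = Buffer.alloc(4);
single.writeFloatLE(-1.1, 0);
console.log(single); // <Buffer cd cc 8c bf>

// The same value in big-endian order is simply byte-reversed.
const bigEndian = Buffer.alloc(4);
bigEndian.writeFloatBE(-1.1, 0);
console.log(bigEndian); // <Buffer bf 8c cc cd>

// 0 and -0 differ only in the sign bit.
const positiveZero = Buffer.alloc(4);
const negativeZero = Buffer.alloc(4);
positiveZero.writeFloatLE(0, 0);
negativeZero.writeFloatLE(-0, 0);
console.log(positiveZero); // <Buffer 00 00 00 00>
console.log(negativeZero); // <Buffer 00 00 00 80>
```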
The three string formats are a for ASCII, u for UTF-8, and U for UTF-16.
When a string is forced to ASCII, the lower byte of each character's Unicode value will be used.
With string formats, substitution expressions must be used. There is no syntax for string values in the template string.
When a bare string format specifier is used, the result will take exactly the number of bytes that are necessary to represent all the characters of a string:
bintag`a: ${'abc'}`
// = <Buffer 61 62 63>
If the format letter is followed by an integer constant (or a substitution expression evaluating to an integer), the string will take exactly the specified number of bytes, and will be truncated or zero-padded as necessary.
bintag`a4: ${['ab', 'xyzzy']}`
// = <Buffer 61 62 00 00 78 79 7a 7a>
When a Unicode string is truncated, an incomplete character at the end is never encoded. If necessary, the string will be zero-padded:
bintag`u8: ${'\u1000\u1000\u1000'}`
// = <Buffer e1 80 80 e1 80 80 00 00>
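The truncation rule can be sketched as follows: characters are copied whole, and copying stops as soon as the next character would not fit. The helper utf8Fixed is hypothetical and only illustrates the documented behaviour.

```javascript
// Hypothetical helper (not part of bintag): encodes `str` as UTF-8 into
// exactly `size` bytes without ever splitting a character.
function utf8Fixed(str, size) {
  const out = Buffer.alloc(size);      // zero-filled, so padding is implicit
  let pos = 0;
  for (const ch of str) {              // iterates by code point
    const encoded = Buffer.from(ch, 'utf8');
    if (pos + encoded.length > size) {
      break;                           // the next character would not fit
    }
    encoded.copy(out, pos);
    pos += encoded.length;
  }
  return out;
}

console.log(utf8Fixed('\u1000\u1000\u1000', 8));
// <Buffer e1 80 80 e1 80 80 00 00>
```

Each U+1000 takes three bytes, so only two characters fit into eight bytes and the remaining two bytes stay zero.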
The z modifier adds a terminating zero byte (two bytes in case of UTF-16).
bintag`az: ${'abc'}`
// = <Buffer 61 62 63 00>
If z is combined with a fixed length, the length includes the terminator, and the string is guaranteed to be zero-terminated. This means that it might be truncated earlier than without z to accommodate the terminator.
The p modifier, followed by an integer between 1 and 4 (or a substitution expression evaluating to such an integer), makes the string a “Pascal string”: length followed by string data. The number specifies the width of the length field.
bintag`ap2: ${'abc'}`
// = <Buffer 03 00 61 62 63>
For UTF-8, the number of bytes is stored in the length field rather than the number of Unicode characters. For UTF-16, the number of two-byte code units is stored, which can be different from the number of Unicode characters when surrogate pairs are present.
The length field respects endianness. Strings longer than the maximum length that can be represented in a length field of the chosen size (such as 255 characters for a 1-byte length) will be truncated.
If the p modifier is combined with a fixed length, the latter includes the size of the length field. Therefore, the fixed length must be greater than the size of the length field.
bintag`a8p1: ${['abc', '0123456789']}`
// = <Buffer 03 61 62 63 00 00 00 00 07 30 31 32 33 34 35 36>
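The layout of a fixed-length Pascal string can be sketched with plain Buffer calls. The helper pascalFixed is hypothetical; it assumes a little-endian length field and uses Node's latin1 encoding as a stand-in for the ASCII format.

```javascript
// Hypothetical helper (not part of bintag): a fixed-size Pascal string with
// a little-endian length field of `lengthWidth` bytes.
function pascalFixed(str, totalSize, lengthWidth) {
  const out = Buffer.alloc(totalSize);            // zero-filled
  const data = Buffer.from(str, 'latin1');        // one byte per character
  const maxByField = 2 ** (8 * lengthWidth) - 1;  // largest storable length
  const room = totalSize - lengthWidth;           // bytes left for the data
  const length = Math.min(data.length, room, maxByField);
  out.writeUIntLE(length, 0, lengthWidth);        // length field first
  data.copy(out, lengthWidth, 0, length);         // then the string data
  return out;
}

console.log(pascalFixed('abc', 8, 1));
// <Buffer 03 61 62 63 00 00 00 00>
console.log(pascalFixed('0123456789', 8, 1));
// <Buffer 07 30 31 32 33 34 35 36>
```

The stored length reflects the truncated data, which is why the second string records 7 rather than 10.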
The UTF-16 encoding is affected by endianness, but ASCII and UTF-8 are not.
By default, data is formatted according to the endianness of the host platform. This can be overridden by the endianness specifiers: LE: for little-endian and BE: for big-endian. An endianness specifier remains in effect until overridden by another such specifier.
bintag`i2: LE: 0xabcd BE: 0x1122 0xabcd LE: 0x1122`
// = <Buffer cd ab 11 22 ab cd 22 11>
Note: endianness specifiers are case-sensitive.
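The byte orders in the example above correspond to Node's explicit LE and BE writers, and the host default can be inspected with os.endianness(). This is a plain-Node sketch for comparison; the name mixed is illustrative.

```javascript
const os = require('os');

// 'LE' or 'BE', depending on the host platform.
console.log(os.endianness());

// Hand-written equivalent of the LE:/BE: example.
const mixed = Buffer.alloc(8);
mixed.writeUInt16LE(0xabcd, 0); // LE: 0xabcd
mixed.writeUInt16BE(0x1122, 2); // BE: 0x1122
mixed.writeUInt16BE(0xabcd, 4); //     0xabcd (BE still in effect)
mixed.writeUInt16LE(0x1122, 6); // LE: 0x1122

console.log(mixed); // <Buffer cd ab 11 22 ab cd 22 11>
```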
You can call bintag.tag to create a shortcut to a particular set of options. The shortcut can be used in tagged template expressions in place of bintag:
let short = bintag.tag('i2:');
short`1 2`
// = <Buffer 01 00 02 00>
let utf16le = bintag.tag('LE:U:');
utf16le`${'abc'}`
// = <Buffer 61 00 62 00 63 00>
The use of a shortcut is equivalent to specifying the options at the start of the template.
The following convenience shortcuts are already defined in the bintag module: bintag.LE for LE:, bintag.BE for BE:, and bintag.hex for x:.
bintag.BE`i2: 1`
// = <Buffer 00 01>
Here is another convenient way to use the predefined shortcuts:
let hex = require('bintag').hex;
hex`aa bb`
// = <Buffer aa bb>
A parenthesized group allows you to override format and endianness, and the original settings will be restored after the group ends:
bintag`i1: 1 2 (i2: 3 4) 5 6`
// = <Buffer 01 02 03 00 04 00 05 06>
Groups can be nested. At the start of a group, settings are inherited from the surrounding context.
A nonnegative integer followed by an asterisk specifies a repeat count for the immediately following value or parenthesized group:
bintag`x: 2*(4*aa 2*1234)`
// = <Buffer aa aa aa aa 12 34 12 34 aa aa aa aa 12 34 12 34>
The repeat count can be given by a substitution expression:
let n = 6, x = 8;
bintag`i1: ${n}*${x}`
// = <Buffer 08 08 08 08 08 08>
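Both repeat examples above have straightforward hand-written counterparts in the Buffer API, shown here for comparison only. The names inner, repeated, and filled are illustrative.

```javascript
// Counterpart of x: 2*(4*aa 2*1234)
const inner = Buffer.concat([
  Buffer.alloc(4, 0xaa),          // 4*aa
  Buffer.from('12341234', 'hex'), // 2*1234
]);
const repeated = Buffer.concat([inner, inner]); // 2*( ... )
console.log(repeated);
// <Buffer aa aa aa aa 12 34 12 34 aa aa aa aa 12 34 12 34>

// Counterpart of i1: 6*8
const filled = Buffer.alloc(6, 8);
console.log(filled); // <Buffer 08 08 08 08 08 08>
```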
Use ! followed by a positive integer to pad the data with zero bytes until the offset is a multiple of that integer:
bintag`x: aa bb cc !4 dd !2 ee !16`
// = <Buffer aa bb cc 00 dd 00 ee 00 00 00 00 00 00 00 00 00>
A substitution expression can be used instead of an integer constant to specify the alignment.
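The number of zero bytes inserted by an alignment follows from simple modular arithmetic. The helper paddingFor is hypothetical and only reproduces the arithmetic behind the example above.

```javascript
// Hypothetical helper: number of zero bytes needed to advance `offset`
// to the next multiple of `alignment` (0 if it is already aligned).
function paddingFor(offset, alignment) {
  return (alignment - (offset % alignment)) % alignment;
}

console.log(paddingFor(3, 4));  // 1  (aa bb cc, then !4)
console.log(paddingFor(5, 2));  // 1  (dd at offset 4, then !2)
console.log(paddingFor(7, 16)); // 9  (ee at offset 6, then !16)
console.log(paddingFor(8, 4));  // 0  (already aligned)
```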
Without a number, ! aligns to the width determined by the current format. For integer and floating-point formats, the format's size is used. For UTF-16 strings, the alignment is 2 bytes. For all other formats, bare ! has no effect because the alignment is 1 byte.
bintag`i4: 1 ! 2 (x: aa bb) ! 3 4`
// = <Buffer 01 00 00 00 02 00 00 00 aa bb 00 00 03 00 00 00 04 00 00 00>
Note: see below for information about what offsets are relative to.
The = character followed by a nonnegative integer pads the data with zero bytes until the offset from the beginning of the buffer becomes equal to the number.
bintag`x: aa bb =4 cc dd`
// = <Buffer aa bb 00 00 cc dd>
An attempt to rewind before the current position will trigger an exception.
A substitution expression can be used instead of an integer constant to specify the offset.
Note: see below for information about what offsets are relative to.
The @ character followed by a nonnegative integer computes the offset at which a parenthesized group in the current template ends up in the output buffer. @1 refers to the group whose opening parenthesis is the leftmost, @2 to the second leftmost one, and so on. @0 refers to the whole template (and therefore evaluates to 0). Such references can occur before, within, and after the groups they refer to.
bintag`x: cc (i2: 1 @2) (i2: 2 @1)`
// = <Buffer cc 01 00 05 00 02 00 01 00>
The offset will be encoded according to the current format specifier.
A substitution expression cannot be used in place of the integer after the @ character.
Note: see below for information about what offsets are relative to.
The # character followed by a nonnegative integer computes the length that a parenthesized group in the current template occupies in the output buffer. #0 refers to the whole template.
bintag`i2: (1 #1) #0`
// = <Buffer 01 00 04 00 06 00>
If the group referred to has a repeat count, the size of the content is taken before the repeat count is applied.
The length will be encoded according to the current format specifier.
When # is instead followed by a substitution expression, the size of the data is computed; however, the data itself is not placed into the output buffer. This is useful to compute the size of a buffer or a list of buffers:
let buf = bintag`x: aabb ccdd`;
bintag`i2: #${buf}`
// = <Buffer 04 00>
Finally, # can be followed by a parenthesized group. In this case, the size that group would take in the output buffer is computed; however, the group itself does not produce output.
bintag`i2: #(az: ${'abc'})`
// = <Buffer 04 00>
Normally, offsets used for alignment, padding, and offset expressions are relative to the start of the output buffer. However, within a parenthesized group with a repeat count, even if the repeat count is 1, offsets are instead relative to the start of the (innermost such) group.
bintag`x: 00 2*(aa =4 bb !2 cc)`
// = <Buffer 00 aa 00 00 00 bb 00 cc aa 00 00 00 bb 00 cc>
The same applies within parenthesized groups preceded by #.
bintag`i1: 1 #(x: ddee i4: !)`
// = <Buffer 01 04>
You can get a “compiled template” object by using bintag.compile in a tagged template expression:
let t = bintag.compile`x: aa bb`;
This also works for shortcuts:
let short = bintag.tag('i2:');
let t = short.compile`1 2`;
A “compiled template” has the following API:
- create() method: creates and returns a new Buffer with the binary data described by the template. This can be called repeatedly to obtain new buffers without re-parsing the template.
- length property (read-only): the length of the data described by the template. Can be used to do some sort of allocation or negotiation in advance.
- write(buf, [offset]) method: writes the data into an existing buffer at the specified offset (defaults to offset 0). The buffer must be large enough to contain the data. A non-zero offset specified here does not affect the offset calculations within the template (that is, the start of the template is still considered to have offset 0). The method returns the number of bytes written.
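The contract of these three members can be illustrated with a small stand-in object built on plain buffers. The function makeTemplate is hypothetical and is not how bintag compiles templates; it only mimics the documented behaviour of create(), length, and write().

```javascript
// Hypothetical stand-in that mimics the documented contract; not bintag's code.
function makeTemplate(bytes) {
  const data = Buffer.from(bytes); // private copy of the template's bytes
  return {
    get length() {
      return data.length;
    },
    create() {
      return Buffer.from(data);    // a new Buffer on every call
    },
    write(buf, offset = 0) {
      if (offset + data.length > buf.length) {
        throw new RangeError('buffer too small');
      }
      data.copy(buf, offset);
      return data.length;          // number of bytes written
    },
  };
}

const template = makeTemplate([0xaa, 0xbb]);
const target = Buffer.alloc(4);
const written = template.write(target, 1);

console.log(template.length); // 2
console.log(written);         // 2
console.log(target);          // <Buffer 00 aa bb 00>
```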
In addition, a compiled template has a numbered property for each parenthesized group in the template. These are objects with offset and length read-only properties that return the offset and length of each parenthesized group.
let t = bintag.compile`i4: 0 (x: aa bb)`
t[1].offset
// = 4
t[1].length
// = 2
Note that e.g. t[1].offset has the same value as @1 in the template, and t[1].length has the same value as #1 in the template.
Compiled templates and arrays of compiled templates can be used within other templates, in a manner similar to buffers:
let t = bintag.compile`x: aa bb`;
bintag`2*${t}`
// = <Buffer aa bb aa bb>
Arrays, buffers, and other objects referred to by substitution expressions must not be modified between the compilation and any use of a template.