Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: parsing non RFC uuid values #455

Merged
merged 13 commits into from
Jun 2, 2020
3 changes: 3 additions & 0 deletions src/index.js
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,6 @@ export { default as v1 } from './v1.js';
export { default as v3 } from './v3.js';
export { default as v4 } from './v4.js';
export { default as v5 } from './v5.js';
export { default as version } from './version.js';
broofa marked this conversation as resolved.
Show resolved Hide resolved
export { default as validate } from './validate.js';
broofa marked this conversation as resolved.
Show resolved Hide resolved
export { default as uuidRegex } from './regex.js';
3 changes: 3 additions & 0 deletions src/regex.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
const uuidRegex = /^([0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12})|((00000000-0000-0000-0000-000000000000))$/i;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The [1-5] should really be [1345] since the RFC does not specify v2 UUIDs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctavan: Version 2 UUIDs are allowed. The RFC doesn't go into detail about them because they're spec'ed in Appendix A (pg. 586) of X/Open DCE: Remote Procedure Call, which is the original spec that heavily inspired/influenced the RFC. FWIW v2 UUIDs are nearly identical to v1, except the node field must be an IEEE [MAC] address, with no allowances for being set randomly. (I'm just now digging into this, btw. I've been happily ignoring this facet of the RFC for a decade. 🥳 )

Suggestions:

  • Upper case for constant.
  • Different regex. Use non-capturing group, group has to be around everything between ^ and $ to properly match begin and end of line.
Suggested change
const uuidRegex = /^([0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12})|((00000000-0000-0000-0000-000000000000))$/i;
const REGEX = /^(?:[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}|00000000-0000-0000-0000-000000000000)$/i;


export default uuidRegex;
awwit marked this conversation as resolved.
Show resolved Hide resolved
37 changes: 28 additions & 9 deletions src/v35.js
Original file line number Diff line number Diff line change
@@ -1,12 +1,29 @@
import bytesToUuid from './bytesToUuid.js';
import validate from './validate.js';

// Int32 to 4 bytes
function numberToBytes(num, bytes) {
const offset = bytes.length;
bytes.push(0, 0, 0, 0);

for (let i = 0; i < 4; ++i) {
const byte = num & 0xff;
bytes[offset + 3 - i] = byte;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line could warrant a comment. Something like fill the 4 appended bytes right-to-left.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or a link to the respective stackoverflow question 😉

num = (num - byte) / 256;
}
}

function uuidToBytes(uuid) {
// Note: We assume we're being passed a valid uuid string
const bytes = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also be able to pre-allocate an Array(16) here, shouldn't we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctavan I tested the performance and did not see the difference.
Well, let it be preallocated )


uuid.replace(/[a-fA-F0-9]{2}/g, function (hex) {
bytes.push(parseInt(hex, 16));
});
if (!validate(uuid)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler, more compact, more performant(???) implementation:

// Char offset to hex pairs in uuid strings
const HEX_PAIRS=[0, 2, 4, 6, 9, 11, 14, 16, 19, 21, 24, 26, 28, 30, 32, 34];

function uuidToBytes(uuid) {
  if (!validate(uuid)) throw TypeError('Invalid UUID');
  return HEX_PAIRS.map(i => parseInt(uuid.slice(i, i + 2), 16));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@broofa I checked this code performance:

uuidv3() x 198,096 ops/sec ±0.86% (94 runs sampled)
uuidv5() x 200,775 ops/sec ±0.55% (95 runs sampled)

This code is shorter, but less productive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would again favor conciseness over performance in this case.

return bytes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we pre-allocate the bytes = Array(16) then we should return null here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctavan for optimizations from the JS engine, it's best to always return the same type for each function.
Better to return an empty array…

}

numberToBytes(parseInt(uuid.slice(0, 8), 16), bytes);
numberToBytes(parseInt(uuid.slice(9, 13) + uuid.slice(14, 18), 16), bytes);
numberToBytes(parseInt(uuid.slice(19, 23) + uuid.slice(24, 28), 16), bytes);
numberToBytes(parseInt(uuid.slice(28), 16), bytes);
Copy link
Member

@broofa broofa May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throwing here removes the need for the checks in v35.js#generateUuid()

What about something like this...?

Suggested change
if (!validate(uuid)) {
return bytes;
}
numberToBytes(parseInt(uuid.slice(0, 8), 16), bytes);
numberToBytes(parseInt(uuid.slice(9, 13) + uuid.slice(14, 18), 16), bytes);
numberToBytes(parseInt(uuid.slice(19, 23) + uuid.slice(24, 28), 16), bytes);
numberToBytes(parseInt(uuid.slice(28), 16), bytes);
// Character offset to each hex pair in a UUID string
const HEX_PAIRS = [0,2,4,6,9,11,14,16,19,21,24,26,28,30,32,34];
function uuidToBytes(uuid){
if (!validate(uuid)) throw TypeError('Invalid UUID');
return HEX_PAIRS.map(i =>parseInt(uuid.slice(i, i+2), 16));
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the perf numbers:

@awwit's implementation:

uuidv3() x 215,511 ops/sec ±1.49% (83 runs sampled)
uuidv5() x 217,387 ops/sec ±1.24% (81 runs sampled)

My suggested implementation:

uuidv3() x 177,188 ops/sec ±3.32% (89 runs sampled)
uuidv5() x 184,682 ops/sec ±1.44% (83 runs sampled)

Classic tradeoff between conciseness vs. performance, I guess? @ctavan thoughts?


return bytes;
}
Expand All @@ -28,8 +45,6 @@ export const URL = '6ba7b811-9dad-11d1-80b4-00c04fd430c8';

export default function (name, version, hashfunc) {
function generateUUID(value, namespace, buf, offset) {
const off = (buf && offset) || 0;

if (typeof value === 'string') value = stringToBytes(value);
if (typeof namespace === 'string') namespace = uuidToBytes(namespace);
awwit marked this conversation as resolved.
Show resolved Hide resolved

Expand All @@ -47,12 +62,16 @@ export default function (name, version, hashfunc) {
bytes[8] = (bytes[8] & 0x3f) | 0x80;

if (buf) {
for (let idx = 0; idx < 16; ++idx) {
buf[off + idx] = bytes[idx];
offset = offset || 0;

for (let i = 0; i < 16; ++i) {
buf[offset + i] = bytes[i];
}

return buf;
}

return buf || bytesToUuid(bytes);
return bytesToUuid(bytes);
}

// Function#name is not settable on some platforms (#270)
Expand Down
7 changes: 7 additions & 0 deletions src/validate.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
import uuidRegex from './regex.js';

function validate(uuid) {
return uuidRegex.test(uuid);
awwit marked this conversation as resolved.
Show resolved Hide resolved
}

export default validate;
11 changes: 11 additions & 0 deletions src/version.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
import validate from './validate.js';

function version(uuid) {
if (validate(uuid)) {
return parseInt(uuid.substr(14, 1), 16);
}

return -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm still a bit unsure whether -1 is the best return value for an invalid UUID… @broofa wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctavan I would recommend always returning the same type from a function…

awwit marked this conversation as resolved.
Show resolved Hide resolved
}

export default version;
42 changes: 42 additions & 0 deletions test/unit/v35.test.js
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,27 @@ describe('v5', () => {
assert.ok(v3('hello.example.com', '00000000-0000-0000-0000-000000000000'));
});

test('v3 parsing non RFC uuid values', () => {
assert.throws(() => {
v3(
'hello.example.com',
// equal '00000000-0000-0000-0000-000000000000'
'00000000000000000000000000000000',
ctavan marked this conversation as resolved.
Show resolved Hide resolved
);
});

// During parsing only two hex chars in a row are taken into account
// The remaining characters are ignored.
awwit marked this conversation as resolved.
Show resolved Hide resolved

assert.throws(() => {
v3(
'hello.example.com',
// equal '00000000-0000-0000-0000-000000000000'
awwit marked this conversation as resolved.
Show resolved Hide resolved
'=Y00a-f*v00b*-00c-00d#-p00f\b-00g-00h-####00i^^^-00j*1*2*3&-L00k-\n00l-/00m-----00n-fg000-00p-00r+',
);
});
});

test('v3 namespace buffer validation', () => {
assert.throws(() => {
v3('hello.example.com', new Array(15));
Expand Down Expand Up @@ -204,6 +225,27 @@ describe('v5', () => {
assert.ok(v5('hello.example.com', '00000000-0000-0000-0000-000000000000'));
});

test('v5 parsing non RFC uuid values', () => {
awwit marked this conversation as resolved.
Show resolved Hide resolved
assert.throws(() => {
v5(
'hello.example.com',
// equal '00000000-0000-0000-0000-000000000000'
awwit marked this conversation as resolved.
Show resolved Hide resolved
'00000000000000000000000000000000',
);
});

// During parsing only two hex chars in a row are taken into account
// The remaining characters are ignored.
awwit marked this conversation as resolved.
Show resolved Hide resolved

assert.throws(() => {
v5(
'hello.example.com',
// equal '00000000-0000-0000-0000-000000000000'
awwit marked this conversation as resolved.
Show resolved Hide resolved
'=Y00a-f*v00b*-00c-00d#-p00f\b-00g-00h-####00i^^^-00j*1*2*3&-L00k-\n00l-/00m-----00n-fg000-00p-00r+',
);
});
});

test('v5 namespace buffer validation', () => {
assert.throws(() => {
v5('hello.example.com', new Array(15));
Expand Down
18 changes: 18 additions & 0 deletions test/unit/version.test.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
import assert from 'assert';
import version from '../../src/version.js';

describe('version', () => {
test('check uuid version', () => {
assert.strictEqual(version('00000000-0000-0000-0000-000000000000'), 0);

assert.strictEqual(version('d9428888-122b-11e1-b85c-61cd3cbb3210'), 1);

assert.strictEqual(version('109156be-c4fb-41ea-b1b4-efe1671c5836'), 4);

assert.strictEqual(version('a981a0c2-68b1-35dc-bcfc-296e52ab01ec'), 3);

assert.strictEqual(version('90123e1c-7512-523e-bb28-76fab9f2f73d'), 5);

assert.strictEqual(version('invalid uuid string'), -1);
awwit marked this conversation as resolved.
Show resolved Hide resolved
});
});