Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: parsing non RFC uuid values #455

Merged
merged 13 commits into from
Jun 2, 2020
Merged

Conversation

awwit
Copy link
Contributor

@awwit awwit commented May 25, 2020

Hello!

I added tests for parsing non RFC uuid values. So that next time not to violate the current behavior of the module. (With future v35 optimizations.)

Copy link
Member

@ctavan ctavan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it's certainly a good thing to cover more corner cases around the way v3/5 UUID namespaces can be passed.

test/unit/v35.test.js Outdated Show resolved Hide resolved
@awwit awwit requested a review from ctavan May 27, 2020 10:53
test/unit/v35.test.js Outdated Show resolved Hide resolved
@broofa
Copy link
Member

broofa commented May 27, 2020

The thing about the test cases that you are adding here is that they actually reveal unintended behavior

💯 agree. Codifying this behavior in tests suggests that the ability to accept non-compliant UUIDs is a desirable and intentional trait when, in fact, it's not. The only reason this works is because I was lazy in my initial implementation, for which I apologize.

Proposal:

  • Fix uuidToBytes so it only accepts valid UUID strings (have it return null otherwise?)
  • Invert logic of tests added here to validate they don't pass
  • Add a validate(str) function to public API (just returns uuidToBytes(str) !== null)
  • Add a version(string) function to public API (returns UUID version number)
  • Put these changes on a "v9" branch (for future version 9 release) since stricter parse behavior may break existing v3/v5 use cases. But hold off on releasing that branch for a while. These changes aren't urgent enough to push immediately and I suspect our users may be suffering from a bit of major-release fatigue lately.

I believe this would address most of the concerns we've seen around parsing behavior.

BTW, we might want to consider enforcing the valid-ness of UUIDs using a well-crafted regex that we also export publicly. There's significant demand and value in such a regex. While it won't have the performance of @awwit 's previous solution, it will have the benefit of being consistent both internally and externally. For example, it could be used validate form fields such that any values there could be expected to pass whatever validate() function we provide.

Thoughts?

P.S. Note that the somewhat obscure Nil UUID complicates validation logic. I frequently forget that zero is a valid UUID version. 😞

@ctavan
Copy link
Member

ctavan commented May 27, 2020

As a side note, deno exposes such validation methods as well: https://github.com/denoland/deno/blob/master/std/uuid/v5.ts#L13

@broofa
Copy link
Member

broofa commented May 27, 2020

As a side note, deno exposes such validation methods as well

Interesting copyright 🤣

@ctavan
Copy link
Member

ctavan commented May 27, 2020

Interesting copyright 🤣

Yeah, they only give credit in the main module file

@ctavan
Copy link
Member

ctavan commented May 27, 2020

@broofa I agree with all of your points. It would be a nice opportunity to finally attack these longstanding ideas around parsing and probably make this module even more "complete".

Exposing the new APIs in a tree-shakeable way should also minimize impact for existing users.

And I agree that we should move these developments into a new branch and let them mature for a while before we release yet another major version.

@awwit
Copy link
Contributor Author

awwit commented May 27, 2020

@ctavan @broofa I made the changes that you asked me for.

But I can’t create v9 branch in your repository to change PR target.

BTW, one more question that I wanted to ask a long time.
Can I change your API to use only Uint8Array (buf argument) in this places?

https://github.com/uuidjs/uuid/blob/eb6038dda7/src/v1.js#L17
https://github.com/uuidjs/uuid/blob/eb6038dda7/src/v4.js#L4
https://github.com/uuidjs/uuid/blob/eb6038dda7/src/v35.js#L30

@awwit awwit requested a review from ctavan May 27, 2020 18:15
@awwit
Copy link
Contributor Author

awwit commented May 27, 2020

BTW, new way to parse uuid values shows this performance:

uuidv3() x 238,241 ops/sec ±0.88% (92 runs sampled)
uuidv5() x 236,577 ops/sec ±0.71% (95 runs sampled)

vs master:

uuidv3() x 153,787 ops/sec ±0.53% (91 runs sampled)
uuidv5() x 153,972 ops/sec ±0.40% (90 runs sampled)

Copy link
Member

@ctavan ctavan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work in general! I think we should still put some thoughts into the exact method signatures and return values, but all in all it goes into a very good direction already!

src/regex.js Outdated
@@ -0,0 +1,3 @@
const uuidRegex = /^([0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12})|((00000000-0000-0000-0000-000000000000))$/i;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The [1-5] should really be [1345] since the RFC does not specify v2 UUIDs.

src/v35.js Outdated

function uuidToBytes(uuid) {
// Note: We assume we're being passed a valid uuid string
const bytes = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also be able to pre-allocate an Array(16) here, shouldn't we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctavan I tested the performance and did not see the difference.
Well, let it be preallocated )

test/unit/v35.test.js Outdated Show resolved Hide resolved
test/unit/v35.test.js Outdated Show resolved Hide resolved
test/unit/v35.test.js Outdated Show resolved Hide resolved
test/unit/v35.test.js Outdated Show resolved Hide resolved

for (let i = 0; i < 4; ++i) {
const byte = num & 0xff;
bytes[offset + 3 - i] = byte;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line could warrant a comment. Something like fill the 4 appended bytes right-to-left.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or a link to the respective stackoverflow question 😉

src/v35.js Outdated
bytes.push(parseInt(hex, 16));
});
if (!validate(uuid)) {
return bytes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we pre-allocate the bytes = Array(16) then we should return null here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctavan for optimizations from the JS engine, it's best to always return the same type for each function.
Better to return an empty array…

src/v35.js Outdated Show resolved Hide resolved
src/version.js Outdated
return parseInt(uuid.substr(14, 1), 16);
}

return -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm still a bit unsure whether -1 is the best return value for an invalid UUID… @broofa wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctavan I would recommend always returning the same type from a function…

added comments to numberToBytes func
preallocated array in uuidToBytes func
src/regex.js Outdated Show resolved Hide resolved
src/regex.js Outdated
@@ -0,0 +1,3 @@
const uuidRegex = /^([0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12})|((00000000-0000-0000-0000-000000000000))$/i;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctavan: Version 2 UUIDs are allowed. The RFC doesn't go into detail about them because they're spec'ed in Appendix A (pg. 586) of X/Open DCE: Remote Procedure Call, which is the original spec that heavily inspired/influenced the RFC. FWIW v2 UUIDs are nearly identical to v1, except the node field must be an IEEE [MAC] address, with no allowances for being set randomly. (I'm just now digging into this, btw. I've been happily ignoring this facet of the RFC for a decade. 🥳 )

Suggestions:

  • Upper case for constant.
  • Different regex. Use non-capturing group, group has to be around everything between ^ and $ to properly match begin and end of line.
Suggested change
const uuidRegex = /^([0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12})|((00000000-0000-0000-0000-000000000000))$/i;
const REGEX = /^(?:[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}|00000000-0000-0000-0000-000000000000)$/i;

src/validate.js Outdated Show resolved Hide resolved
src/version.js Outdated Show resolved Hide resolved
test/unit/version.test.js Outdated Show resolved Hide resolved
test/unit/v35.test.js Outdated Show resolved Hide resolved

function uuidToBytes(uuid) {
// Note: We assume we're being passed a valid uuid string
const bytes = [];
if (!validate(uuid)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler, more compact, more performant(???) implementation:

// Char offset to hex pairs in uuid strings
const HEX_PAIRS=[0, 2, 4, 6, 9, 11, 14, 16, 19, 21, 24, 26, 28, 30, 32, 34];

function uuidToBytes(uuid) {
  if (!validate(uuid)) throw TypeError('Invalid UUID');
  return HEX_PAIRS.map(i => parseInt(uuid.slice(i, i + 2), 16));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@broofa I checked this code performance:

uuidv3() x 198,096 ops/sec ±0.86% (94 runs sampled)
uuidv5() x 200,775 ops/sec ±0.55% (95 runs sampled)

This code is shorter, but less productive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would again favor conciseness over performance in this case.

.eslintrc.json Outdated
@@ -13,6 +13,7 @@
},
"parser": "babel-eslint",
"rules": {
"no-var": ["error"]
"no-var": ["error"],
"curly": ["error", "all"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Revert? @ctavan Any opinion here? Since we're using prettier, maybe avoid custom rules that are just about esthetics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re prettier: Enforcing curly braces changes the AST and is therefore not in the scope of prettier.

Re this setting: I was actually hoping that the "standard" eslint config would give us a reasonable set of default rules like this one but apparently that's not the case. I personally like https://github.com/airbnb/javascript/tree/master/packages/eslint-config-airbnb but should we decide to include it we should do that in a different PR, that will go too far. It was probably a bad move from my end to propose adding this change to this PR in the first place, apologies! 🙇‍♂️

src/v35.js Show resolved Hide resolved
test/unit/v35.test.js Outdated Show resolved Hide resolved
awwit and others added 4 commits May 28, 2020 02:42
Co-authored-by: Robert Kieffer <robert@broofa.com>
Co-authored-by: Robert Kieffer <robert@broofa.com>
Co-authored-by: Robert Kieffer <robert@broofa.com>
Co-authored-by: Robert Kieffer <robert@broofa.com>
src/regex.js Outdated Show resolved Hide resolved
.eslintrc.json Outdated
@@ -13,6 +13,7 @@
},
"parser": "babel-eslint",
"rules": {
"no-var": ["error"]
"no-var": ["error"],
"curly": ["error", "all"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re prettier: Enforcing curly braces changes the AST and is therefore not in the scope of prettier.

Re this setting: I was actually hoping that the "standard" eslint config would give us a reasonable set of default rules like this one but apparently that's not the case. I personally like https://github.com/airbnb/javascript/tree/master/packages/eslint-config-airbnb but should we decide to include it we should do that in a different PR, that will go too far. It was probably a bad move from my end to propose adding this change to this PR in the first place, apologies! 🙇‍♂️

src/index.js Show resolved Hide resolved
src/index.js Show resolved Hide resolved
src/index.js Show resolved Hide resolved
src/regex.js Show resolved Hide resolved

function uuidToBytes(uuid) {
// Note: We assume we're being passed a valid uuid string
const bytes = [];
if (!validate(uuid)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would again favor conciseness over performance in this case.

test/unit/validate.test.js Show resolved Hide resolved
test/unit/validate.test.js Show resolved Hide resolved
@awwit
Copy link
Contributor Author

awwit commented May 29, 2020

@ctavan @broofa I edited the babel configuration a bit. Specified the version of IE for which the module will be built.

https://babeljs.io/docs/en/babel-preset-env

esmBrowser: {
  presets: [['@babel/preset-env', { targets: { ie: '11' }, modules: false }]],
},

Or is it superfluous?

@awwit awwit requested review from broofa and ctavan May 29, 2020 16:19
@ctavan ctavan mentioned this pull request May 29, 2020
.babelrc.js Outdated
@@ -6,7 +6,7 @@ module.exports = {
presets: [['@babel/preset-env', { targets: { node: '8' }, modules: 'commonjs' }]],
},
esmBrowser: {
presets: [['@babel/preset-env', { modules: false }]],
presets: [['@babel/preset-env', { targets: { ie: '11' }, modules: false }]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked again and I think this is not necessary:

Sidenote, if no targets are specified, @babel/preset-env will transform all ECMAScript 2015+ code by default.

Copy link
Contributor Author

@awwit awwit May 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctavan is explicit better than implicit? =)

Copy link
Member

@ctavan ctavan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your effort and especially for your patience with me 😄

I'll merge this later tonight

@ctavan ctavan force-pushed the test/parse-non-rfc-values branch from f30b4a7 to 2218267 Compare May 30, 2020 12:33
Copy link
Member

@broofa broofa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. I've got some thoughts on uuidToBytes(), but I'll put those up in separate PR once this is on a v9 branch.

Also, echoing what @ctavan said about your effort and patience. Thank you!

@ctavan ctavan mentioned this pull request Jun 2, 2020
@broofa broofa changed the base branch from master to v9 June 2, 2020 15:34
@broofa broofa merged commit ddb7ee9 into uuidjs:v9 Jun 2, 2020
@broofa broofa mentioned this pull request Jun 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants