-
-
Notifications
You must be signed in to change notification settings - Fork 903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: parsing non RFC uuid values #455
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it's certainly a good thing to cover more corner cases around the way v3/5 UUID namespaces can be passed.
💯 agree. Codifying this behavior in tests suggests that the ability to accept non-compliant UUIDs is a desirable and intentional trait when, in fact, it's not. The only reason this works is because I was lazy in my initial implementation, for which I apologize. Proposal:
I believe this would address most of the concerns we've seen around parsing behavior. BTW, we might want to consider enforcing the valid-ness of UUIDs using a well-crafted regex that we also export publicly. There's significant demand and value in such a regex. While it won't have the performance of @awwit 's previous solution, it will have the benefit of being consistent both internally and externally. For example, it could be used validate form fields such that any values there could be expected to pass whatever validate() function we provide. Thoughts? P.S. Note that the somewhat obscure Nil UUID complicates validation logic. I frequently forget that zero is a valid UUID version. 😞 |
As a side note, deno exposes such validation methods as well: https://github.com/denoland/deno/blob/master/std/uuid/v5.ts#L13 |
|
Yeah, they only give credit in the main module file |
@broofa I agree with all of your points. It would be a nice opportunity to finally attack these longstanding ideas around parsing and probably make this module even more "complete". Exposing the new APIs in a tree-shakeable way should also minimize impact for existing users. And I agree that we should move these developments into a new branch and let them mature for a while before we release yet another major version. |
@ctavan @broofa I made the changes that you asked me for. But I can’t create BTW, one more question that I wanted to ask a long time. https://github.com/uuidjs/uuid/blob/eb6038dda7/src/v1.js#L17 |
BTW, new way to parse uuid values shows this performance:
vs master:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work in general! I think we should still put some thoughts into the exact method signatures and return values, but all in all it goes into a very good direction already!
src/regex.js
Outdated
@@ -0,0 +1,3 @@ | |||
const uuidRegex = /^([0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12})|((00000000-0000-0000-0000-000000000000))$/i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The [1-5]
should really be [1345]
since the RFC does not specify v2
UUIDs.
src/v35.js
Outdated
|
||
function uuidToBytes(uuid) { | ||
// Note: We assume we're being passed a valid uuid string | ||
const bytes = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also be able to pre-allocate an Array(16)
here, shouldn't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ctavan I tested the performance and did not see the difference.
Well, let it be preallocated )
|
||
for (let i = 0; i < 4; ++i) { | ||
const byte = num & 0xff; | ||
bytes[offset + 3 - i] = byte; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line could warrant a comment. Something like fill the 4 appended bytes right-to-left.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or a link to the respective stackoverflow question 😉
src/v35.js
Outdated
bytes.push(parseInt(hex, 16)); | ||
}); | ||
if (!validate(uuid)) { | ||
return bytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we pre-allocate the bytes = Array(16)
then we should return null
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ctavan for optimizations from the JS engine, it's best to always return the same type for each function.
Better to return an empty array…
src/version.js
Outdated
return parseInt(uuid.substr(14, 1), 16); | ||
} | ||
|
||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'm still a bit unsure whether -1 is the best return value for an invalid UUID… @broofa wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ctavan I would recommend always returning the same type from a function…
added comments to numberToBytes func preallocated array in uuidToBytes func
src/regex.js
Outdated
@@ -0,0 +1,3 @@ | |||
const uuidRegex = /^([0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12})|((00000000-0000-0000-0000-000000000000))$/i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ctavan: Version 2 UUIDs are allowed. The RFC doesn't go into detail about them because they're spec'ed in Appendix A (pg. 586) of X/Open DCE: Remote Procedure Call, which is the original spec that heavily inspired/influenced the RFC. FWIW v2
UUIDs are nearly identical to v1
, except the node
field must be an IEEE [MAC] address, with no allowances for being set randomly. (I'm just now digging into this, btw. I've been happily ignoring this facet of the RFC for a decade. 🥳 )
Suggestions:
- Upper case for constant.
- Different regex. Use non-capturing group, group has to be around everything between
^
and$
to properly match begin and end of line.
const uuidRegex = /^([0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12})|((00000000-0000-0000-0000-000000000000))$/i; | |
const REGEX = /^(?:[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}|00000000-0000-0000-0000-000000000000)$/i; |
|
||
function uuidToBytes(uuid) { | ||
// Note: We assume we're being passed a valid uuid string | ||
const bytes = []; | ||
if (!validate(uuid)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simpler, more compact, more performant(???) implementation:
// Char offset to hex pairs in uuid strings
const HEX_PAIRS=[0, 2, 4, 6, 9, 11, 14, 16, 19, 21, 24, 26, 28, 30, 32, 34];
function uuidToBytes(uuid) {
if (!validate(uuid)) throw TypeError('Invalid UUID');
return HEX_PAIRS.map(i => parseInt(uuid.slice(i, i + 2), 16));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@broofa I checked this code performance:
uuidv3() x 198,096 ops/sec ±0.86% (94 runs sampled)
uuidv5() x 200,775 ops/sec ±0.55% (95 runs sampled)
This code is shorter, but less productive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would again favor conciseness over performance in this case.
.eslintrc.json
Outdated
@@ -13,6 +13,7 @@ | |||
}, | |||
"parser": "babel-eslint", | |||
"rules": { | |||
"no-var": ["error"] | |||
"no-var": ["error"], | |||
"curly": ["error", "all"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Revert? @ctavan Any opinion here? Since we're using prettier
, maybe avoid custom rules that are just about esthetics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re prettier
: Enforcing curly braces changes the AST and is therefore not in the scope of prettier
.
Re this setting: I was actually hoping that the "standard"
eslint config would give us a reasonable set of default rules like this one but apparently that's not the case. I personally like https://github.com/airbnb/javascript/tree/master/packages/eslint-config-airbnb but should we decide to include it we should do that in a different PR, that will go too far. It was probably a bad move from my end to propose adding this change to this PR in the first place, apologies! 🙇♂️
Co-authored-by: Robert Kieffer <robert@broofa.com>
Co-authored-by: Robert Kieffer <robert@broofa.com>
Co-authored-by: Robert Kieffer <robert@broofa.com>
Co-authored-by: Robert Kieffer <robert@broofa.com>
.eslintrc.json
Outdated
@@ -13,6 +13,7 @@ | |||
}, | |||
"parser": "babel-eslint", | |||
"rules": { | |||
"no-var": ["error"] | |||
"no-var": ["error"], | |||
"curly": ["error", "all"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re prettier
: Enforcing curly braces changes the AST and is therefore not in the scope of prettier
.
Re this setting: I was actually hoping that the "standard"
eslint config would give us a reasonable set of default rules like this one but apparently that's not the case. I personally like https://github.com/airbnb/javascript/tree/master/packages/eslint-config-airbnb but should we decide to include it we should do that in a different PR, that will go too far. It was probably a bad move from my end to propose adding this change to this PR in the first place, apologies! 🙇♂️
|
||
function uuidToBytes(uuid) { | ||
// Note: We assume we're being passed a valid uuid string | ||
const bytes = []; | ||
if (!validate(uuid)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would again favor conciseness over performance in this case.
@ctavan @broofa I edited the babel configuration a bit. Specified the version of IE for which the module will be built. https://babeljs.io/docs/en/babel-preset-env esmBrowser: {
presets: [['@babel/preset-env', { targets: { ie: '11' }, modules: false }]],
}, Or is it superfluous? |
.babelrc.js
Outdated
@@ -6,7 +6,7 @@ module.exports = { | |||
presets: [['@babel/preset-env', { targets: { node: '8' }, modules: 'commonjs' }]], | |||
}, | |||
esmBrowser: { | |||
presets: [['@babel/preset-env', { modules: false }]], | |||
presets: [['@babel/preset-env', { targets: { ie: '11' }, modules: false }]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked again and I think this is not necessary:
Sidenote, if no targets are specified, @babel/preset-env will transform all ECMAScript 2015+ code by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ctavan is explicit better than implicit? =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for your effort and especially for your patience with me 😄
I'll merge this later tonight
This reverts commit d096cc2.
f30b4a7
to
2218267
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving. I've got some thoughts on uuidToBytes()
, but I'll put those up in separate PR once this is on a v9
branch.
Also, echoing what @ctavan said about your effort and patience. Thank you!
Hello!
I added tests for parsing non RFC uuid values. So that next time not to violate the current behavior of the module. (With future
v35
optimizations.)