Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add core:encoding/uuid #3792

Merged
merged 23 commits into from
Jun 28, 2024
Merged

Add core:encoding/uuid #3792

merged 23 commits into from
Jun 28, 2024

Conversation

Feoramund
Copy link
Contributor

This is the odin-uuid package I wrote a while back, updated to use the new context-based random generator, along with some minor documentation touch-ups. It implements RFC 4122 compliant unique identifiers with generators for versions 3, 4, and 5.

@Kelimion
Copy link
Member

Kelimion commented Jun 21, 2024

Incidentally, I think we can relatively trivially extend the supported UUID versions to also include v1 and v6, which per RFC 9562 allow replacing the MAC address with random bytes for privacy. v6 is v1 with the time bits reversed.

So a generate_v1 :: proc(mac: Maybe([6]u8)) -> Identifier? On platforms that have it, the caller can then use net.enumerate_interfaces to get a real MAC if that interests them, without having to complicate the UUID code.

Not that the PR should be held up for those. They can be added after merger.

@Feoramund
Copy link
Contributor Author

One upside of making the Identifer be only a distinct type is that I was able to make the namespaces @(rodata) now, which is nice.

I'll have to give this RFC a read. I wasn't aware of it before.

@Kelimion
Copy link
Member

One upside of making the Identifer be only a distinct type is that I was able to make the namespaces @(rodata) now, which is nice.

Indeed!

I'll have to give this RFC a read. I wasn't aware of it before.

I'm not surprised. It was published only last month.

@Kelimion
Copy link
Member

Merge it, or do you want to tinker with it some more in light of the RFC update?

@Feoramund
Copy link
Contributor Author

Merge it, or do you want to tinker with it some more in light of the RFC update?

I'm working on v6 at the moment, then I'll add another test or two and clean up. Might get it all wrapped up in the next few hours, depending. I may implement v8 just as a version/variant stamp on top of a user-provided u128/array slice for good measure.

If you could check over the v1 proc for me, that'd be great. The endianness had me confused for a bit, because I know they're supposed to be stored in big-endian, but the time fields are flipped and also separated, so I just resorted to doing it byte-by-byte for sanity's sake.

@Kelimion
Copy link
Member

If you could check over the v1 proc for me, that'd be great. The endianness had me confused for a bit, because I know they're supposed to be stored in big-endian, but the time fields are flipped and also separated, so I just resorted to doing it byte-by-byte for sanity's sake.

Sure thing. I'll give it a thorough look tomorrow.

Copy link
Collaborator

@laytan laytan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see a to_string where you pass your own byte buffer (because the length is constant callers can use it instead of allocating or messing around with the writers themselves).

Also imo version 3 and 5 should either be removed or be called something like generate_insecure_v3 because they use insecure hashing algorithms that should be avoided.

I am also not sure about putting this in encoding, because I don't think it is an encoding (maybe it is, I wouldn't look for it under encoding though)

core/encoding/uuid/generation.odin Show resolved Hide resolved
core/encoding/uuid/generation.odin Outdated Show resolved Hide resolved
@Feoramund
Copy link
Contributor Author

There's been some good suggestions brought up. I should be able to handle them today.

@Kelimion
Copy link
Member

Kelimion commented Jun 22, 2024

There's been some good suggestions brought up. I should be able to handle them today.

The consensus we came to within the team was to adopt the same strategy as core:crypto, and to add a core:encoding/uuid/legacy for the variants using MD5 and SHA1, themselves from core:crypto/legacy.

Then there's no need for a compile-time warning of any sort. We'd just have a large banner at the top of the file, same as in md5.odin.

e.g.

package encoding_uuid_legacy

import "core:encoding/uuid"
import "core:crypto/legacy/md5"

Identifier :: uuid.Identifier
(etc)

generate_v3_bytes :: proc(
    namespace: Identifier,
    name: []byte,
) -> (
    result: Identifier,
)

The v4 UUID doesn't need to go under legacy, but as @laytan observed, it probably should use runtime.random_generator_query_info and verify the random generator currently on context is .Cryptographic.

@Feoramund
Copy link
Contributor Author

If we're restricting usage to cryptographic generators, I think that should be fine so long as I include an example of how to do that, since the new context rand API is still quite fresh.

I read in the newer RFC that they suggest you could make your own v8 with an up to date hashing algorithm as an alternative to v3 and v5. We could provide one of those too, with the stipulation that v8 is the "make up your own" ID and that ours isn't an official implementation. generate_v8_sha2 or such. I'd need to take a look at the crypto API again. I forget if it's possible to specify the algorithm by enum but last I looked, I think it was. If that's the case, generate_v8_hash could work, with an enum argument.

uuid/legacy strikes me as better than generate_insecure_v3, too.

@Feoramund
Copy link
Contributor Author

@Kelimion I believe I've handled all the ideas brought up so far, other than where to put the package. I found myself accidentally typing core:uuid sometimes. Is that a better place for it?

@Kelimion
Copy link
Member

Kelimion commented Jun 22, 2024

@Kelimion I believe I've handled all the ideas brought up so far, other than where to put the package. I found myself accidentally typing core:uuid sometimes. Is that a better place for it?

I feel like that would set a bad precedent. The package is too narrowly focused for top-level billing. The top-level packages are either categories, or fundamental. I think of uuid as more of a core:encoding/uuid thing than something fundamental. No offence. :-)

Without a limiting principle like that we'd end up with 121 top-level packages within weeks, and that's just no way to organize anything.

core/encoding/uuid/reading.odin Outdated Show resolved Hide resolved
core/encoding/uuid/reading.odin Outdated Show resolved Hide resolved
core/encoding/uuid/reading.odin Outdated Show resolved Hide resolved
tests/core/encoding/uuid/test_core_uuid.odin Outdated Show resolved Hide resolved
@Feoramund
Copy link
Contributor Author

That ought to do it.

@Kelimion
Copy link
Member

That ought to do it.

The thing is that it's _nsec because it's "opaque" and it allows changing how it's represented in the future. Unlikely for that to happen at this point, but that's to deter users of the core library from poking at it directly. If we merge this PR, then the UUID package is part of the core library packages - not a downstream user - that can do so legitimately.

@Feoramund
Copy link
Contributor Author

Unlikely for that to happen at this point, but that's to deter users of the core library from poking at it directly. If we merge this PR, then the UUID package is part of the core library packages - not a downstream user - that can do so legitimately.

That all makes sense to me. I tend to err on the side of caution with these sort of things in the aim of not introducing more of a burden to future maintenance.

@Kelimion
Copy link
Member

Kelimion commented Jun 22, 2024

Unlikely for that to happen at this point, but that's to deter users of the core library from poking at it directly. If we merge this PR, then the UUID package is part of the core library packages - not a downstream user - that can do so legitimately.

That all makes sense to me. I tend to err on the side of caution with these sort of things in the aim of not introducing more of a burden to future maintenance.

We can introduce an even lighter time.from_nanoseconds :: proc(nsec: i64) -> Time that's a barely disguised cast that'll inline to the same for people to use for "outside" epoch conversions and Time construction that'll be safe to use in the event of Time updates, but I'd consider that outside of the scope of this PR.

@Kelimion
Copy link
Member

Kelimion commented Jun 22, 2024

I've just added time.from_nanoseconds, so if the internal representation ever changes, that constructor will still hold up.

@Feoramund
Copy link
Contributor Author

Not a bad idea at all. I'd have to rebase to get access to it, which will overwrite all the past commits. I think GitHub correctly tracks the differences in force-push rebasing, but I'm not 100% sure. I can do that here and now in this PR if you'd like.

@Kelimion
Copy link
Member

Not a bad idea at all. I'd have to rebase to get access to it, which will overwrite all the past commits. I think GitHub correctly tracks the differences in force-push rebasing, but I'm not 100% sure. I can do that here and now in this PR if you'd like.

It should do, especially because it's a simple re-graft on top of the same branch and you're introducing an entirely new package. Up to you. Can also update to use the new call after merge if you want.

Waiting for Bill to give the new package his blessing and merge it. It has mine.

@Feoramund
Copy link
Contributor Author

There we go. Looks like the PR survived intact.

@gingerBill gingerBill merged commit 4824050 into odin-lang:master Jun 28, 2024
6 checks passed
@Feoramund Feoramund deleted the core-uuid branch June 30, 2024 03:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants