Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed typo in test vectors, clarified proquint and identity, added links to specs. #85

Closed
wants to merge 6 commits into from

Conversation

sg495
Copy link
Contributor

@sg495 sg495 commented Oct 20, 2021

Closes #76, closes #80, closes #83, closes #84.

I have edited the multibase proquint RFC to clarify how the full prefix works, and I have added an explicit reference to the RFC in the table (because the proquint data encoded by multibase is different from the data encoded by the original proquint spec).

I have also added an explicit reference to the RFC for base8 in the table, because base8 according to the multibase RFC is significantly different from the base8 provided by other standard implementations (e.g. Python).
@sg495 sg495 changed the title Fix incorrect test bytestring in case_insensitivity.csv Fixed typo in test vectors, clarified proquint and identity, added links to specs. Oct 20, 2021
The multibase identity prefix is the character non-printable ASCII/UTF-8 character with codepoint 0x00. Note that this is different from the multibase prefix 0 listed for base2, which is the ASCII/UTF-8 character "0" with codepoint 0x30.


## Encoding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work with invalid utf8, unfortunately.

Really, 0x00 (or NUL) isn't a true multibase, that's why it's so hard to describe. It exists to distinghish between text and binary in a context where 0x0 means "everything following is binary".

Really, the right way to do this would be to use a raw/utf8/ascii/etc. multicodec to specify the encoding of the "following" data. E.g.:

  • 0x55 ... - binary.
  • 0xXX ... - utf8 where XX is some marker byte. Unfortunately, the BOM (0xFEFF) isn't a valid varint, so we'd probably need to pick another.

All this is saying... I'd rather just drop it entirely. I don't think we're actually using it anywhere (for real, at least).

Thoughts @vmx?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy to remove the identity Multibase in case there are no objections. I think in the realm of Multibase it doesn't really makes sense as it is really about string encoding things.

Though things are sadly a bit complicated. Technically the identity Multibase is used in every DAG-CBOR encoded CID. The reason is that CID is specified with the Multibased prefix being part of the actual CID: https://github.com/multiformats/cid/tree/97ff4a329f04b70c1ab7255c62af48192146b025#how-does-it-work

Though talking with other folks, hardly anyone I know thinks of CIDs that way. I (and the people I've talked to) think of CIDs without the Multibase prefix and there there is of course Multibase prefixed CIDs.

To conclude, I'd remove the the identity Multibase and update the CID spec to reflect how CIDs are thought of today.

Added the Python module `multiformats` to the list of implementations.
@sg495 sg495 marked this pull request as draft July 21, 2022 13:28
@sg495 sg495 closed this Jul 21, 2022
@bumblefudge
Copy link
Contributor

@vmx @Stebalien Was the rest of this PR other than the rfcs/identity.md valid, tho? It seemed to fix other typos that might still need fixing. Happy to cherry pick the rest of the PR and re-open it if so!

@ben221199 ben221199 mentioned this pull request Aug 12, 2023
@ben221199
Copy link
Contributor

@bumblefudge I created a new pull request that fixes some of these typos. The identity is excluded from there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants