NIP-01 suggests encoding "content" in a non-JSON compatible way #1403

Open
Vap0r1ze opened this issue Jul 30, 2024 · 4 comments

Comments

@Vap0r1ze

The base protocol (NIP-01) draft currently says this:

The following characters in the content field must be escaped as shown, and all other characters must be included verbatim:

  • A line break (0x0A), use \n
  • A double quote (0x22), use \"
  • A backslash (0x5C), use \\
  • A carriage return (0x0D), use \r
  • A tab character (0x09), use \t
  • A backspace, (0x08), use \b
  • A form feed, (0x0C), use \f
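Read literally, the quoted rules amount to something like the sketch below. `escapeNip01Content` and `NIP01_ESCAPES` are hypothetical names made up for illustration, not part of any Nostr library, and this is only one possible reading of the draft.

```javascript
// Hypothetical helper: applies only the seven escapes listed in the quoted
// NIP-01 text and copies every other character through unchanged.
const NIP01_ESCAPES = new Map([
  ["\n", "\\n"],
  ['"', '\\"'],
  ["\\", "\\\\"],
  ["\r", "\\r"],
  ["\t", "\\t"],
  ["\b", "\\b"],
  ["\f", "\\f"],
]);

function escapeNip01Content(content) {
  let out = '"';
  for (const ch of content) {
    out += NIP01_ESCAPES.get(ch) ?? ch; // anything not listed is kept verbatim
  }
  return out + '"';
}

// For ordinary text the result matches JSON.stringify...
console.log(escapeNip01Content('a"b\n') === JSON.stringify('a"b\n')); // true
// ...but not for a control character that has no short escape.
console.log(escapeNip01Content("\u0001") === JSON.stringify("\u0001")); // false
```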

It says "all other characters must be included verbatim", but the JSON standard (see Section 9 "String") requires that "the control characters U+0000 to U+001F" be escaped, which for the ones without a short escape means a \uXXXX Unicode escape.

An example of a "content" value that is valid in NIP-01 but invalid in JSON:

JSON.parse(`"\u0000"`) // throws SyntaxError: the template literal puts a raw NUL between the quotes
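A slightly fuller version of the same check, as a minimal sketch: `isValidJson` is a hypothetical helper name, and the only thing it relies on is that `JSON.parse` throws on malformed input.

```javascript
// Returns true when `text` is accepted by JSON.parse, false when it throws.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

const raw = '"' + String.fromCharCode(0) + '"'; // quotes around a raw NUL character
const escaped = '"\\u0000"';                    // quotes around the six-character escape

console.log(isValidJson(raw));     // false
console.log(isValidJson(escaped)); // true
// JSON.stringify produces the escaped form, not the verbatim one.
console.log(JSON.stringify(String.fromCharCode(0)) === escaped); // true
```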

At this point it's probably not feasible to change the draft to use valid JSON, but the draft should probably mention that you must deviate from the JSON standard to produce NIP-01 compliant event IDs.

@mikedilger
Contributor

As I recall the intent was that those characters are invalid nostr characters, so we don't need encodings for them.

@fiatjaf
Member

fiatjaf commented Aug 4, 2024

Unicode escape codes are an aberration from a distant past that should be forgotten.

As long as you're not doing anything super weird this problem won't happen and most default JSON encoders will do the right thing.
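For context, a rough sketch of the common path: NIP-01 derives the event ID from the sha256 of the serialized array `[0, pubkey, created_at, kind, tags, content]`, and for ordinary content a default encoder such as `JSON.stringify` yields that serialization directly. `serializeForId` and the sample event are made up for illustration, and the hashing step is left out.

```javascript
// Hypothetical helper: builds the string that would be hashed for the event ID.
// JSON.stringify emits no extra whitespace by default.
function serializeForId(event) {
  return JSON.stringify([
    0,
    event.pubkey,
    event.created_at,
    event.kind,
    event.tags,
    event.content,
  ]);
}

const sample = {
  pubkey: "00".repeat(32), // made-up placeholder, not a real key
  created_at: 1722297600,  // arbitrary timestamp
  kind: 1,
  tags: [["t", "nostr"]],
  content: "hello\nworld",
};

console.log(serializeForId(sample));
```

Here the newline in `content` comes out as `\n`, which is what the NIP-01 list asks for; the two approaches only diverge for characters outside that list.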

@Vap0r1ze
Author

Vap0r1ze commented Aug 5, 2024

After looking into this, there's more than just 0x00-0x1F that this "problem" exists for. That section of NIP-01 is essentially trying to restate the ECMAScript spec's QuoteJSONString (how JSON.stringify handles strings), to try to ensure determinism. There are two more ranges that QuoteJSONString uses \uXXXX escapes for (unpaired leading and trailing surrogates), but those don't matter much since they only exist to cope with how JavaScript strings don't need to be valid in any encoding.
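A quick way to see that behaviour, assuming an engine with well-formed `JSON.stringify` (ES2019 or later, e.g. Node 12+):

```javascript
const lone = String.fromCharCode(0xd800); // unpaired leading surrogate
const paired = "\u{1F600}";               // a proper surrogate pair (one emoji)
const del = String.fromCharCode(0x7f);    // DEL, just outside the 0x00-0x1F range

console.log(JSON.stringify(lone));          // "\ud800"
console.log(JSON.stringify(paired).length); // 4: two quotes plus the pair, kept verbatim
console.log(JSON.stringify(del).length);    // 3: DEL is not escaped
```

A lone surrogate cannot be encoded as UTF-8 at all, which is why it only comes up for strings built inside JavaScript rather than for text decoded from the wire.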

I think that, to prevent headaches for anyone who decides to implement their own JSON (de)serializer, NIP-01 could:

  • Specify these restrictions on all arbitrary strings rather than just event.content (like those inside event.tags)
  • Either:
    1. Require that the decoded strings are valid UTF-8 and disallow control codes
    2. Refer to ECMAScript's QuoteJSONString for deterministic string serialization.

As much as I would like the ability to send raw control codes, given that terminals are very much not "a distant past", I do think option 1 is the better choice: it guarantees that string values are valid UTF-8, which makes compliant parsing easy for both JSON.parse users (no encoding required) and serde_json users (strings must be valid UTF-8 since it uses std::string::String).
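A minimal sketch of what an option 1 check might look like on the JavaScript side. `isAcceptableString` and `ALLOWED_CONTROLS` are hypothetical names, and the decision to keep allowing the five control codes that NIP-01 already lists short escapes for is an assumption on my part, not something the draft states.

```javascript
// Assumption: the control codes with short escapes in NIP-01
// (\b, \t, \n, \f, \r) stay allowed; every other code below 0x20 is rejected.
const ALLOWED_CONTROLS = new Set([0x08, 0x09, 0x0a, 0x0c, 0x0d]);

function isAcceptableString(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit < 0x20 && !ALLOWED_CONTROLS.has(unit)) {
      return false; // control code with no short escape
    }
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // A leading surrogate must be followed by a trailing one.
      const next = s.charCodeAt(i + 1); // NaN when past the end of the string
      if (!(next >= 0xdc00 && next <= 0xdfff)) return false;
      i++; // skip the trailing half of the pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      return false; // trailing surrogate without a leading one
    }
  }
  return true; // representable as valid UTF-8, no disallowed control codes
}

console.log(isAcceptableString("hello\nworld"));            // true
console.log(isAcceptableString("\u001b[31mred"));           // false (ESC)
console.log(isAcceptableString(String.fromCharCode(0xd800))); // false (lone surrogate)
```

Applying the same check to every string in `event.tags` as well as `event.content` would cover the first bullet above.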

@fiatjaf
Member

fiatjaf commented Aug 5, 2024

Very good points. I agree.
