NIP-01 suggests encoding "content" in a non-JSON compatible way #1403

Vap0r1ze · 2024-07-30T19:28:45Z

The base protocol (NIP-01) draft currently says this:

The following characters in the content field must be escaped as shown, and all other characters must be included verbatim:

A line break (0x0A), use \n

A double quote (0x22), use \"

A backslash (0x5C), use \\

A carriage return (0x0D), use \r

A tab character (0x09), use \t

A backspace, (0x08), use \b

A form feed, (0x0C), use \f

It says "all other characters must be included verbatim", but the JSON standard (see Section 9 "String") requires that "the control characters U+0000 to U+001F" are escaped using \uXXXX unicode escapes.
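To make the mismatch concrete, here is a minimal sketch that applies only the seven escapes listed above and compares the result with `JSON.stringify`. The names `NIP01_ESCAPES` and `nip01Quote` are hypothetical helpers written for this illustration; they are not taken from NIP-01 or from any Nostr library.

```javascript
// Hypothetical table: the seven escapes listed in the NIP-01 draft, nothing else.
const NIP01_ESCAPES = new Map([
  ["\n", "\\n"],
  ['"', '\\"'],
  ["\\", "\\\\"],
  ["\r", "\\r"],
  ["\t", "\\t"],
  ["\b", "\\b"],
  ["\f", "\\f"],
]);

// Hypothetical helper: quote a string using only the escapes above and
// copy every other character verbatim, as the draft wording requires.
function nip01Quote(s) {
  let out = '"';
  for (const ch of s) {
    out += NIP01_ESCAPES.get(ch) ?? ch;
  }
  return out + '"';
}

const plain = 'say "hi"\n';
const withControl = "a\u0001b";

// Both serializers agree when only the seven listed characters need escaping.
console.log(nip01Quote(plain) === JSON.stringify(plain)); // true

// They diverge on any other control character: JSON.stringify emits \u0001,
// while the verbatim rule leaves the raw byte 0x01 inside the quotes.
console.log(nip01Quote(withControl) === JSON.stringify(withControl)); // false
```

Because the event ID is a hash of the serialized text, the two outputs in the second case would lead to different IDs for the same content.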

An example of a "content" value that is valid in NIP-01 but invalid in JSON:

JSON.parse(`"\u0000"`)
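The snippet above relies on the template literal turning `\u0000` into a raw NUL before `JSON.parse` sees it. A slightly longer sketch makes that explicit and shows the escaped form for contrast. `tryParse` is a hypothetical wrapper added here only so the failure can be inspected instead of thrown.

```javascript
// Hypothetical wrapper: report the parse outcome instead of throwing.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

// Three characters: a quote, a raw NUL (U+0000), a quote.
// Under the "verbatim" wording this is how the content would be serialized.
const rawNul = '"' + String.fromCharCode(0) + '"';

// Eight characters: the same string with the NUL written as a \u0000 escape.
const escapedNul = '"\\u0000"';

console.log(tryParse(rawNul));     // { ok: false, error: 'SyntaxError' }
console.log(tryParse(escapedNul)); // ok: true, value is a single NUL character
```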

At this point it's probably not feasible to change the draft to use valid JSON, but the draft should probably mention that you must deviate from the JSON standard to produce NIP-01 compliant event IDs.

mikedilger · 2024-08-04T21:01:10Z

As I recall the intent was that those characters are invalid nostr characters, so we don't need encodings for them.

fiatjaf · 2024-08-04T21:06:19Z

Unicode escape codes are an aberration from a distant past that should be forgotten.

As long as you're not doing anything super weird this problem won't happen and most default JSON encoders will do the right thing.

Vap0r1ze · 2024-08-05T07:03:41Z

After looking into this, the "problem" exists for more than just 0x00-0x1F. That section of NIP-01 is essentially trying to restate the ECMAScript spec's QuoteJSONString (how JSON.stringify handles strings) to try to ensure determinism. There are two more ranges that QuoteJSONString uses \uXXXX escapes for, but those don't matter much, since they only exist to cope with the fact that JavaScript strings don't need to be valid in any encoding.
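As a rough illustration of that behavior, the sketch below runs `JSON.stringify` on a control character, on unpaired surrogate code units, and on a properly paired surrogate. It assumes an engine that implements well-formed `JSON.stringify` (ES2019 or later); older engines emit lone surrogates unescaped.

```javascript
// Control characters without a short escape become lowercase \uXXXX escapes.
const control = JSON.stringify("\u001f");

// Unpaired surrogate code units are not valid in any Unicode encoding,
// so QuoteJSONString escapes them rather than emitting them raw.
const loneHigh = JSON.stringify("\uD800");
const loneLow = JSON.stringify("\uDC00");

// A correctly paired surrogate (here U+1F600) is emitted verbatim.
const paired = JSON.stringify("\uD83D\uDE00");

console.log(control);  // "\u001f"
console.log(loneHigh); // "\ud800"
console.log(loneLow);  // "\udc00"
console.log(paired.length); // 4: two quotes plus the two code units of the pair
```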

I think that, to prevent headaches for anyone who decides to implement their own JSON (de)serializer, NIP-01 could:

- Specify these restrictions on all arbitrary strings (such as those inside event.tags) rather than just event.content
- Either:
  1. Require that the decoded strings are valid UTF-8 and disallow control codes
  2. Refer to ECMAScript's QuoteJSONString for deterministic string serialization.

As much as I would like the ability to send raw control codes (terminals are very much not "a distant past"), I do think that option 1 is the better choice: it ensures that string values are valid UTF-8, which makes compliant parsing easy for both JSON.parse users (no encoding required) and serde_json users (strings must be valid UTF-8, since it uses std::string::String).
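A check along the lines of option 1 could look like the sketch below. It is only one possible reading. The function name `isAcceptableNostrString` is hypothetical, and the sketch assumes that "disallow control codes" means the U+0000 to U+001F characters that have no short escape in NIP-01, so `\n`, `\r`, `\t`, `\b` and `\f` are still accepted. "Valid UTF-8" is approximated as "contains no unpaired surrogate code units", which is the condition under which a JavaScript string can be encoded as UTF-8 without loss.

```javascript
// Assumption: control codes that NIP-01 gives a short escape for stay allowed.
const ALLOWED_CONTROLS = new Set([0x08, 0x09, 0x0a, 0x0c, 0x0d]);

// Hypothetical validator for "option 1": no other control codes,
// and no unpaired surrogates (so the string encodes cleanly as UTF-8).
function isAcceptableNostrString(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit < 0x20 && !ALLOWED_CONTROLS.has(unit)) {
      return false; // control code with no NIP-01 escape
    }
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = s.charCodeAt(i + 1); // NaN at end of string
      if (!(next >= 0xdc00 && next <= 0xdfff)) return false;
      i++; // skip the low half of the valid pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      return false; // low surrogate without a preceding high surrogate
    }
  }
  return true;
}

console.log(isAcceptableNostrString("hello\nworld")); // true
console.log(isAcceptableNostrString("\u0000"));       // false
console.log(isAcceptableNostrString("\uD800"));       // false
```

For any string this check accepts, the seven escapes listed in NIP-01 and `JSON.stringify` produce the same serialization, so no deviation from standard JSON encoders is needed.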

fiatjaf · 2024-08-05T13:00:40Z

Very good points. I agree.
