Skip to content

Conversation

f3ath
Copy link
Collaborator

@f3ath f3ath commented Jan 26, 2025

Generated normalized paths as suggested in #100 (comment)

@f3ath f3ath requested a review from gregsdennis January 26, 2025 00:27
@f3ath f3ath mentioned this pull request Jan 26, 2025
@f3ath
Copy link
Collaborator Author

f3ath commented Jan 26, 2025

Updated after the review #100 (comment)

Copy link
Collaborator

@gregsdennis gregsdennis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as on the other PR: my implementation is having trouble with escaping chars in name selectors. Otherwise passes.

"A"
],
"result_paths": [
"$['\\\\/']"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the path be "$['/']" here? I think solidus (/, 0x2F) is "normal-unescaped".

Same for "single quotes, escaped solidus".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this seems to be a bug in my implementation. The document does not have backslashes, so why would they appear in the normalized path.

Thanks for catching all these things.

Copy link
Collaborator Author

@f3ath f3ath Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought though, according to 2.3.1.1, the / is escapable

   name-selector       = string-literal

   string-literal      = %x22 *double-quoted %x22 /     ; "string"
                         %x27 *single-quoted %x27       ; 'string'

   double-quoted       = unescaped /
                         %x27      /                    ; '
                         ESC %x22  /                    ; \"
                         ESC escapable

   single-quoted       = unescaped /
                         %x22      /                    ; "
                         ESC %x27  /                    ; \'
                         ESC escapable

   ESC                 = %x5C                           ; \ backslash

   unescaped           = %x20-21 /                      ; see RFC 8259
                            ; omit 0x22 "
                         %x23-26 /
                            ; omit 0x27 '
                         %x28-5B /
                            ; omit 0x5C \
                         %x5D-D7FF /
                            ; skip surrogate code points
                         %xE000-10FFFF

   escapable           = %x62 / ; b BS backspace U+0008
                         %x66 / ; f FF form feed U+000C
                         %x6E / ; n LF line feed U+000A
                         %x72 / ; r CR carriage return U+000D
                         %x74 / ; t HT horizontal tab U+0009
                         "/"  / ; / slash (solidus) U+002F
                         "\"  / ; \ backslash (reverse solidus) U+005C
                         (%x75 hexchar) ;  uXXXX U+XXXX

I'll need to take a deeper look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought though, according to 2.3.1.1, the / is escapable

True, but 2.7 says

Certain characters are escaped in Normalized Paths in one and only one way; all other characters are unescaped.

And the grammar puts / in the unescaped category.

normal-single-quoted = normal-unescaped /
                       ESC normal-escapable
normal-unescaped     =    ; omit %x0-1F control codes
                       %x20-26 /
                          ; omit 0x27 '
                       %x28-5B /
                          ; omit 0x5C \
                       %x5D-D7FF /
                          ; skip surrogate code points
                       %xE000-10FFFF

normal-escapable     = %x62 / ; b BS backspace U+0008
                       %x66 / ; f FF form feed U+000C
                       %x6E / ; n LF line feed U+000A
                       %x72 / ; r CR carriage return U+000D
                       %x74 / ; t HT horizontal tab U+0009
                       "'" /  ; ' apostrophe U+0027
                       "\" /  ; \ backslash (reverse solidus) U+005C
                       (%x75 normal-hexchar)
                                       ; certain values u00xx U+00XX

Copy link
Collaborator Author

@f3ath f3ath Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this mean that $['/'] is a correct normalized path (as per 2.7) but not a valid JSONPath query (as per 2.3.1.1)? @glyn @gregsdennis

But 2.7 also says that

a Normalized Path is a JSONPath query

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it's worth clarifying in the RFC repo: ietf-wg-jsonpath/draft-ietf-jsonpath-base#526

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so this is indeed a bug in my implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this mean that $['/'] is a correct normalized path (as per 2.7) but not a valid JSONPath query (as per 2.3.1.1)? @glyn @gregsdennis

(I think this question has been answered by ietf-wg-jsonpath/draft-ietf-jsonpath-base#526.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I incorrectly interpreted "escapable" as "must be escaped".

@jg-rp
Copy link
Contributor

jg-rp commented Jan 27, 2025

Comparing cts.json from this PR to one built from #100, #100 adds three test cases:

  • basic, name shorthand, object data, nested
  • name selector, name, double quotes, nested
  • name selector, name, double quotes, contains single quote

And differs on the expected normalized path for:

  • name selector, double quotes, escaped solidus
  • name selector, single quotes, escaped solidus

Otherwise every test case matches perfectly.

@jg-rp
Copy link
Contributor

jg-rp commented Jan 30, 2025

I have rerun my cts.json comparison script. This PR now matches #100, with the exception of the three additional test cases mentioned above and the updates to cts.schema.json.

@gregsdennis
Copy link
Collaborator

@jg-rp @f3ath which PR are we going for, this or #100?

@glyn
Copy link
Contributor

glyn commented Feb 3, 2025 via email

@jg-rp
Copy link
Contributor

jg-rp commented Feb 3, 2025

@jg-rp @f3ath which PR are we going for, this or #100?

Either works for me. 👍

@glyn
Copy link
Contributor

glyn commented Feb 3, 2025

This PR doesn't have the correct schema changes. PR #100 has the correct schema changes, but is missing some JSON and so the schema validation check fails.

@jg-rp
Copy link
Contributor

jg-rp commented Feb 3, 2025

This PR doesn't have the correct schema changes. PR #100 has the correct schema changes, but is missing some JSON and so the schema validation check fails.

Indeed. Excluding an updated cts.json from #100 is deliberate, as it gets built by CI. I’d be happy to include it if you prefer.

I get the impression that #101 is intended to aid comparison and is not to be merged, at least not in its current form.

@glyn
Copy link
Contributor

glyn commented Feb 3, 2025

This PR doesn't have the correct schema changes. PR #100 has the correct schema changes, but is missing some JSON and so the schema validation check fails.

Indeed. Excluding an updated cts.json from #100 is deliberate, as it gets built by CI. I’d be happy to include it if you prefer.

Yes, please. It should get overwritten with the same content by CI, but at least the schema validation should pass. (Aside: strange that it validates cts.json before overwriting it rather than after.)

I get the impression that #101 is intended to aid comparison and is not to be merged, at least not in its current form.

Ok, let's close 101 to be crystal clear what's going on.

@glyn glyn closed this Feb 3, 2025
@f3ath
Copy link
Collaborator Author

f3ath commented Feb 3, 2025

I get the impression that #101 is intended to aid comparison and is not to be merged, at least not in its current form.

Correct. This PR was to help cross check the paths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants