clarify that utf-8 is just a possible encoding of strings #684

andimarek · 2020-02-06T18:59:01Z

This change clarifies that String scalars are not necessarily UTF-8 strings. They are sequences of Unicode code points, which may be encoded as UTF-8 but don't have to be.
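To make the distinction concrete, here is a small Python sketch. Python is used only for illustration; nothing in this PR prescribes it. A Python `str` is indexed by code point, so the same seven-code-point string can be encoded as UTF-8 or as UTF-16. The two encodings produce different byte sequences, and both decode back to the identical string.

```python
# A string is a sequence of Unicode code points; UTF-8 and UTF-16 are
# two different byte encodings of that same sequence.
s = "h\u00e9llo \U0001F600"  # "héllo" plus a space and an emoji

code_points = [ord(ch) for ch in s]   # Python str is indexed by code point
utf8_bytes = s.encode("utf-8")        # 1 to 4 bytes per code point
utf16_bytes = s.encode("utf-16-le")   # 2 or 4 bytes per code point, no BOM

# 7 code points, 11 UTF-8 bytes, 16 UTF-16 bytes
print(len(code_points), len(utf8_bytes), len(utf16_bytes))

# Different bytes, same string once decoded.
assert utf8_bytes != utf16_bytes
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le") == s
```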

eapache · 2020-02-06T19:28:16Z

spec/Section 3 -- Type System.md

 human-readable text. All response formats must support string representations,
 and that representation must be used here.

 **Result Coercion**

-Fields returning the type {String} expect to encounter UTF-8 string internal values.


Shouldn't this part continue to specify UTF-8? As I read it, it's about the serialization of strings in responses, where specifying an encoding is actually appropriate.

andimarek · 2020-02-06

This section is about result coercion, not serialization. Of course, Strings can be serialized as UTF-8 (and often are, via UTF-8-encoded JSON), but they don't have to be.
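Here is a minimal sketch of what String result coercion could look like, assuming a hypothetical `coerce_string_result` helper. Its output is a sequence of code points (a Python `str`), and choosing a byte encoding is left to a later serialization step. The boolean and integer conversions are illustrative choices, not rules taken from this PR.

```python
from typing import Any


def coerce_string_result(value: Any) -> str:
    """Hypothetical result coercion for the String scalar.

    Returns a str, i.e. a sequence of Unicode code points. No byte
    encoding (UTF-8 or otherwise) is chosen here.
    """
    if isinstance(value, str):
        return value
    # Check bool before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    raise TypeError(f"String cannot represent value: {value!r}")


print(coerce_string_result("h\u00e9llo"))  # héllo
print(coerce_string_result(True))          # true
print(coerce_string_result(1))             # 1
```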

IvanGoncharov · 2020-02-07T03:04:38Z

@andimarek Maybe I'm missing something, but wasn't the discussion about extending the range of possible code points rather than removing UTF-8 from the spec?
Internally we can use whatever encoding we want; it's just that when we send strings to a GraphQL server or receive them from one, they should be UTF-8.
How would clients figure out which encoding to use for strings?
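One common answer to this question applies when the response travels over HTTP. In that case the transport can declare the encoding in the `charset` parameter of the `Content-Type` header. The hypothetical `decode_response_body` helper below reads that parameter with Python's standard `email.message.Message`. If no charset is given, it falls back to UTF-8, which RFC 8259 mandates for JSON exchanged outside a closed ecosystem. This is a sketch under those assumptions, not behavior defined by the GraphQL spec.

```python
from email.message import Message


def decode_response_body(body: bytes, content_type: str) -> str:
    """Hypothetical helper: decode bytes using the Content-Type charset."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset parameter,
    # or None when the header has no charset.
    charset = msg.get_content_charset() or "utf-8"
    return body.decode(charset)


text = '{"data": {"greeting": "h\u00e9llo"}}'
print(decode_response_body(text.encode("utf-16"), "application/json; charset=UTF-16"))
print(decode_response_body(text.encode("utf-8"), "application/json"))
```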

If you need to send strings in some other encoding, you can always create a custom scalar for that, and with `@specifiedBy` your clients can figure out which encoding to use.

andimarek · 2020-02-07T04:00:18Z

@IvanGoncharov This is just a cleanup/correction. As discussed today, the current section mentioning UTF-8 is simply wrong: UTF-8 is only one of the possible Unicode encodings. Strings are sequences of Unicode code points, not UTF-8 strings. In fact, the reference implementation itself uses UTF-16 to represent Strings (because JS uses UTF-16 internally to encode Unicode).
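The difference between code points and UTF-16 code units can be illustrated in Python, whose `str` counts code points. The hypothetical `utf16_code_unit_length` helper approximates what JavaScript's `String.prototype.length` reports, which is the number of UTF-16 code units. A character outside the Basic Multilingual Plane is one code point but two code units (a surrogate pair).

```python
def utf16_code_unit_length(s: str) -> int:
    """Hypothetical helper: count UTF-16 code units (2 bytes each),
    mirroring what JavaScript's String.prototype.length reports."""
    return len(s.encode("utf-16-le")) // 2


emoji = "\U0001F600"  # GRINNING FACE, outside the Basic Multilingual Plane

print(len(emoji))                        # 1 code point
print(utf16_code_unit_length(emoji))     # 2 UTF-16 code units
print(len(emoji.encode("utf-8")))        # 4 UTF-8 bytes
print(emoji.encode("utf-16-be").hex())   # d83dde00: the surrogate pair D83D DE00
```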

Also, sending data over the wire (serialization) is different from scalar coercion. The most commonly used serialization format is JSON, which is normally encoded in UTF-8. We have a separate section on how to serialize to JSON, but this is in no way required: UTF-8-encoded JSON serialization is just one option.
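The following sketch keeps serialization separate from coercion and assumes a hypothetical `serialize_response` helper. It first turns the already-coerced result (code points) into JSON text and only then encodes that text to bytes. UTF-8 is the default here. The same result can also be encoded as UTF-16 and still decodes to an identical value.

```python
import json


def serialize_response(result: dict, encoding: str = "utf-8") -> bytes:
    """Hypothetical serializer: JSON text first, byte encoding second."""
    # ensure_ascii=False keeps non-ASCII code points as-is in the JSON text
    # instead of escaping them, so the chosen encoding is actually exercised.
    text = json.dumps(result, ensure_ascii=False)
    return text.encode(encoding)


result = {"data": {"greeting": "h\u00e9llo \U0001F600"}}

utf8_body = serialize_response(result)              # UTF-8, the common choice
utf16_body = serialize_response(result, "utf-16")   # equally valid bytes

# Different bytes on the wire, same value after decoding.
assert utf8_body != utf16_body
assert json.loads(utf8_body.decode("utf-8")) == json.loads(utf16_body.decode("utf-16")) == result
```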

leebyron · 2021-04-12T18:22:07Z

@andimarek I made some edits, let me know if these look good to you

leebyron · 2021-04-16T20:51:48Z

I'm going to merge this now since this is the other half of the change made in #854

andimarek added 2 commits February 7, 2020 05:53

clarify that utf-8 is just a possible encoding of strings

7870341

fix Unicode spelling

fea6940

eapache reviewed Feb 6, 2020


IvanGoncharov added the 🤷‍♀️ Ambiguity label (An issue/PR which identifies or fixes spec ambiguity) May 30, 2020

IvanGoncharov requested a review from leebyron May 30, 2020 16:25

IvanGoncharov approved these changes May 30, 2020


Base automatically changed from master to main February 3, 2021 04:50

leebyron added this to the May2021 milestone Apr 6, 2021

leebyron added 3 commits April 11, 2021 22:05

Merge branch 'main' into cleanup-string-scalar

919a9b3

Update Section 3 -- Type System.md

6b5a63f

Update Section 3 -- Type System.md

78ffd10

leebyron approved these changes Apr 12, 2021


leebyron added the ✏️ Editorial label (PR is non-normative or does not influence implementation) Apr 16, 2021

leebyron merged commit 61c50f2 into graphql:main Apr 16, 2021

jangko mentioned this pull request Apr 27, 2021

unicode support improvements status-im/nim-graphql#50

Closed
