Rename symbols from ...utf8... to ...cesu8... #1268

Closed
@martijnthe

It wasn't obvious to me that the string produced by APIs like jerry_string_to_char_buffer() isn't actually UTF-8 encoded, but CESU-8 encoded. (FWIW, I understand and agree with the rationale for using CESU-8 rather than UTF-8, as per the discussion in #616.)

It also seems that in the implementation itself, functions are sometimes labeled ..._utf8_... but in reality use CESU-8 (for example, ecma_string_copy_to_utf8_buffer()).

I would like to suggest 3 things:

  1. Rename all symbols (function names, types, etc.) that "falsely advertise" that they deal with UTF-8. For people who are new to the project and don't know about the recent transition from UTF-8 to CESU-8 internally, this is very confusing.
  2. Clarify in the docstring of each API that hands back or copies out a string to the client of JerryScript that the string is CESU-8 encoded, and for APIs that take strings as input, specify that the input must be CESU-8 encoded.
  3. Over time, add APIs/wrappers for converting between CESU-8 <=> UTF-8.
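
To illustrate point 3, here is a minimal sketch of what one direction of such a wrapper could look like. The name cesu8_to_utf8 and its signature are hypothetical and not part of the JerryScript API. The sketch relies only on the encoding difference itself: CESU-8 stores a code point above U+FFFF as a surrogate pair, with each surrogate encoded as 3 bytes (6 bytes in total), whereas UTF-8 stores it as a single 4-byte sequence. It assumes well-formed input and does no validation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of the JerryScript API.
 * Converts `len` bytes of well-formed CESU-8 in `src` to UTF-8 in `dst`.
 * Returns the number of bytes written, or 0 if `dst_size` is too small
 * (an empty input also returns 0). The output is never longer than the
 * input, so dst_size >= len is always sufficient. */
size_t
cesu8_to_utf8 (const uint8_t *src, size_t len, uint8_t *dst, size_t dst_size)
{
  size_t in = 0;
  size_t out = 0;

  while (in < len)
  {
    /* Surrogate pair in CESU-8: ED A0..AF xx  ED B0..BF xx (6 bytes). */
    if (in + 6 <= len
        && src[in] == 0xED && (src[in + 1] & 0xF0) == 0xA0
        && src[in + 3] == 0xED && (src[in + 4] & 0xF0) == 0xB0)
    {
      /* Decode both 3-byte sequences back into UTF-16 code units. */
      uint32_t hi = 0xD000u | ((uint32_t) (src[in + 1] & 0x3F) << 6) | (src[in + 2] & 0x3Fu);
      uint32_t lo = 0xD000u | ((uint32_t) (src[in + 4] & 0x3F) << 6) | (src[in + 5] & 0x3Fu);

      /* Combine the surrogates into one supplementary code point. */
      uint32_t cp = 0x10000u + ((hi - 0xD800u) << 10) + (lo - 0xDC00u);

      if (out + 4 > dst_size)
      {
        return 0;
      }

      /* Emit the code point as a single 4-byte UTF-8 sequence. */
      dst[out++] = (uint8_t) (0xF0u | (cp >> 18));
      dst[out++] = (uint8_t) (0x80u | ((cp >> 12) & 0x3Fu));
      dst[out++] = (uint8_t) (0x80u | ((cp >> 6) & 0x3Fu));
      dst[out++] = (uint8_t) (0x80u | (cp & 0x3Fu));
      in += 6;
    }
    else
    {
      /* Everything else is byte-for-byte identical in both encodings. */
      if (out + 1 > dst_size)
      {
        return 0;
      }
      dst[out++] = src[in++];
    }
  }

  return out;
}
```

For example, U+1F600 is the 6 bytes ED A0 BD ED B8 80 in CESU-8 and the 4 bytes F0 9F 98 80 in UTF-8. The reverse direction would split each 4-byte UTF-8 sequence back into two 3-byte surrogate sequences.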
