Support binary strings, preserve UTF-8 and UTF-16 errors #2314

Open
wants to merge 8 commits into base: master

Commits on Jul 21, 2023

  1. 067e682

Commits on Jul 22, 2023

  1. Binary strings: preserve UTF-8 and UTF-16 errors

    The internal string representation is changed from UTF-8 with replacement
    characters to a modified form of "WTF-8" that is able to distinctly encode
    UTF-8 errors and UTF-16 errors.
    
    This handles UTF-8 errors in raw string inputs and handles UTF-8 and UTF-16
    errors in JSON input. UTF-16 errors (using "\uXXXX") and UTF-8 errors (using
    the original raw bytes) are maintained when emitting JSON. When emitting raw
    strings, UTF-8 errors are maintained and UTF-16 errors are converted into
    replacement characters.
    Maxdamantus committed Jul 22, 2023
    6aff473
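The difference between replacement characters and error-preserving strings can be sketched with Python's standard codecs. This is only an analogy, not the encoding used by this change: Python's `surrogateescape` handler reuses the surrogate range for invalid bytes, so it cannot keep UTF-8 errors and UTF-16 errors apart in one string the way the commit message describes. The sample bytes and variable names are illustrative assumptions.

```python
import json

raw = b"caf\xc3\xa9 \xff\xfe end"  # valid "café" followed by two invalid bytes

# Replacement characters (the earlier approach): the invalid bytes are lost.
lossy = raw.decode("utf-8", errors="replace")
lossy_roundtrip = lossy.encode("utf-8")

# Error-preserving decode: each invalid byte is kept as a distinct marker,
# so the original bytes can be emitted again unchanged.
preserved = raw.decode("utf-8", errors="surrogateescape")
preserved_roundtrip = preserved.encode("utf-8", errors="surrogateescape")

# A UTF-16 error (lone surrogate) written as "\uXXXX" in JSON input.
lone = json.loads('"\\ud800"')
json_out = json.dumps(lone)  # emitted again as a \uXXXX escape

# Raw (non-JSON) output cannot carry a lone surrogate, so it becomes U+FFFD.
raw_out = "".join("\ufffd" if 0xD800 <= ord(c) <= 0xDFFF else c for c in lone)
```

Here `lossy_roundtrip` differs from `raw`, while `preserved_roundtrip` is byte-for-byte identical to it.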
  2. 79f0479
  3. Correct UTF-8 and UTF-16 errors during concatenation

    UTF-8 errors and UTF-16 errors that were previously encoded into the ends of
    strings will now potentially be used to form correct code points.
    
    This is mostly a matter of making string equality behave as expected:
    without this normalisation, it is possible to produce `jv` strings that
    convert to UTF-8 or UTF-16 the same way but are not equal, because
    well-formed code units may or may not be encoded as errors.
    Maxdamantus committed Jul 22, 2023
    7fde46e
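The equality problem described above can be reproduced with Python's `surrogateescape` handler as a stand-in for the error encoding. This is a sketch under that assumption, not the `jv` implementation, and `normalise` is a hypothetical helper. Two halves of a UTF-8 sequence, each stored as an error, concatenate into a string that serialises like "é" but does not compare equal to it until it is re-decoded.

```python
def normalise(s: str) -> str:
    """Re-decode so that adjacent error bytes can form well-formed code points."""
    return s.encode("utf-8", "surrogateescape").decode("utf-8", "surrogateescape")

# Each half of the two-byte sequence for "é" is an error on its own.
left = b"\xc3".decode("utf-8", "surrogateescape")
right = b"\xa9".decode("utf-8", "surrogateescape")

joined = left + right  # still two separate error markers
same_bytes = joined.encode("utf-8", "surrogateescape") == "é".encode("utf-8")
equal_before = joined == "é"
equal_after = normalise(joined) == "é"
```

Before normalisation `same_bytes` is true while `equal_before` is false, which is the inconsistency the commit removes. After normalisation the comparison succeeds.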
  4. 2e1b5d2
  5. Preserve UTF-8 and UTF-16 errors in explode

    Errors are emitted as negative code points instead of being transformed into
    replacement characters. `implode` is also updated accordingly so the original
    string can be reconstructed without data loss.
    Maxdamantus committed Jul 22, 2023
    f68f25b
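A lossless `explode`/`implode` pair along these lines might look like the sketch below. The mapping is an assumption made for illustration: an invalid byte `b` is reported as `-b`. The commit message only states that errors become negative code points, and the functions here model the idea in Python, not jq's builtins.

```python
def explode(raw: bytes) -> list:
    """Return code points; invalid bytes appear as negative numbers (assumed mapping)."""
    out = []
    for ch in raw.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(-(cp - 0xDC00))  # escaped byte 0x80..0xFF becomes -byte
        else:
            out.append(cp)
    return out


def implode(cps: list) -> bytes:
    """Inverse of explode: negative entries are written back as raw bytes."""
    out = bytearray()
    for cp in cps:
        if cp < 0:
            out.append(-cp)
        else:
            out += chr(cp).encode("utf-8")
    return bytes(out)


raw = b"a\xffb"
points = explode(raw)    # [97, -255, 98]
restored = implode(points)
```

Because the error entry keeps the byte value instead of collapsing to U+FFFD (65533), `restored` equals `raw`.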
  6. Remove UTF-8 backtracking workaround

    This is no longer needed as strings are capable of storing partial UTF-8
    sequences.
    Maxdamantus committed Jul 22, 2023
    5c2fe32
  7. 911d01a