UTF-8 with emojis detected as pure ascii with 100% confidence #161

Open
@piranna

Description

I think there are two bugs here:

  1. A pure ASCII string (bytes 0x00-0x7F) is also a valid UTF-8 string, so the detector should report both encodings. If not both at 100% confidence, then perhaps 99% for UTF-8, so that ASCII still takes priority.
  2. If the text contains emojis or any byte sequence outside the pure ASCII range, it is definitely NOT a pure ASCII string.
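The two points above can be illustrated with a minimal sketch in Python. This is not the library's actual code: `detect` is a hypothetical function that only distinguishes ASCII from UTF-8, and the confidence values (100 for ASCII, 99 for UTF-8 when the input is pure ASCII) are simply the ones proposed in point 1.

```python
def detect(data: bytes) -> list:
    """Return (encoding, confidence %) candidates, highest confidence first.

    Hypothetical sketch: only ASCII and UTF-8 are considered.
    """
    # Point 2: any byte >= 0x80 (e.g. part of an emoji's UTF-8 encoding)
    # rules out pure ASCII.
    is_ascii = data.isascii()

    # Strict decoding raises UnicodeDecodeError on malformed sequences.
    try:
        data.decode("utf-8")
        is_utf8 = True
    except UnicodeDecodeError:
        is_utf8 = False

    candidates = []
    if is_ascii:
        candidates.append(("ascii", 100))
    if is_utf8:
        # Point 1: pure ASCII is also valid UTF-8, so report it too,
        # ranked just below ASCII in that case.
        candidates.append(("utf-8", 99 if is_ascii else 100))
    return candidates


print(detect(b"hello"))                          # [('ascii', 100), ('utf-8', 99)]
print(detect("hi \U0001F44B".encode("utf-8")))   # [('utf-8', 100)]
print(detect(b"\xff\xfe"))                       # []
```

Returning a ranked list rather than a single answer lets the caller see that ASCII input is also acceptable as UTF-8, while input containing an emoji is never reported as ASCII.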

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests