Support for ALL Emoji #4142

Open
JonLev opened this issue Jan 2, 2020 · 16 comments

Comments

@JonLev

JonLev commented Jan 2, 2020

We face the same issue as #3404.
When editing emoji in Ace, the cursor appears in the wrong location (too far to the right), which makes editing very hard. The issue occurs on a per-line basis.

Here is an example using the latest version:
https://jsfiddle.net/9va6be3d/2/

If you move the cursor past the emoji, the gutter does not work correctly.

@farshiana

farshiana commented Jan 21, 2020

Hi, I'm very fond of Ace editor but this is really blocking for us :/ Is there some kind of workaround?

@nightwing
Member

Hi, currently Ace supports only fixed-width characters. I'll try to fix this in the next release.

@farshiana

Thanks. I am looking forward to it!

@JeSuisUnCaillou

JeSuisUnCaillou commented Mar 11, 2020

I work at a company where we use emojis a lot, in an online collaborative YAML editor I built on top of Ace. The cursor being shown in the wrong location after some emojis is causing us a lot of daily headaches 😵

I love ace, it's the perfect solution for us, and this bug is the only thing holding us back. Looking forward to this issue being solved ✊

@JeSuisUnCaillou

JeSuisUnCaillou commented May 13, 2020

@nightwing any news about this issue? 🙏

Can you maybe point me to the part of the code where I could work on a fix myself?

@JeSuisUnCaillou

JeSuisUnCaillou commented Jun 5, 2020

It looks like the problem is related to this issue:

UTF-16 surrogate pairs largely unsupported #1153

@JeSuisUnCaillou

JeSuisUnCaillou commented Jun 5, 2020

OK, I have narrowed it down to these two cases:

  • Emojis made of two UTF-16 code units (a surrogate pair) are handled correctly: 🤦‍♀😃📢🔔
  • Emojis made of a single UTF-16 code unit are not handled correctly: ⌚❌➰⏳

Now, there was a pull request, #2244, merged this January to manage emojis.

In the code added by this PR, lib/ace/selection.js has a condition that offsets the cursor when it encounters a surrogate pair. That condition is never triggered by emojis made of a single UTF-16 code unit, because they contain no surrogates.

Since those emojis are still rendered wider than a regular character, the cursor ends up in the wrong spot.
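To make the two cases concrete, here is a minimal sketch of a surrogate-pair check. It is not the code from lib/ace/selection.js; `isHighSurrogate` and `stepRight` are hypothetical helper names used only for illustration.

```javascript
// Hypothetical helpers for illustration; NOT the code from lib/ace/selection.js.

// True when the UTF-16 code unit at `index` is the first half of a surrogate pair.
function isHighSurrogate(str, index) {
  const unit = str.charCodeAt(index);
  return unit >= 0xd800 && unit <= 0xdbff;
}

// Number of code units a cursor would skip when moving right from `column`.
function stepRight(line, column) {
  return isHighSurrogate(line, column) ? 2 : 1;
}

const smiley = "\u{1F603}"; // 😃, outside the BMP: stored as a surrogate pair
const watch = "\u231A";     // ⌚, inside the BMP: stored as a single code unit

console.log(smiley.length, stepRight(smiley, 0)); // 2 2
console.log(watch.length, stepRight(watch, 0));   // 1 1
```

Both glyphs are usually drawn wider than one monospace cell, but only the first one trips a surrogate check like this, which matches the behaviour described above.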

EDIT:

  • I also encountered some emojis composed of 2 emojis (1 or 2 code units each) with a zero-width joiner in between: 👨‍🎨
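For anyone who wants to inspect such a sequence, here is a small sketch that breaks the example above into its code points. `describeSequence` is a hypothetical helper, and it assumes the emoji is the usual man + ZERO WIDTH JOINER + artist palette sequence.

```javascript
const ZWJ = "\u200D"; // ZERO WIDTH JOINER

// Hypothetical helper: report how a string breaks down at the code unit
// and code point level, and how many parts the ZWJ glues together.
function describeSequence(str) {
  const codePoints = Array.from(str); // iterates by code point, not by code unit
  return {
    codeUnits: str.length,
    codePoints: codePoints.length,
    joinedParts: str.split(ZWJ).length,
    hex: codePoints.map((cp) => "U+" + cp.codePointAt(0).toString(16).toUpperCase()),
  };
}

const artist = "\u{1F468}\u200D\u{1F3A8}"; // 👨‍🎨 = man + ZWJ + artist palette
console.log(describeSequence(artist));
// codeUnits: 5, codePoints: 3, joinedParts: 2,
// hex: [ 'U+1F468', 'U+200D', 'U+1F3A8' ]
```

So a single visible glyph can span five UTF-16 code units, which a per-code-unit or per-surrogate-pair cursor step cannot treat as one character.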

@JeSuisUnCaillou

JeSuisUnCaillou commented Jun 5, 2020

I have put together this (dirty) workaround for myself, if anyone is interested: I reduce the size of single-char emojis to match the width of one character. https://github.com/JeSuisUnCaillou/ace/pull/2/files

And I've built it here: https://github.com/JeSuisUnCaillou/ace-builds/tree/fix/reduce_monochar_emoji_size

@JonLev
Author

JonLev commented Jun 10, 2020

@nightwing would @JeSuisUnCaillou's fix be usable?

@JeSuisUnCaillou

JeSuisUnCaillou commented Jun 10, 2020

@JonLev The real solution should be to consider all emojis as a single character of width 2, whether they are one char or two chars with a surrogate. I also encountered some emojis composed of 2 emojis (1 or 2 char each) with a zero-width joiner in between, like this one, which should also be considered as one character.
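A rough sketch of that idea, counting each emoji cluster as two columns regardless of how many code units it uses. This is only an illustration: `displayColumns` is a hypothetical helper, not an Ace API, and it assumes a runtime that provides `Intl.Segmenter` and the `\p{Extended_Pictographic}` property escape.

```javascript
// Sketch only: `displayColumns` is a hypothetical helper, not an Ace API.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const pictographic = /\p{Extended_Pictographic}/u;

// Count columns per user-perceived character:
// 2 for clusters containing a pictographic code point, 1 for everything else.
function displayColumns(text) {
  let columns = 0;
  for (const { segment } of graphemeSegmenter.segment(text)) {
    columns += pictographic.test(segment) ? 2 : 1;
  }
  return columns;
}

console.log(displayColumns("a\u231Ab"));                   // 4: a + ⌚ + b
console.log(displayColumns("\u{1F603}"));                  // 2: surrogate pair
console.log(displayColumns("\u{1F468}\u200D\u{1F3A8}"));   // 2: ZWJ sequence
```

The "width 2" rule is an approximation. Some pictographic characters (for example ©) are normally drawn narrow, and actual glyph widths depend on the font, so a real fix would probably still need to measure rendered text.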

My fix just tries to avoid the cursor offset (which makes the editor very hard to use), but I don't think it's a reasonable solution to the emoji problem.

I didn't dive deep enough to understand all the code needed to implement the complete solution.

@JeSuisUnCaillou

JeSuisUnCaillou commented Jun 25, 2020

I'm discovering more corner cases regularly.

Today, I learned that an emoji can be followed by a character called VARIATION SELECTOR-16, which is only there to say that the previous character must be displayed as an emoji (for emojis that also have a "normal" text display, like this one: 🕵)

One day, I will make an exhaustive list of all the weird emoji cases. And maybe, with time and iterations, I'll try to implement it properly, who knows?
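A short sketch of what that looks like at the string level. VARIATION SELECTOR-16 is U+FE0F; `stripVS16` is a hypothetical helper used here only to compare the two forms, not a suggested fix.

```javascript
const VS16 = "\uFE0F"; // VARIATION SELECTOR-16: requests emoji presentation

// Hypothetical helper: drop variation selectors so only base code points remain.
function stripVS16(str) {
  return str.split(VS16).join("");
}

const detectiveText = "\u{1F575}";        // 🕵 on its own
const detectiveEmoji = "\u{1F575}\uFE0F"; // 🕵 followed by VS16

console.log(detectiveText.length);  // 2
console.log(detectiveEmoji.length); // 3
console.log(stripVS16(detectiveEmoji) === detectiveText); // true
```

The selector adds one invisible code unit after the emoji, so any cursor logic that steps by code unit or by surrogate pair gets one extra stop there.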

@kkucharc

Hi! Any news on this issue?

@tgross35

tgross35 commented Jun 20, 2022

Adding in - this isn't just emojis, but also special characters such as .

The issue is likely relevant to all UTF-8 characters longer than 1 byte (they can be 1-4 bytes).

Context for anyone looking (@JeSuisUnCaillou): you want to look into splitting the strings by graphemes rather than by literal 8-bit chars. This gives characters as we know and see them, rather than a "char" as a computer sees it.

Rules for this follow unicode segmentation, found here https://unicode.org/reports/tr29/. There should be libraries to do this in JS.

I'm not a frontend dev, but for example, see the rust library for it https://docs.rs/unicode-segmentation/latest/unicode_segmentation/. Splitting by graphemes gives you what you expect - ["a̐", "é", "ö̲", "\r\n"] but running the same thing with a normal char split gives ['a', '\u{310}', 'e', '\u{301}', 'o', '\u{308}', '\u{332}', '\r', '\n']
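In JavaScript, one option that I believe works without an extra library is the built-in `Intl.Segmenter`, available in recent browsers and Node.js versions. A minimal sketch reproducing the comparison above, with the combining marks written as explicit escapes:

```javascript
// Same sample as the Rust example: a̐ é ö̲ followed by CRLF, in decomposed form.
const sample = "a\u0310e\u0301o\u0308\u0332\r\n";

// Split by extended grapheme clusters (the UAX #29 rules).
const tr29Segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const graphemes = Array.from(tr29Segmenter.segment(sample), (s) => s.segment);

// Split by code points, the closest JS equivalent of a plain char split.
const codePoints = Array.from(sample);

console.log(graphemes.length);  // 4
console.log(codePoints.length); // 9
```

Note that JavaScript strings are sequences of UTF-16 code units rather than UTF-8 bytes, so `sample.length` counts code units. The grapheme-level view is the same either way.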

@andrewnester
Contributor

We have a tracking issue for this problem here: #460

@RomanShemelin

Hi! Any news on this issue?

@petersolopov

We're facing a similar problem and are looking forward to a resolution. Thanks a lot


10 participants