-
Notifications
You must be signed in to change notification settings - Fork 418
MSC2545: Image Packs (Emoticons & Stickers) #2545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: old_master
Are you sure you want to change the base?
Conversation
|
People wanting to try out an implementation of custom emotes on the web can use their existing homeserver account through this riot-web instance. |
Fluffychat Android works too :) |
turt2live
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally seems to be in a sensible direction, though the duplicated effort for custom emoji is a bit bothersome :(
|
What about adding a mandatory level of pack indirection ?
and then use what is proposed here inside This allows bundling and updating packs independently, and overcome the 65k limit by quite a margin. |
Not quite sure what you mean, here. That, for room emotes, you can specify other state keys that just extend that one pack? While not explicitly stated in this MSC (yet?), if multiple packs have the same |
|
Ok nevermind, I misread. I thought |
|
how hard would it be to add versioning and uuids to emoji packs |
You already have versioning with state events - or do you mean something like That being said, what if multiple packs have the same uuid but different emotes? Merge them? What if you spread one pack over multiple state events to overcome the 65k chars limit and then give them all the same uuid? Should they appear as one pack? How should that work with emote_rooms? |
if emotes are added to a pack, then the version number should be changed, and yeah i meant like a 1.0 thing, if multiple packs have the same uuid and the same version number, but different emotes, then the conflict should be reported to the user, after thinking about it might make more sense to have a last modified date instead of a version number, and it just uses the newest one, because if you want to remove an emote from a pack, then it shouldn't add it back from an older version of the pack, and i don't see how any of this is affected by it being in a room or a user account |
Would a user care about the version of an emote pack? How to determine which version is newer? Grammer for version numbers would need to be defined, etc. |
|
Like, maybe the uuid and versioning stuff could be added in a separate msc building onto this - the goal of this was to keep things as simple as possible |
ok if one pack was spread over multiple events, i would have them share the uuid and also have part numbers, like "part":1 "part":2 etc for A, mainting the order of emotes, and B, to prevent them from overlapping eachother.
state events have timestamps right, so you could just use the newest state event instead of bothering with version numbers |
origin_server_ts can't be trusted and can easily be forged
What would it need part for? It shouldn't matter if they are ordered correctly or not, emotes are an unordered set |
oh you could just put a modified tag in then with like unixtime or something
the most important reason for the part value is to distinguish continuations of the pack from updates and being able to order the emotes could matter to some pack maintainers, i would want to be able to do that for example |
In a federated system there is no one true timesource. That is a mathematically unsolvable problem.
Updates would replace the previous state event. You don't need any "part" attribute for that. As for ordering, you could say they are ordered alphanumerically by the state key. |
|
MSCs proposed for Final Comment Period (FCP) should meet the requirements outlined in the checklist prior to being accepted into the spec. This checklist is a bit long, but aims to reduce the number of follow-on MSCs after a feature lands. SCT members: please check off things you check for, and raise a concern against FCP if the checklist is incomplete. If an item doesn't apply, prefer to check it rather than remove it. Unchecking items is encouraged where applicable. Checklist:
@mscbot concern checklist incomplete |
Mainly so the discussion threads do not get lost.
turt2live
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for working on this. I haven't read the MSC in detail, but the parts I did read look more than sufficient for starting FCP in my opinion.
| 1. User image pack (defined in the user's account data) | ||
| 2. Image pack rooms (defined in the `m.image_pack.rooms` user account data | ||
| object) | ||
| 3. Space image packs (defined in the hierarchy of canonical spaces for the | ||
| current room) | ||
| 4. Room image packs (defined in the currently open room's state) | ||
| 5. Referenced room image packs (defined in the `m.image_pack.rooms` room | ||
| state event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the room to take priority over the space since it is more "local" to the current context.
proposals/2545-emotes.md
Outdated
|
|
||
| ### Shortcode grammar | ||
|
|
||
| A shortcode's length MUST not exceed 100 bytes. This restriction MUST be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it need to be valid Unicode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I've done some research to try and figure this out. To clarify, there's a difference between "valid UTF-8 byte sequences" and "valid Unicode scalar values".
Not every byte sequence can be decoded as valid UTF-8. Therefore, if we allowed any sequence of bytes (as the proposal currently does), text decoders may panic upon attempting to blindly decode them. For this reason, we absolutely should restrict the grammar to only "byte sequences that can be decoded as UTF-8".
Separately, the set of "valid Unicode scalar values" is a range that has been stabilised since 2003 (RFC 3629) and is unlikely to change. We could restrict the grammar to this range to avoid causing any Unicode parsers to fail. This seems reasonable to me.
The Unicode consortium assign new codepoints every year in each Unicode update. It would be foolish to restrict to only the currently valid set of assigned codepoints, as that would break forwards-compatibility.
Given the above, the grammar rules could be:
- The value must be a well-formed UTF-8 byte sequence;
- After decoding, every code point must be a Unicode scalar (i.e. not a surrogate or noncharacter);
- Length ≤ 100 bytes;
- Must not contain
:(U+003A) or(U+0020).
What do people think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Valid UTF-8" is somewhat implied by the fact that it has to be transferred over a JSON REST API as part of an event (though it's certainly the case that the spec could do better on this: it's one of the many things discussed in matrix-org/matrix-spec#365). Anyway, I wouldn't bother calling it out for this instance: clients can and do reasonably expect that they can utf-decode all event content.
Ditto the surrogates, and (fwiw) characters above U+10FFFF.
You mention "noncharacters": are you talking about things like U+FDD0? We don't special-case those elsewhere afaik, so I wouldn't do so here.
More to the point, though, I question why we are specifying this in bytes rather than unicode codepoints. All the clients and servers I know of work in unicode codepoints at this level; to determine if a shortcode is longer than 100 bytes they would have to re-encode those strings back into utf-8 and see what the result is. (Admittedly I haven't actually looked at server impls other than Synapse, so maybe this is bogus logic, but remember that to even get as far as inspecting a shortcode you've had to deserialize the JSON into some sort of higher-level object).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we don't even need a length limit? suppose somebody makes a shortcode that is 62000 bytes long. What have they achieved, other than filling their entire image pack with a single shortcode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually -- are we limited by what's allowed in an event/message already here? Otherwise you might have a mostly correct message with an invalid short-code?
| This avoids a split-brain in the room. Servers MAY opt to locally redact events | ||
| having too many bytes in the shortcode field. | ||
|
|
||
| The `:` character MUST NOT be included in the emote shortcode. The `:` character |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also formally exclude U+0000 (NUL) characters from shortcodes, as some parsers can treat that character as "end of string".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it also exclude EOL characters in that vein? I kind of doubt anyone will actually use those since multiline short-codes would be strange, but I don't know if we prefer to be explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just NUL seems to match https://spec.matrix.org/latest/appendices/#room-ids. I fear we might find a lot of characters in unicode that might warrant some kind of exception 🤔
richvdh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excites to see progress on this!
proposals/2545-emotes.md
Outdated
| A shortcode's length MUST not exceed 100 bytes. This restriction MUST be | ||
| enforced by servers when sending reactions, but servers MUST NOT reject events | ||
| coming across federation due to having too many bytes in the shortcode field. | ||
| This avoids a split-brain in the room. Servers MAY opt to locally redact events | ||
| having too many bytes in the shortcode field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little surprised that we're requiring servers to enforce this grammar rather than clients. Typically, we've steered clear of requiring servers to do much in the way of validation on event bodies.
We're probably going to have to answer the question "what should clients do when they see a shortcode that doesn't match the grammar" anyway (since homeservers may be old, and anyway they "MAY opt to locally redact events" rather than MUST).
Was any thought given to putting the requirement for enforcing grammar on the client rather than the server?
proposals/2545-emotes.md
Outdated
| Specifically, the `U+0020` character (normal space) is disallowed. There are | ||
| multiple other byte sequences associated with spaces in Unicode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple other byte sequences associated with spaces in Unicode.
... which are allowed, or not? This is unclear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In context it is poorly worded. The above paragraph specifies the intention.
In addition, the space character is not allowed. This helps avoid common usability paper-cuts (multiple spaces between words, spaces at the beginning/end of a shortcode). Shortcodes containing multiple words are encouraged to use alterantive separators like hyphens or underscores.
This paragraph is below the intention, and is meant to clarify what exactly is not allowed. Considering the intent (arbitrary spacing between words). It might be good to avoid negation, which is typically bad for readability. Instead of saying what isn't allowed, say what is. This also addresses concerns regarding the null character all in one. I recommend the following:
| Specifically, the `U+0020` character (normal space) is disallowed. There are | |
| multiple other byte sequences associated with spaces in Unicode. | |
| In addition, only non-separator graphic unicode characters are allowed. This helps avoid several problems: | |
| 1. **Typos and Usability Issues:** Prevents problems caused by varying whitespace between words or at the beginning/end of a shortcode. Shortcodes containing multiple words are encouraged to use alternatives (hyphens, underscores, casing, etc.), to create a visual separation between words. | |
| 2. **Null Character:** Disallows shortcodes that contain the null character (`U+0000`), which may be interpreted as a string terminator by many processors. | |
| 3. **Control Characters & Security:** Disallows malicious shortcodes that contain control characters such as backspace (`U+0008`), delete (`U+007F`), or carriage return (`U+000D`). These characters serve no legitimate purpose in this context and may even be exploited to engineer security vulnerabilities. |
|
|
||
| ##### `alt` attribute | ||
|
|
||
| The `alt` attribute MUST be present. Its value SHOULD be the `body` of the emote, or if unavailable, its shortcode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be clear, if any of these MUST requirements on the attribute are not adhered to, a receiving client should refuse to render them?
| | `m.image_pack` state event type | `im.ponies.room_emotes` | | ||
| | `m.image_pack` account data event type | `im.ponies.user_emotes` | | ||
| | `m.image_pack.rooms` account data event type | `im.ponies.emote_rooms` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m.image_pack.rooms state event?
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
|
|
||
| ### Sending | ||
|
|
||
| #### Custom emotes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should just reword the entire paragraph to be more like the previous specification. Additionally, alt and title are not required in HTML. They are for providing accessibility or tooltips. We should prioritize rendering something over completeness.
| #### Custom emotes | |
| #### Custom emotes | |
| Custom emotes are sent into rooms as `<img>` tags within the `formatted_body` | |
| field of an | |
| [`m.room.message`](https://spec.matrix.org/v1.14/client-server-api/#mroommessage) | |
| event. Many existing clients already render `<img>` tags in message bodies when | |
| they render HTML, and it's logical to reuse that functionality here. | |
| To allow clients to distinguish custom emotes from other inline images, a new | |
| attribute, `data-mx-emoticon`, is introduced to the `<img>` tag: | |
| ```html | |
| <img data-mx-emoticon src="mxc://example.org/emote" alt="A cat holding a paper heart" title="cat_luvs_u" height="32" /> | |
| ``` | |
| Emote tags are detailed below. Attributes are categorized by their requirement level: | |
| - **Required**: The attribute MUST be present for the tag to be considered a valid custom emote. Clients MUST NOT render invalid custom emotes and MAY fall back to rendering a standard image or ignoring the tag. | |
| - **Recommended**: The attribute SHOULD be present. Its absence does not invalidate the emote but may reduce usability or accessibility. | |
| - **Optional**: The attribute MAY be present. Its absence does not affect validity or core functionality. | |
| - `data-mx-emoticon`: **Required**. A client MUST treat an `<img>` tag as a custom emote ONLY IF the custom | |
| `data-mx-emoticon` attribute is present; otherwise, the client SHOULD treat the tag as an ordinary image. Any values assigned to `data-mx-emoticon` are ignored and clients SHOULD make no assumptions about the existence or structure of any value assigned to the `data-mx-emoticon`. | |
| - `src`: **Required*. The client MUST ignore any image tags without a valid `src` attribute. Clients MUST treat the `src` attribute as valid only if it is in [Matrix Content (MXC) | |
| URI](https://spec.matrix.org/v1.14/client-server-api/#matrix-content-mxc-uris). | |
| Clients MUST treat other URI schemes, such as `https`, `mailto` etc. as invalid. Clients MUST NOT attempt to render images accessible through other URI schemes, as this may lead to unintended network requests. Clients MUST sanitize valid URIs before rendering. | |
| - `title` *Recommended.* This attribute's meaning is inherited from the [HTML specification](https://html.spec.whatwg.org/multipage/dom.html#attr-title). It is primarily used for advisory information like tooltips. Its value SHOULD be the shortcode of the emote. If not present, a tooltip will not be shown. | |
| - `height`: **Recommended.**. Clients SHOULD set height to "32px" . If the height attribute is missing, clients SHOULD assume a default height of 32px. This is a default intended to look good on most devices, and is for the benefit of legacy clients that do not treat custom emotes differently from other inline images. Clients implementing this MSC SHOULD ignore this value for rendering purposes. They are expected to override the height and render emotes based on their own UI requirements (e.g., the user's font size, displaying messages containing only emoticons in a larger font, etc.). | |
| - `width`: **Optional**. This attribute is not required. Clients SHOULD maintain the aspect ratio of non-square emotes. | |
| - `alt` **Recommended**. The `alt` attribute's meaning is inherited from the HTML specification: | |
| <https://html.spec.whatwg.org/multipage/images.html#alt>. It is primarily used | |
| for accessibility purposes in describing an image for visually impaired users. Its value SHOULD be the `body` of the emote, or if unavailable, its shortcode. If the alt attribute is not present, clients SHOULD fall back to the title. If the title attribute is not present, clients SHOULD assign their own default alt text. |
|
Happy to help push this to get to FCP. I think we're getting close. Added a few comments above. What still needs to be resolved? Pinging @anoadragon453 for pointers. |
This comment was marked as off-topic.
This comment was marked as off-topic.
|
@anoadragon453 any news on this? |
|
I have opened MSC4377 that builds on top of this MSC to clarify the ordering of image packs within a source (to allow for users to reorder packs as they wish). |
Maybe one day we can pass this MSC to our grandkids. Btw this joke isnt intended to disrespect all the great contributors to this giant MSC. Thanks anoa and any other contributors for pushing this along. |
Resolves the thread matrix-org#2545 (comment)
Co-authored-by: Kim Brose <2803622+HarHarLinks@users.noreply.github.com>
Rendered
Signed-off-by: Sorunome mail@sorunome.de
Author: @Sorunome
Shepherd: @anoadragon453
Future MSCs
Current implementations
Emote rendering (rendering of the
<img>tag)data-mx-emoticon)data-mx-emoticon)Sending, using the mentioned events here
Bridges
data-mx-emoticononly)Implementation PRs
FluffyChat
Data model: https://gitlab.com/famedly/company/frontend/libraries/matrix_api_lite/-/merge_requests/26
SDK: https://gitlab.com/famedly/company/frontend/famedlysdk/-/merge_requests/726
Emoticons: https://gitlab.com/famedly/fluffychat/-/merge_requests/433
Stickers: https://gitlab.com/famedly/fluffychat/-/merge_requests/452
Nheko
Stickers: Nheko-Reborn/nheko#648
Sticker editor: Nheko-Reborn/nheko#669
Choosing emoticons: Nheko-Reborn/nheko@ea6b19b
Cinny
Emoticons and Stickers: cinnyapp/cinny#686
kazv
Creating and sending stickers: https://lily-is.land/kazv/kazv/-/merge_requests/71
SCT Stuff:
FCP tickyboxes
MSC Checklist
Status: "At risk" due to unaddressed concerns for a long period of time.
Next steps: The MSC will remain held in its current state until the comments are resolved, or another MSC replaces this proposal by moving more easily through the process.