
[API] Non-English (unicode, extended ASCII, emoji) Characters Cannot Be Slug-ified in Tags #71

Closed
@BethanyG


During the review of PR #69, @billglover uncovered a bug with tags that include non-English characters (Unicode, extended ASCII, emoji): #69 (comment)

Beyond the characters themselves not being preserved, the resulting strings passed to the serializer break the ordering and return of the tag:slug pairs, and the tags are saved to the DB in garbled form.
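As a rough illustration of how such tags can lose information, here is a minimal sketch using a stand-in for an ASCII-only slugify step. The function `ascii_slugify` is hypothetical: it follows the general normalize-then-strip approach of ASCII slugifiers and is an assumption, not necessarily the exact code path taggit takes in this project. The point is only that distinct non-English tags can collapse to the same (empty) slug.

```python
import re
import unicodedata


def ascii_slugify(value: str) -> str:
    """Hypothetical stand-in for an ASCII-only slugify step (assumption)."""
    # Decompose accented characters, then drop every non-ASCII code point.
    value = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Remove punctuation, then collapse whitespace/hyphen runs into one hyphen.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


if __name__ == "__main__":
    for tag in ["café", "日本語", "Ελληνικά", "🎉 party"]:
        print(repr(tag), "->", repr(ascii_slugify(tag)))
```

With a step like this, accented Latin text survives in reduced form, while scripts with no ASCII decomposition and emoji are dropped entirely, so two different tags can end up with identical slugs.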

On the surface, this appeared to be a relatively straightforward problem: have Django or taggit respect the non-English characters. It is, however, a rather complex situation.

Original PR filed for taggit
Second PR filed for taggit
linked issue (still open)
linked proposal for dealing with the issue (still open)
additional open issue blocking the above


Since we cannot rely on django-taggit to come up with an "easy" solution to this (nor do we have the forms and front-end code complications that team does), we'll have to find a way to intercept the strings and translate/transliterate or otherwise format them so that they can be slugified appropriately.

This may mean using an alternative slugify process, or altering/patching the taggit code. Here is a link to the taggit code that saves tags to the DB, as a place to start.

Not super clear on the path forward. First thoughts are:

  1. Intercept the strings ahead of taggit, and translate or transliterate them (not optimal unless we can re-translate them for appropriate display)

  2. Alter taggit code to slugify the strings in a way that can be stored in the DB and retrieved in their original unicode form, since we do not have the restriction of making them nice for URLs, etc.
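To make the second idea more concrete, here is a minimal stdlib-only sketch of two ways a slug could keep its Unicode content. Both function names are hypothetical. `unicode_slugify` is loosely modeled on the behaviour of Django's `slugify(value, allow_unicode=True)`; note that it still drops emoji, because they are not word characters. `reversible_slug` is a swapped-in alternative (plain percent-encoding, not anything taggit provides) that stays ASCII in the DB and can be decoded back to the original tag. Neither is wired into taggit here; how to hook them into the tag model's save path is left open.

```python
import re
import unicodedata
from urllib.parse import quote, unquote


def unicode_slugify(value: str) -> str:
    """Hypothetical: keep Unicode letters and digits in the slug."""
    # NFKC gives one canonical composed form per visually identical string.
    value = unicodedata.normalize("NFKC", value)
    # \w matches Unicode word characters for str patterns in Python 3.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def reversible_slug(value: str) -> str:
    """Hypothetical: ASCII-only slug that can be decoded back to the tag."""
    # Percent-encode everything except unreserved ASCII characters.
    return quote(unicodedata.normalize("NFKC", value), safe="")


def original_from_slug(slug: str) -> str:
    """Inverse of reversible_slug (for input already in NFKC form)."""
    return unquote(slug)


if __name__ == "__main__":
    for tag in ["Café au lait", "日本語", "🎉 party"]:
        print(repr(tag), unicode_slugify(tag), reversible_slug(tag))
```

The trade-off is readability versus fidelity: the first form is human-readable but lossy for emoji and punctuation, while the second round-trips every character at the cost of an opaque slug.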


Labels: bug (Something isn't working)
