Description
During the review of PR #69, @billglover uncovered a bug with tags that include non-English characters (Unicode, extended ASCII, emoji): #69 (comment)
In addition to not respecting the characters, the resulting strings fed to the serializer break the ordering and return of the tag:slug pairs, and are saved to the DB in garbled form.
On the surface, this appeared to be a relatively straightforward problem: have Django or taggit respect the non-English characters. It is, however, a rather complex situation.
Original PR filed for taggit
Second PR filed for taggit
linked issue (still open)
linked proposal for dealing with the issue (still open)
additional open issue blocking the above
Since we cannot rely on django-taggit to come up with an "easy" solution to this (nor do we have the forms and front-end code complications that team does), we'll have to find a way to intercept the strings and translate/transliterate or otherwise format them so that they can be slugified appropriately.
This may mean using an alternative slugify process, or altering/patching the taggit code. Here is a link to the taggit code that saves tags to the DB, as a place to start.
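To make the "alternative slugify" idea concrete, here is a minimal stdlib-only sketch contrasting an ASCII-only slugify with a Unicode-preserving one. Both function names are hypothetical and neither is taggit's actual code; the Unicode variant is loosely modeled on what Django's slugify does when called with allow_unicode=True.

```python
import re
import unicodedata


def ascii_slugify(value: str) -> str:
    """Hypothetical ASCII-only slugify: anything without an ASCII form is lost."""
    # NFKD splits accented letters into base letter + combining mark,
    # then the ASCII encode step silently drops whatever is not ASCII.
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def unicode_slugify(value: str) -> str:
    """Hypothetical Unicode-preserving slugify: keeps non-ASCII word characters."""
    # NFKC keeps accented and non-Latin letters as single code points.
    value = unicodedata.normalize("NFKC", value)
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


if __name__ == "__main__":
    for tag in ["Café Münster", "日本語", "🚀"]:
        print(repr(tag), repr(ascii_slugify(tag)), repr(unicode_slugify(tag)))
```

Note that emoji are not word characters, so even the Unicode-preserving variant reduces an emoji-only tag to an empty slug; that case would need separate handling.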
Not super-clear on the path forward. First thoughts are:
- Intercept the strings ahead of taggit, and translate or transliterate them (not optimal unless we can re-translate them for appropriate display)
- Alter taggit code to slugify the strings in a way that can be stored in the DB and retrieved in their original Unicode form, since we do not have the restriction of making them nice for URLs, etc.
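One possible take on the first option, sketched under assumptions: percent-encode the tag into an ASCII-safe token before it reaches taggit, and decode it again for display. The helper names encode_tag and decode_tag are hypothetical, and whether taggit's own slugify step leaves the "%" characters intact has not been checked here.

```python
from urllib.parse import quote, unquote


def encode_tag(name: str) -> str:
    """Hypothetical helper: turn any tag into a reversible ASCII-only token."""
    # safe="" forces every reserved character to be escaped too,
    # so the token contains only unreserved ASCII and %XX escapes.
    return quote(name, safe="")


def decode_tag(token: str) -> str:
    """Hypothetical helper: recover the original tag for display."""
    return unquote(token)


if __name__ == "__main__":
    for tag in ["café", "日本語", "🚀", "plain-tag"]:
        token = encode_tag(tag)
        print(repr(tag), "->", token, "->", repr(decode_tag(token)))
```

The round trip is lossless, which addresses the "re-translate for display" concern, but the stored tokens are not human-readable and would sort by encoded bytes rather than by the original characters.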