Replace escaped Unicode chars (\u20ac) in stored JSON? #173

@jimallman


While chasing a Unicode-related bug, I noticed that our stored JSON (on GitHub) contains ugly escaped Unicode characters (such as \u20ac in place of €), e.g. in this study and this tree collection.

These Unicode characters are handled gracefully in our indexing and web apps, but the escape sequences aren't strictly needed, since we store all JSON as UTF-8. Meanwhile, they're hideous and make the stored files hard to read and search on GitHub.
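A quick way to confirm that the escapes are purely cosmetic: Python's standard json module decodes the escaped and unescaped forms to the same value, so any compliant JSON parser should treat them identically. A minimal sketch, assuming Python 3; the sample document is hypothetical.

```python
import json

# Hypothetical sample value containing the euro sign (U+20AC).
doc = {"note": "price in \u20ac"}

# Default behaviour (ensure_ascii=True): non-ASCII is written as \uXXXX escapes.
escaped = json.dumps(doc)
# ensure_ascii=False: the euro sign is kept as a literal character.
unescaped = json.dumps(doc, ensure_ascii=False)

assert escaped == '{"note": "price in \\u20ac"}'
assert unescaped == '{"note": "price in \u20ac"}'

# Both serializations decode to exactly the same Python object.
assert json.loads(escaped) == json.loads(unescaped) == doc
```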

  • Is this something we want or need to fix?
  • Would this fix apply to all document types (studies, tree collections, taxonomic amendments)?
  • Are there other clients or use cases that would be broken by this change?

If we want to restore pretty Unicode for data saved in the future, it all seems to boil down to a single call to json.dump in peyotl that's used for all JSON docs. If we add ensure_ascii=False to this call as shown here, it should save Unicode characters directly (without escapes) in phylesystem.
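One detail worth checking alongside that change: with ensure_ascii=False, json.dump writes raw non-ASCII characters, so the target file must be opened with an explicit UTF-8 encoding rather than the platform default. The sketch below is Python 3 and uses a hypothetical helper, write_json_utf8; it is not peyotl's actual code, and any other arguments the real call passes (indentation, key sorting, etc.) would stay as they are. If the code path still runs under Python 2, the json docs there warn that ensure_ascii=False can yield a mix of str and unicode chunks, so the file would need a unicode-aware writer such as io.open or codecs.open.

```python
import json
import os
import tempfile


def write_json_utf8(obj, path):
    """Hypothetical helper: write obj as JSON, keeping non-ASCII characters literal.

    With ensure_ascii=False, json.dump emits text containing raw non-ASCII
    characters, so the file is opened with an explicit UTF-8 encoding.
    """
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False)


# Usage: round-trip a value containing a euro sign through a temporary file.
doc = {"note": "price in \u20ac"}
path = os.path.join(tempfile.mkdtemp(), "example.json")
write_json_utf8(doc, path)

with open(path, "rb") as f:
    raw = f.read()

# The euro sign is stored as its 3-byte UTF-8 encoding, not as a \u20ac escape.
assert "\u20ac".encode("utf-8") in raw
assert b"\\u20ac" not in raw

# Reading it back with UTF-8 yields the original object.
with open(path, encoding="utf-8") as f:
    assert json.load(f) == doc
```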
