Problems with UTF-8 support for Windows  #48

@shawnanctil

I may have missed something, but I have a series of files that are encoded as UTF-8, and when I run the tool I get all sorts of stray characters in my topics (e.g. "â"). I'm wondering if there's a stage in the processing where the files are converted to ASCII, or read with some other encoding, and then not re-encoded? I could be way off base with this question.
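One way stray characters like "â" can show up is when UTF-8 bytes get decoded with a single-byte Windows code page. The sketch below reproduces that effect in Python; it is only an illustration, assuming Windows-1252 (`cp1252`) as the wrong codec, and it is not taken from the tool's code. The helper names `misdecode` and `repair` are hypothetical.

```python
# Illustration only: assumes cp1252 is the codec being applied by mistake.
# misdecode() and repair() are hypothetical helpers, not part of any tool.

def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong codec."""
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")


def repair(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Reverse the mistake: recover the original bytes, then decode as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


sample = "it\u2019s"          # "it's" with a typographic apostrophe (U+2019)
garbled = misdecode(sample)   # the apostrophe becomes three separate characters
print(ascii(garbled))         # ascii() keeps the output safe on any console
print(ascii(repair(garbled)))
```

The three UTF-8 bytes of the apostrophe (E2 80 99) each map to a separate cp1252 character, and the first of them is "â". The round trip in `repair` only works when no bytes were lost or replaced along the way.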

That said, I have gone through my files, ensured they are UTF-8, and done a find and replace for "â" in all of them. If you've come across this issue before, I would love to know how you resolved it. At this point I'm thinking of creating an elaborate stop list that excludes these stray characters.
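For the "ensured they are UTF-8" step, a quick check along these lines might help confirm it independently of any editor. This is a minimal sketch; the function name `check_utf8_bytes` and the file name in the usage note are hypothetical.

```python
# Minimal sketch: strict UTF-8 validation plus byte-order-mark detection.
# check_utf8_bytes() is a hypothetical helper name.

UTF8_BOM = b"\xef\xbb\xbf"


def check_utf8_bytes(data: bytes) -> tuple[bool, bool]:
    """Return (is_valid_utf8, has_bom) for the raw bytes of a file."""
    has_bom = data.startswith(UTF8_BOM)
    try:
        data.decode("utf-8")  # strict decoding raises on any invalid sequence
        valid = True
    except UnicodeDecodeError:
        valid = False
    return valid, has_bom
```

It could be applied to a file with something like `check_utf8_bytes(pathlib.Path("doc1.txt").read_bytes())`. Note that a file can pass this check and still contain text that was garbled earlier and then saved as UTF-8, so a valid result does not rule out damage that is already in the file.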
