-
Notifications
You must be signed in to change notification settings - Fork 21
Closed
Labels
Description
I may have missed something, but I have a series of files that are encoded UTF-8, but when I run the tool I get all sorts of ASCII characters in my topics (i.e. - "â", etc). I'm wondering if there's a stage in the processing where files are converted to ASCII and then not re-encoded? I could be way off base with this question.
That said, I have gone through my files, ensured they are UTF-8, and done a find and replace for "â" in all the files. If you've come across this issue before I would love to know how you resolved it. At this point I'm thinking of creating an elaborate stop list that excludes common ASCII characters.