Description
Input file for reference:
Rocio.txt
UTF-unknown was unable to detect the correct encoding (Shift JIS), while uchardet did correctly identify it.
This took a while to figure out, but eventually I discovered that it was caused by a few lines like this one (line 98): victory3 ="Hay que salvar al mundo, ソte uniras a nosotras?".
Mugen character files are the stuff of nightmares. I assume that this is a character made by someone Spanish/Brazilian, then edited by someone Japanese.
After some investigation I indeed found that these probers are practically identical to those in uchardet, but there is a discrepancy that caused the results to deviate. Namely, UTF-unknown exits early when it encounters a state machine error, while uchardet simply continues. As far as I could tell, uchardet never exits as a result of a state machine error, in any prober at all. And indeed, upon removing the early exits, I got a correct detection as well.
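To illustrate the difference, here is a rough Python sketch of the two policies. It is not code from either library: `next_state`, `probe`, and the `exit_on_error` flag are hypothetical names, the validator is a toy that only loosely follows Shift JIS byte ranges, and resetting the machine after an error is an assumption of this sketch rather than something I verified in uchardet.

```python
from enum import Enum


class State(Enum):
    START = 0  # between characters
    TRAIL = 1  # expecting the second byte of a two-byte character
    ERROR = 2  # the last byte was not valid in this position


def next_state(state, byte):
    """Toy validator, loosely modelled on Shift JIS byte ranges (hypothetical)."""
    if state is State.ERROR:
        state = State.START  # assumption of this sketch: recover after an error
    if state is State.START:
        if byte <= 0x7F or 0xA1 <= byte <= 0xDF:
            return State.START  # single-byte character
        if 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC:
            return State.TRAIL  # lead byte of a two-byte character
        return State.ERROR
    # state is TRAIL: check the second byte
    if 0x40 <= byte <= 0x7E or 0x80 <= byte <= 0xFC:
        return State.START
    return State.ERROR


def probe(data, exit_on_error):
    """Return (still_a_candidate, number_of_complete_characters_seen)."""
    state = State.START
    chars = 0
    for byte in data:
        state = next_state(state, byte)
        if state is State.ERROR:
            if exit_on_error:
                return False, chars  # early exit: encoding is ruled out
            continue  # tolerant policy: skip the bad byte and keep going
        if state is State.START:
            chars += 1
    return True, chars


# Valid Shift JIS text with a single stray byte in the middle.
sample = "日本語".encode("shift_jis") + b"\xff" + "テスト".encode("shift_jis")

print(probe(sample, exit_on_error=True))   # (False, 3)
print(probe(sample, exit_on_error=False))  # (True, 6)
```

With the early exit, one bad byte is enough to rule the encoding out even though the rest of the input is valid. Without it, the prober stays a candidate and keeps counting characters, which matches the behaviour I saw after removing the early exits.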