during tokenize, use UTF8 encoding on all platforms #510
(This is a PR-as-issue, but if I've guessed the wrong solution please feel free to close or suggest a better fix.)
An MNE-Python user who was trying to build our docs on Windows hit this error today:
Re-running after `export PYTHONUTF8=1` resolved the issue, so I think explicitly invoking `utf-8` during read should also prevent the error without requiring any user action.

Technically I think this is a backwards-incompatible change for Windows users who had any non-ASCII characters in their source files, if those characters are encoded differently in UTF-8 than they are in the system's default code page (which will vary with OS language settings). However, `PYTHONUTF8=1` will become the effective default with the release of Python 3.15, expected in 2026 (so affected users will need to address this change eventually anyway).
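
For illustration, here is a minimal sketch of the idea, not the actual diff in this PR. The helper name `read_tokens_utf8` and the demo file are hypothetical. It passes `encoding="utf-8"` to `open()` before handing the file to `tokenize.generate_tokens`, and then shows why the change could affect files saved in a legacy code page (cp1252 is used here only as an example of a Windows default).

```python
import os
import tempfile
import tokenize


def read_tokens_utf8(path):
    """Tokenize a Python source file, decoding it as UTF-8 on every platform.

    Hypothetical helper for illustration only.
    """
    # Without encoding=..., open() falls back to the locale's preferred
    # encoding, which on Windows is often a legacy code page.
    with open(path, encoding="utf-8") as fh:
        return list(tokenize.generate_tokens(fh.readline))


# Write a small source file containing a non-ASCII character as UTF-8 bytes.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.py")
    with open(demo_path, "wb") as fh:
        fh.write('name = "café"\n'.encode("utf-8"))
    tokens = read_tokens_utf8(demo_path)

string_tokens = [tok.string for tok in tokens if tok.type == tokenize.STRING]

# The backwards-incompatible case: the same character saved in cp1252 is a
# single byte that is not valid UTF-8, so an explicit UTF-8 read rejects it.
legacy_bytes = "é".encode("cp1252")
try:
    legacy_bytes.decode("utf-8")
    legacy_file_still_readable = True
except UnicodeDecodeError:
    legacy_file_still_readable = False
```

Files that are pure ASCII are unaffected either way, since ASCII bytes decode identically under UTF-8 and the common Windows code pages.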