Handle UTF-8 output for reports #335

Open
mmaunder wants to merge 2 commits into main from gh-327

mmaunder commented Sep 23, 2025

Fixes #327.

Summary

  • open report output files with an explicit UTF-8 encoding to support non-ASCII characters
  • add a regression test that verifies the UTF-8 encoding flag is passed to the file handle
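
The two summary points above can be sketched roughly as follows. This is an illustration only: `open_report`, `write_report`, and `encoding_flag_is_passed` are hypothetical names, not functions taken from the PR, and the real change may be structured differently.

```python
from unittest import mock


def open_report(path):
    # Hypothetical helper: pass the encoding explicitly instead of relying
    # on the locale default, which may be ASCII in some environments.
    return open(path, 'w', encoding='utf-8')


def write_report(path, lines):
    # Write each report line followed by a newline.
    with open_report(path) as handle:
        for line in lines:
            handle.write(line + '\n')


def encoding_flag_is_passed():
    # Regression-style check: patch open() and inspect the keyword
    # arguments it was called with, without touching the filesystem.
    with mock.patch('builtins.open', mock.mock_open()) as mocked:
        write_report('report.csv', ['\U0001f4a1 match'])
    _, kwargs = mocked.call_args
    return kwargs.get('encoding') == 'utf-8'
```

A check like this only confirms that the flag is passed; it does not exercise actual encoding of the report contents.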

Testing

  • ./venv/bin/python -m unittest discover -s wordfence -t .
  • ./venv/bin/python -m flake8 --exclude venv --require-plugins pycodestyle,flake8-bugbear

akenion left a comment

I'm not sure simply switching the report encoding to UTF-8 resolves this. The matched string from scanned files is included, scanned files aren't guaranteed to be UTF-8, and not all binary sequences are valid UTF-8, so I think we need to explore a different solution here.
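
To illustrate the concern, here is a small sketch with made-up matched bytes. The `raw_match` value and both helper names are hypothetical, and the `backslashreplace` handling shown is only one possible direction, not necessarily the solution the project would adopt.

```python
# Hypothetical matched bytes from a scanned file; 0xff and 0xfe can never
# appear in valid UTF-8.
raw_match = b'eval(\xff\xfe)'


def is_valid_utf8(data: bytes) -> bool:
    # Strict decoding raises UnicodeDecodeError on invalid sequences.
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


def to_report_text(data: bytes) -> str:
    # One possible approach (an assumption, not the PR's method): render
    # undecodable bytes as visible escapes such as \xff instead of failing.
    return data.decode('utf-8', errors='backslashreplace')
```

With this approach the resulting string contains only characters that can be encoded as UTF-8, but the original bytes are shown in escaped form rather than preserved verbatim.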

Development

Successfully merging this pull request may close these issues.

Error: 'ascii' codec can't encode character '\U0001f4a1' in position 55: ordinal not in range(128)

2 participants