Skip to content

fix UTF8 escaping of log messages, retain newlines and tabs #28

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

boegel
Copy link

@boegel boegel commented May 20, 2014

No description provided.

@boegel
Copy link
Author

boegel commented May 20, 2014

This fixes the broken tests in easybuilders#926

@stdweird
Copy link
Owner

this is just wrong. we'll need another fix if this is all required

@boegel
Copy link
Author

boegel commented May 21, 2014

unless you're OK with 'escaped' newlines and tabs in log messages, it's not required

but without this (and using repr), logs are very unreadable...

@JensTimmerman
Copy link

Agree wit stdweird, it's wrong:

>>> msg  = u'foo"""""'
>>> '\n'.join([repr(line).strip("\"'").replace('\\t', '\t') for line in msg.split('\n')])
"u'foo"

@boegel
Copy link
Author

boegel commented May 21, 2014

@JensTimmerman: OK, it's stripping off too much, but that's another issue
@stdweird's comment is that this is too crazy to be considered a good solution, with the stripping of newlines and replacing of tabs...

@JensTimmerman
Copy link

I would try something like:

try:
    msg = str(msg)
except UnicodeEncodeError:
    msg = msg.encode('utf8', 'replace)

Instead of the repr() call.
This should always return a bytestring

>>> def bla(text):
...  try:
...    text = str(text)
...  except UnicodeEncodeError:
...    text = text.encode(
... 'utf8', 'replace')
...  return text
... 
>>> bla('bla')
'bla'
>>> bla(u'bla')
'bla'
>>> bla(u'bl❤ ☀ a')
'bl\xe2\x9d\xa4 \xe2\x98\x80 a'
>>> bla('bl❤ ☀ a')
'bl\xe2\x9d\xa4 \xe2\x98\x80 a'
>>> bla('bl\xe2\x9d\xa4 \xe2\x98\x80 a')
'bl\xe2\x9d\xa4 \xe2\x98\x80 a'
>>> bla(u'bl\xe2\x9d\xa4 \xe2\x98\x80 a')
'bl\xc3\xa2\xc2\x9d\xc2\xa4 \xc3\xa2\xc2\x98\xc2\x80 a'

@boegel
Copy link
Author

boegel commented May 21, 2014

That plays nice with newlines and tabs in any case, so I'm all for it.

>>> def bla(text):
...  try:
...   text = str(text)
...  except UnicodeEncodeError:
...   text = text.encode('utf8', 'replace')
...  return text
... 
>>> bla('"foo"')
'"foo"'
>>> bla('"foo\nbar"')
'"foo\nbar"'
>>> print bla('"foo\nbar"')
"foo
bar"
>>> bla('"foo\tbaz\nbar"')
'"foo\tbaz\nbar"'
>>> print bla('"foo\tbaz\nbar"')
"foo    baz
bar"

@JensTimmerman
Copy link

here's a list of some of the things that don't work:
https://pythonhosted.org/kitchen/unicode-frustrations.html

@JensTimmerman
Copy link

actually always using msg.encode('utf8', 'replace') should work, but I added the extra try except because it does not work if message is an object or an integer...

@JensTimmerman
Copy link

Note that this does not fail if the string contains UTF-16 characters if python was compiled with wide unicode character support (default since 2.2)
Because surprisingly, they store UTF-8 characters in 4bytes, so UTF-8 actually works as if it was UTF-32
see http://legacy.python.org/dev/peps/pep-0261/

@boegel
Copy link
Author

boegel commented May 21, 2014

changed to use @JensTimmerman's suggestion, EB unit tests are happy

@boegel
Copy link
Author

boegel commented May 21, 2014

I also checked that this resolves the original issue, that was caused by the string Jülich used in the description of easybuilders/easybuild-easyconfigs#759 .
To trigger/test, use something like eb --from-pr 759 --dump-test-report --robot --github-user=boegel --force --debug (the debug is key here)

@boegel
Copy link
Author

boegel commented May 21, 2014

Properly resolved in hpcugent/vsc-base#122, easybuilders#926 is no longer needed (replaced by easybuilders#935), so this PR has become useless.

@boegel boegel closed this May 21, 2014
@boegel boegel deleted the stdweird_develop branch May 21, 2014 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants