Skip to content

replace printable for try/except utf-8 #2255

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Aug 29, 2017

Conversation

antgonza
Copy link
Member

No description provided.

@codecov-io
Copy link

codecov-io commented Aug 28, 2017

Codecov Report

Merging #2255 into dev will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev    #2255      +/-   ##
==========================================
+ Coverage   92.81%   92.82%   +<.01%     
==========================================
  Files         171      171              
  Lines       18925    18930       +5     
==========================================
+ Hits        17566    17571       +5     
  Misses       1359     1359
Impacted Files Coverage Δ
qiita_db/metadata_template/test/test_util.py 99.49% <100%> (+0.01%) ⬆️
qiita_db/metadata_template/util.py 90.97% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f94af55...e195e07. Read the comment docs.

@antgonza antgonza changed the base branch from master to dev August 28, 2017 19:07
u'Test Sample\x962', 'Test Sample 2')
qdb.metadata_template.util.load_template_to_dataframe(
StringIO(replace))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am missing something obvious, this test is missing a "test". Is "not erroring" the test? If so, are there any checks that can be done on the returned value from load_template_to_dataframe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is that it not raises an error, AKA that it can be done test_load_template_to_dataframe_non_utf8_error has the test for the raise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, thanks! 👍

if len(block) != len(tblock):
tblock = ''.join([c if c in printable else '&#128062;'
for c in block])
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire try/except block can be replaced by:

try:
    tblock = block.encode('utf-8')
except UnicodeDecodeError:
    tblock = unicode(block, errors='replace')
    tblock = tblock.replace(u'\ufffd', '&#128062;')
    if tblock not in errors:
        errors[tblock] = []
    errors[tblock].append('(%d, %d)' % (row, col))

Also, if errors is initializes as a defaultdict(list):

try:
    tblock = block.encode('utf-8')
except UnicodeDecodeError:
    tblock = unicode(block, errors='replace')
    tblock = tblock.replace(u'\ufffd', '&#128062;')
    errors[tblock].append('(%d, %d)' % (row, col))

The character u'\ufffd' is the official unicode character to replace a character that can't be decoded. The call to replace replaces it with our "qiita" paws.

@josenavas josenavas merged commit ef4f566 into qiita-spots:dev Aug 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants