replace printable for try/except utf-8 #2255
Codecov Report
@@ Coverage Diff @@
## dev #2255 +/- ##
==========================================
+ Coverage 92.81% 92.82% +<.01%
==========================================
Files 171 171
Lines 18925 18930 +5
==========================================
+ Hits 17566 17571 +5
Misses 1359 1359
    u'Test Sample\x962', 'Test Sample 2')
qdb.metadata_template.util.load_template_to_dataframe(
    StringIO(replace))
Unless I am missing something obvious, this test is missing a "test". Is "not erroring" the test? If so, are there any checks that can be done on the value returned by load_template_to_dataframe?
The test is that it does not raise an error, i.e. that the template can be loaded. test_load_template_to_dataframe_non_utf8_error has the test for the raise.
Ah I see, thanks! 👍
qiita_db/metadata_template/util.py
Outdated
if len(block) != len(tblock):
    tblock = ''.join([c if c in printable else '🐾'
                      for c in block])
try:
This entire try/except block can be replaced by:

    try:
        tblock = block.encode('utf-8')
    except UnicodeDecodeError:
        tblock = unicode(block, errors='replace')
        tblock = tblock.replace(u'\ufffd', '🐾')
        if tblock not in errors:
            errors[tblock] = []
        errors[tblock].append('(%d, %d)' % (row, col))

Also, if errors is initialized as a defaultdict(list), the membership check is no longer needed:

    try:
        tblock = block.encode('utf-8')
    except UnicodeDecodeError:
        tblock = unicode(block, errors='replace')
        tblock = tblock.replace(u'\ufffd', '🐾')
        errors[tblock].append('(%d, %d)' % (row, col))

The character u'\ufffd' is the official Unicode replacement character (U+FFFD), which a decoder emits in place of input it can't decode. The call to replace swaps it for our "qiita" paws.
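For readers on Python 3, where `unicode` no longer exists, the same idea can be sketched as below. This is only an illustration, not the code in the PR: the function name `find_undecodable_cells` and the list-of-rows input shape are assumptions made for the example, and cells are assumed to arrive as `bytes`.

```python
from collections import defaultdict

PAW = '\U0001F43E'  # the paw-prints marker from the comment above


def find_undecodable_cells(rows):
    """Map each undecodable cell (shown with paw markers) to its positions.

    `rows` is a list of rows, each a list of bytes objects. Both the
    function name and this input shape are hypothetical.
    """
    errors = defaultdict(list)  # no "if key not in errors" check needed
    for row, cells in enumerate(rows):
        for col, block in enumerate(cells):
            try:
                block.decode('utf-8')
            except UnicodeDecodeError:
                # errors='replace' substitutes U+FFFD for the bad bytes
                tblock = block.decode('utf-8', errors='replace')
                tblock = tblock.replace('\ufffd', PAW)
                errors[tblock].append('(%d, %d)' % (row, col))
    return errors


rows = [[b'sample_name', b'Test Sample\x962'], [b'ok', b'fine']]
result = find_undecodable_cells(rows)
```

Only the cell containing the stray `\x96` byte ends up in `result`, keyed by its paw-marked text, so the error message can point at the exact row and column.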