
Conversation

@yarikoptic commented Apr 18, 2022

The original use case was using rich.inspect within an epdb session.
https://pypi.org/project/rich/ colors its output and uses UTF-8
characters for tables etc., and the epdb client was crashing, unable to
decode:

    (Epdb) rich.inspect(self, value=False)
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/yoh/proj/datalad/datalad-mihextras/venvs/dev3/lib/python3.9/site-packages/epdb/__init__.py", line 1091, in connect
        t.interact()
      File "/home/yoh/proj/datalad/datalad-mihextras/venvs/dev3/lib/python3.9/site-packages/epdb/epdb_client.py", line 146, in interact
        sys.stdout.write(text.decode('ascii'))
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 5: ordinal not in range(128)
- I think hardcoding to `ascii` is the wrong thing to do since, AFAIK,
  there is no guarantee that the output would not be UTF-8

- instead of leaving it to the default (not providing an encoding), I
  decided to go for the preferred encoding

- there is a code block which follows for reading/writing stdin, and I
  also encode there with the preferred encoding, which now allows
  unicode input to be encoded (e.g. to UTF-8) should someone need to
  enter it (see the sketch after this list)
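
A minimal sketch of the idea described above (not the actual epdb patch; `write_decoded` and `read_chunks` are hypothetical names): decode incoming bytes with the locale's preferred encoding and carry any undecodable tail over to the next read, instead of hardcoding `ascii`:

```python
import locale
import sys

# Assumption: the locale's preferred encoding is what the terminal expects.
encoding = locale.getpreferredencoding()

def write_decoded(read_chunks):
    """Decode byte chunks from read_chunks() and write them to stdout."""
    remaining = b""
    for chunk in read_chunks():
        data = remaining + chunk
        try:
            text = data.decode(encoding)
            remaining = b""
        except UnicodeDecodeError as e:
            # Decode what we can; keep the undecodable tail and retry it
            # together with the next chunk.
            text = data[:e.start].decode(encoding)
            remaining = data[e.start:]
        sys.stdout.write(text)
    if remaining:
        # Leftover bytes at end of stream never formed a valid sequence.
        raise ValueError("undecodable trailing bytes: %r" % (remaining,))
```

The stdin direction would be symmetric: text typed locally gets `.encode(encoding)`-ed before being written to the remote side.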

Signed-off-by: Yaroslav Halchenko <debian@onerussian.com>

demo of it working: [screenshot]

@wfscheper
Contributor

This looks okay to me, but I'd like at least one other reviewer. @mibanescu or @mtharp, what do you think?

@yarikoptic
Author

ping @mibanescu @mtharp, your feedback would be appreciated.

        remaining_text = b""
    except UnicodeDecodeError as e:
        unicode_text = all_text[:e.start].decode(encoding)
        remaining_text = all_text[e.start:]
Contributor

I am sorry, I now see I had added my comment on the wrong line. Here, `remaining_text` gets reinitialized with a slice from `all_text`. If `e.start` is 0 and the first char is not a valid unicode start sequence, then `remaining_text` will be appended to but will always fail to decode, which, if I am right, leads to the infinite loop.

Just re-raising the exception if `e.start <= 0` should be safe, I think.

Author


I just had to use this patch again, which made me realize that we never finished this PR.

For `e.start < 0` -- I accepted my own assertion above since I think it should simply never happen.

For `e.start == 0` -- I think it is an unlikely but legit case whenever we read only a few bytes which are the beginning of an incomplete unicode sequence. Then we just store it all and append it to the next block -- I do not see how we could get an infinite loop here, since we are not adding any looping and would still exit at EOF ... the only thing left would be to assert at the end that we have no `remaining_text` left, which would catch the case where an incomplete unicode sequence at the very end was provided and we failed to decode it. I will take a stab at that.
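
To make the `e.start == 0` case concrete, here is a small standalone illustration (not code from this PR) of a read boundary falling in the middle of a multibyte UTF-8 character:

```python
# A chunk that both begins and ends mid-character raises UnicodeDecodeError
# with e.start == 0, yet is perfectly legitimate input.
encoded = "é".encode("utf-8")             # b'\xc3\xa9'
first, second = encoded[:1], encoded[1:]  # pretend these arrive as two reads

try:
    first.decode("utf-8")
except UnicodeDecodeError as e:
    assert e.start == 0      # the whole chunk is an incomplete prefix
    remaining_text = first   # stash it instead of re-raising

# The next read completes the sequence and decodes cleanly.
assert (remaining_text + second).decode("utf-8") == "é"
```

A genuinely invalid leading byte, on the other hand, would keep failing with `e.start == 0` on every read, which is exactly what the proposed end-of-stream check for leftover `remaining_text` would surface.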

@yarikoptic
Author

ping on this PR -- I keep coming back to needing the patched version. Please let me know if I should improve anything.
