Encrypt/Decrypt Mailbox urls #198

Open
wants to merge 25 commits into
base: master
from

lologf commented Jan 18, 2019

No description provided.

lologf (Author) commented Jan 18, 2019

This fix works with Python 2.7, Django 1.11, and a database with charset utf8 and collation utf8_general_ci.

 if 'subject' in message:
     msg.subject = (
-        utils.convert_header_to_unicode(message['subject'])[0:255]
+        utils.convert_header_to_unicode(unicode(message['subject']).decode('utf-8'))[0:255]
     )

Owner

This is a rather surprising change; could you elaborate on how this helps, exactly?

lologf (Author), Jan 19, 2019

I am working on an app with Python 2.7, Django 1.11, and a production database using the 'utf8' charset, and I need django-mailbox to receive emails. If an email has an emoji in its subject, Django raises an OperationalError. I cannot change the character set to 'utf8mb4' in production. This fix (I don't know another way to do it; perhaps in utils.convert_header_to_unicode?) allows receiving emails with emojis under Django 1.11, Python 2.7, and a utf8 charset and collation.

Before this fix: Django raises an OperationalError.
After this fix: an email subject containing Unicode emojis comes through as "Resume of your a\xc3\xb1o with \xf0\x9f\x9a\x80".
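The escaped sequences in that subject are simply the UTF-8 bytes of "ñ" and of the rocket emoji. A minimal Python 3 sketch that reproduces them (in Python 3, `str` corresponds to Python 2's `unicode`, and `bytes` to Python 2's `str`):

```python
# Python 3: str holds text (Python 2's unicode); bytes holds raw bytes (Python 2's str).
subject = "Resume of your a\u00f1o with \U0001F680"

# Encoding to UTF-8: "ñ" (U+00F1) becomes 2 bytes, the rocket (U+1F680) 4 bytes.
encoded = subject.encode("utf-8")
print(encoded)  # b'Resume of your a\xc3\xb1o with \xf0\x9f\x9a\x80'

# Decoding with the same codec restores the original text.
print(encoded.decode("utf-8") == subject)  # True
```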

Owner

Oh, I understand that you believe this fixes the issue you're encountering, but what I meant was: specifically, how does the above change help with that? Consider this:

There are two possibilities here: message['subject'] is either a unicode object or bytes. Given your example emoji of 🚀, here is what happens in each case:

If it's bytes:

value = unicode('\xf0\x9f\x9a\x80')
# Will raise the following exception:
# Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
# UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)

If it's unicode:

value = unicode(u'\U0001f680')
# Now let's try running 'decode'
value.decode('utf-8')
# Will raise the following exception:
# Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#  File "/var/www/envs/latestrevision/lib/python2.7/encodings/utf_8.py", line 16, in decode
#    return codecs.utf_8_decode(input, errors, True)
# UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f680' in position 0: ordinal not in range(128)

There are a couple things to be learned from the above:

  • Using unicode without supplying an encoding will attempt to interpret the provided string using your default encoding (sys.getdefaultencoding()). In most people's cases, that encoding is going to be ascii, and that is certainly not going to work for bytes above 127.
  • decode is intended for converting bytes into unicode objects, not for converting unicode objects into anything at all. So when you run decode on a unicode object, you're actually asking Python to first encode your object using your default encoding, then to decode those bytes using the encoding you've selected. This is also not going to get you the result you want, but it is one of the more common misunderstandings of how unicode and bytes objects work in Python.
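Putting both points together, the usual pattern is to decode only bytes, always with an explicit encoding, and to leave text alone. A minimal sketch in Python 3 (where `str` is the text type, like Python 2's `unicode`); the helper name `to_text` and the `errors="replace"` choice are assumptions for illustration, not django-mailbox code:

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as text (str).

    Bytes are decoded with an explicit encoding instead of the interpreter
    default; str is already text, so it is returned unchanged rather than
    being "decoded" again.
    """
    if isinstance(value, bytes):
        # errors="replace" turns invalid byte sequences into U+FFFD
        # instead of raising UnicodeDecodeError.
        return value.decode(encoding, errors="replace")
    return value


# Both of the cases above end up as the same text.
print(to_text(b"\xf0\x9f\x9a\x80") == "\U0001F680")  # True
print(to_text("\U0001F680") == "\U0001F680")         # True
```

In Python 3, `str` has no `decode` method at all, so the second mistake fails immediately with an `AttributeError` instead of an implicit ASCII round trip.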

lologf (Author), Jan 22, 2019

Okay. If I explain how I got to this point, we may better understand the solution to the problem.

I use "(message['subject']).decode('utf-8')" to force the utf-8 encoding, which is the encoding I have configured by default in my Django app and in my production database.

I thought the 'DJANGO_MAILBOX_default_charset' setting read in utils.get_settings() could help me, but I saw that, because its name is not all uppercase, Django does not pick it up as a setting. I made a fix to capitalize it and force 'default_charset' to be utf-8, but it still raised the same OperationalError.

I read several articles saying that I had to convert all the tables and columns of the production database to 'utf8mb4', since emojis take 4 bytes in UTF-8.
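The 4-byte point can be checked directly. Assuming the database is MySQL (as the utf8mb4 and utf8_general_ci names suggest), its `utf8` charset stores at most 3 bytes per character, so only code points up to U+FFFF fit; most emoji lie above that and need `utf8mb4`. A small Python 3 sketch to find the offending characters; the function name is hypothetical:

```python
def chars_needing_utf8mb4(text):
    """Return the characters of `text` that take 4 bytes in UTF-8.

    Code points above U+FFFF (outside the Basic Multilingual Plane) are
    encoded with 4 bytes, one more than MySQL's 3-byte `utf8` charset
    allows per character.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


subject = "Resume of your a\u00f1o with \U0001F680"

# UTF-8 length of a few characters: ASCII 1 byte, "ñ" 2 bytes, emoji 4 bytes.
for ch in ("a", "\u00f1", "\U0001F680"):
    print(hex(ord(ch)), len(ch.encode("utf-8")))
# 0x61 1
# 0xf1 2
# 0x1f680 4

print([hex(ord(ch)) for ch in chars_needing_utf8mb4(subject)])  # ['0x1f680']
```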

But I cannot change that encoding in my production database, and I do not mind if the emoji is represented as bytes in the subject.

My intention is to use django-mailbox to automate actions when receiving emails, and I do not care if the emoji is not represented correctly. What I want is for Django not to raise an OperationalError when the encoding is not 'utf8mb4'.

I understand that this conversion from header to unicode should be done by the function utils.convert_header_to_unicode(), but I made the fix in models.Mailbox._process_message() as a workaround.

After the decode, it returns a string like '=?Utf-8?Bxxxxxxxxxxxx ...', which is a MIME-encoded header (an RFC 2047 encoded-word). This string is converted to a readable string with email.header.decode_header(msg.subject).
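For reference, this is how Python 3's standard `email.header` module turns such an encoded-word into readable text. The sample header is built inside the sketch with `base64` so that it is self-contained; it is not taken from the author's email:

```python
import base64
from email.header import decode_header, make_header

# Build a sample RFC 2047 "B" (base64) encoded-word of the form =?utf-8?b?...?=
raw = "Resume of your a\u00f1o with \U0001F680"
encoded_word = "=?utf-8?b?" + base64.b64encode(raw.encode("utf-8")).decode("ascii") + "?="

# decode_header splits the header into (bytes, charset) chunks ...
chunks = decode_header(encoded_word)
print(chunks)  # [(b'Resume of your a\xc3\xb1o with \xf0\x9f\x9a\x80', 'utf-8')]

# ... and make_header + str() reassembles them into a single text value.
readable = str(make_header(chunks))
print(readable == raw)  # True
```

Note that `decode_header` on its own still returns bytes plus a charset; it is the `make_header` step that yields text.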

So my question at this point is: is there any way to use django-mailbox without the 'utf8mb4' encoding in the production database when I receive an email containing an emoji? Thanks for everything.
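One possible direction, sketched under clear assumptions: if losing the emoji is acceptable, the subject could be cleaned before it is saved, replacing every character above U+FFFF so the value fits a 3-byte `utf8` column. The helper below is hypothetical (not part of django-mailbox), uses Python 3, and is lossy; converting the affected columns to `utf8mb4` remains the way to store emoji faithfully. Where to hook it in (for example, around the subject assignment in `_process_message()`) is left open.

```python
import re

# Any code point above U+FFFF: these take 4 bytes in UTF-8 and do not fit
# MySQL's 3-byte `utf8` charset.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def sanitize_for_utf8mb3(text, max_length=255, replacement="\uFFFD"):
    """Hypothetical helper: make `text` storable in a 3-byte utf8 column.

    Lossy by design: each character outside the Basic Multilingual Plane
    is replaced with `replacement` (U+FFFD by default, itself a 3-byte
    character), then the result is cut to `max_length` characters, like
    the [0:255] slice applied to the subject.
    """
    return _NON_BMP.sub(replacement, text)[:max_length]


print(sanitize_for_utf8mb3("Resume of your a\u00f1o with \U0001F680"))
```

Replacing with U+FFFD rather than deleting keeps a visible marker that something was dropped; passing `replacement=""` would remove the characters instead.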

pfouque changed the title from "Fix default_charset and _process_message() to accept emojis in utf-8 encode" to "Encrypt/Decrypt Mailbox urls" on Dec 17, 2023.
3 participants