Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-28764: mailbox.mbox: handle lines with non-ascii more graceful #23553

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

flokli
Copy link

@flokli flokli commented Nov 29, 2020

Don't fail to parse if non-ascii characters occur after a From: line.

https://bugs.python.org/issue42433

https://bugs.python.org/issue28764

…aceful

Don't fail to parse if non-ascii characters occur after a From: line.
@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@flokli

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@jstasiak
Copy link
Contributor

jstasiak commented Dec 1, 2020

Two cents from me: if I was using this API I'd not want to it to modify the data being imported silently and in a lossy way so an exception raised (the current behavior) seems superior to me. Even better would be to actually accept non-ascii characters as-is but I'm not sure how feasible that is.

@flokli
Copy link
Author

flokli commented Dec 5, 2020

This was already silently skipping non-ascii things on lines that didn't start with a From - this PR just makes it consistent with the behaviour there. Also see https://bugs.python.org/issue42433#msg382169 why this is even considered a bug for earlier versions.

@jstasiak
Copy link
Contributor

jstasiak commented Dec 6, 2020

I'm totally unconvinced by https://bugs.python.org/issue42433#msg382169, but

This was already silently skipping non-ascii things on lines that didn't start with a From

seems like an argument to me, fair enough.

@github-actions
Copy link

github-actions bot commented Jan 6, 2021

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jan 6, 2021
@flokli
Copy link
Author

flokli commented Jan 6, 2021

This isn't stale, but waiting for maintainer feedback (@bitdancer ?)

@github-actions github-actions bot removed the stale Stale PR or inactive for long period of time. label Jan 7, 2021
@github-actions
Copy link

github-actions bot commented Feb 7, 2021

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Feb 7, 2021
@flokli
Copy link
Author

flokli commented Feb 7, 2021

Still waiting for feedback from maintainers.

@github-actions github-actions bot removed the stale Stale PR or inactive for long period of time. label Feb 8, 2021
@CoolCat467
Copy link

CLA signed

@flokli
Copy link
Author

flokli commented Oct 7, 2021

@CoolCat467 is there anything preventing this PR from getting merged? Anything needed to do on my side?

@CoolCat467
Copy link

I think it needs maintainer review yet... Maybe look into adding reviewers in bpo?

@flokli
Copy link
Author

flokli commented Oct 15, 2021

I can't assign people in bpo, but I can cc @maxking and @bitdancer on this PR due to #17620.

@liakoyras
Copy link

Hi. I am having the same problem, I want to parse mbox files that contain international email addresses in the From field.
After some research I ended up here, and I would like to share my opinion.

I think that the premise of using .decode('ascii') is wrong, since email addresses can contain non-ASCII characters. In other words, expecting the text after From to always be ASCII and throwing an error, or just ignoring the other characters as in the case of this PR is not the proper behavior.

@isonno
Copy link

isonno commented Aug 2, 2023

The way this library currently throws the UnicodeDecodeError makes it difficult to properly handle or skip an offending message while iterating over messages in a mailbox. This needs to be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants