Encode::MIME::Header: Rewrite both decoder and encoder #51

Merged (1 commit) on Mar 22, 2016

Contributor

@pali pali commented Mar 21, 2016

The encoder should now be fully compliant with RFC 2047. The decoder is less
strict so that it can decode strings generated by old versions of this module.

To enforce strict decoding, set the package variable
$Encode::MIME::Header::STRICT_DECODE to 1.
dankogai added a commit that referenced this pull request Mar 22, 2016
Encode::MIME::Header: Rewrite both decoder and encoder
@dankogai dankogai merged commit a65d124 into dankogai:master Mar 22, 2016
@dankogai
Owner

Thank you!

@dracos

dracos commented Mar 23, 2016

Thanks so much for this. Just today I was wondering about trying to fix some of the issues with Encode::MIME::Header, and found that it had already been done :) Looking forward to it being released.

@nwellnhof

First of all, thanks for the effort. I am the author of Email::MIME::RFC2047 and invested quite some time trying to make it robust and RFC-conformant. I ran some tests from my module against your code, and the only problems I found concern the handling of invalid encoded words. Examples:

Encoded words that aren't separated by whitespace, like text=?iso-8859-1?q?text?=, should not be decoded, but your code decodes them. From the RFC:

Ordinary ASCII text and 'encoded-word's may appear together in the same header field. However, an 'encoded-word' that appears in a header field defined as '*text' MUST be separated from any adjacent 'encoded-word' or 'text' by 'linear-white-space'.

and

Any message or body part header field defined as '*text', or any user-defined header field, should be parsed as follows: Beginning at the start of the field-body and immediately following each occurrence of 'linear-white-space', each sequence of up to 75 printable characters (not containing any 'linear-white-space') should be examined to see if it is an 'encoded-word' according to the syntax rules in section 2. Any other sequence of printable characters should be treated as ordinary ASCII text.
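The parsing rule quoted above can be sketched roughly as follows. This is only an illustration, not the code of Encode::MIME::Header or Email::MIME::RFC2047: the helper name `find_encoded_words` and the simplified pattern `$ENCODED_WORD` are assumptions made for this example, and the pattern only accepts the B and Q encodings.

```perl
use strict;
use warnings;

# Simplified encoded-word syntax: =?charset?encoding?encoded-text?=
# (assumption: only the B and Q encodings are recognized here).
my $ENCODED_WORD = qr/\A=\?[^\s?]+\?[BbQq]\?[^\s?]+\?=\z/;

# Hypothetical helper: return the tokens of an unstructured field body
# that qualify as encoded-words under the quoted rule.
sub find_encoded_words {
    my ($field_body) = @_;
    my @found;
    # split ' ' yields maximal runs of non-whitespace, i.e. sequences that
    # start at the beginning of the body or right after white space.
    for my $token (split ' ', $field_body) {
        next if length($token) > 75;          # longer runs are ordinary text
        push @found, $token if $token =~ $ENCODED_WORD;
    }
    return @found;
}

print "$_\n" for find_encoded_words('text =?iso-8859-1?q?text?= more');
```

Because the pattern is anchored at both ends of a whitespace-delimited token, a string such as text=?iso-8859-1?q?text?= is never treated as an encoded-word in this sketch.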

Then there are encoded words with invalid contents, like =?iso-8859-1?b?----?=. My module tries hard to keep such words intact, but your code seems to replace them with the empty string. Section 6.3 of the RFC says:

A mail reader need not attempt to display the text associated with an 'encoded-word' that is incorrectly formed. However, a mail reader MUST NOT prevent the display or handling of a message because an 'encoded-word' is incorrectly formed.

This doesn't rule out removing invalid encoded words completely, but I think it's safer to keep them. Some client code might treat an empty email header as an error condition.

@pali
Contributor Author

pali commented Mar 24, 2016

@nwellnhof: local $Encode::MIME::Header::STRICT_DECODE = 1;
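For context, a minimal sketch of how that one-liner might be used from calling code. The wrapper `decode_header` is a hypothetical helper invented for this example, and the package variable only has an effect in Encode releases that include this rewrite.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Encode::MIME::Header;   # loaded explicitly so the package variable is declared

# Hypothetical helper: decode a raw header value, optionally in strict mode.
sub decode_header {
    my ($raw, $strict) = @_;
    # `local` restores the previous value when this sub returns, so the
    # setting does not leak into other code that uses the module.
    local $Encode::MIME::Header::STRICT_DECODE = $strict ? 1 : 0;
    return decode('MIME-Header', $raw);
}

my $raw = 'text=?iso-8859-1?q?text?=';
print "lenient: ", decode_header($raw, 0), "\n";
print "strict:  ", decode_header($raw, 1), "\n";
```

Going by the discussion in this thread, the two calls should differ for this input: the lenient decoder decodes the glued encoded word, while strict mode should leave it alone.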

@nwellnhof

Great, this works. (Encoded words with invalid contents are still stripped, though.)

@pali
Contributor Author

pali commented Apr 25, 2016

@nwellnhof How should the string "=?iso-8859-1?b?----?=" be decoded correctly? I think RFC 2047 does not specify this. If you have a good idea, I can fix it in STRICT_DECODE.

@nwellnhof

I would simply leave such strings undecoded.
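One way to read that suggestion, as a sketch only: validate the base64 payload first and return the encoded word unchanged if it does not check out. The helper `decode_b_word_or_keep` is hypothetical and handles only B-encoded words; it is not how Encode::MIME::Header or the later patch implements this.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: decode a single B-encoded word, or return it
# untouched when its contents are not valid.
sub decode_b_word_or_keep {
    my ($word) = @_;
    my ($charset, $payload) =
        $word =~ /\A=\?([^\s?]+)\?[Bb]\?([^\s?]+)\?=\z/
        or return $word;                      # not a B encoded-word at all

    # Valid base64: alphabet characters only, up to two '=' of padding,
    # and a total length that is a multiple of four.
    return $word
        unless length($payload) % 4 == 0
        && $payload =~ m{\A[A-Za-z0-9+/]*={0,2}\z};

    my $bytes = decode_base64($payload);
    # FB_CROAK makes decode() die on malformed bytes or an unknown charset;
    # in either case the original word is kept.
    my $text = eval { decode($charset, $bytes, Encode::FB_CROAK()) };
    return defined $text ? $text : $word;
}

print decode_b_word_or_keep('=?iso-8859-1?b?----?='), "\n";
```

With this approach the header value never collapses to an empty string, which addresses the concern about client code treating an empty header as an error.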

@pali
Contributor Author

pali commented May 12, 2016

Ok, that could make sense in STRICT_DECODE. I will prepare some patches for it...

jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Jun 9, 2016
----------------------------------
Revision: 2.84  Date: 2016/04/11 07:17:02
! lib/Encode/MIME/Header.pm
  Pulled: Encode::MIME::Header:
    Update description that this module is only for unstructured header
  dankogai/p5-encode#53
! lib/Encode/MIME/Header.pm t/mime-header.t
  Pulled: Encode::MIME::Header: Fix valid_q_chars, '-' needs to be escaped
  dankogai/p5-encode#52

2.83 2016/03/24 07:49:54
! lib/Encode/MIME/Header.pm t/mime-header.t
  Both decoder and encoder are rewritten by Pali Rohár.
  Encoder should be now fully compliant of RFC 2047.
  Decoder is less strict to be able to decode
  strings generated by old versions of this module.
  dankogai/p5-encode#51
! t/mime-header.t
  Add more test vectors from RFC2047, pp.11-12
! lib/Encode/Supported.pod
  merge: Autrijus -> Audrey
  dankogai/p5-encode#50

2.82 2016/02/06 20:17:24
! lib/Encode/MIME/Header.pm
  lib/Encode/MIME/Header/ISO_2022_JP.pm
  t/mime-header.t
  Reverted to 2.80 upon the request of whom submitted pull/48

2.81 2016/02/06 19:25:22
! lib/Encode/MIME/Header.pm
  lib/Encode/MIME/Header/ISO_2022_JP.pm
  t/mime-header.t
  Merged: Encode::MIME::Header: Fix decoder and rewrite encoder
  > Encoder should be now fully compliant of RFC 2047.
  > Decoder is less strict to be able to decode strings
  > generated by old versions of this module.
  dankogai/p5-encode#48
  ! t/mime-header.t
   merge t/mime-header.t @ https://github.com/asjo/p5-encode
   https://github.com/asjo/p5-encode/commit/19dcbff63e71909ffda7c151a73c5baaffe2976c
  ! t/mime-header.t
    Add more test vectors from RFC2047, pp.11-12
@pali
Contributor Author

pali commented Oct 21, 2016

Implemented in #68