Description
Describe the bug
In some cases, attachment (file)names are not decoded correctly and contain invalid characters. This happens for names encoded like this: ISO-8859-1''caf%E9.txt. Note that this is not using encoded-words (by the way, I cannot find the name of this encoding; do you know it?). The ISO-8859-1 encoding is simply ignored.
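The value appears to follow the RFC 2231 extended parameter syntax (`charset'language'percent-encoded-text`, as used in `filename*=`), where the language part may be empty, which is why `''` shows up. A minimal sketch of splitting such a value into its three parts, assuming that format; `parseExtendedValue` is a hypothetical helper, not part of php-imap:

```php
<?php
// Hypothetical helper (not part of php-imap): splits a value of the form
// charset'language'percent-encoded-text, e.g. ISO-8859-1''caf%E9.txt.
function parseExtendedValue(string $value): ?array
{
    // Split on the first two apostrophes only; the language may be empty.
    $parts = explode("'", $value, 3);
    if (count($parts) !== 3) {
        return null; // not an extended value, leave it to other decoders
    }
    [$charset, $language, $encoded] = $parts;
    return [
        'charset'  => $charset,   // needed later to convert to UTF-8
        'language' => $language,
        'encoded'  => $encoded,   // still percent-encoded
    ];
}

print_r(parseExtendedValue("ISO-8859-1''caf%E9.txt"));
```

A bare `explode()` can misfire on ordinary names that happen to contain two apostrophes, so a real decoder would likely also check that the value came from a `filename*` parameter or validate the charset part.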
Used config
'options' => [
    'decoder' => [
        'message' => 'iconv',
        'attachment' => 'iconv',
    ],
],
Code to Reproduce
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader
$clientManager = new \Webklex\PHPIMAP\ClientManager();
$clientManager->setConfig([
    'options' => [
        'decoder' => [
            'message' => 'iconv',
            'attachment' => 'iconv',
        ],
    ],
]);
// Parse the raw email from disk instead of fetching it over IMAP
$email = file_get_contents(__DIR__ . '/email.txt');
$message = \Webklex\PHPIMAP\Message::fromString($email);
foreach ($message->getAttachments() as $attachment) {
    $name = $attachment->getName();
    echo "Attachment: {$name}\n";
}
You can find an example of a problematic email here: email.txt (generated with GNOME Evolution).
Expected behavior
The attachment name should be café.txt, but it is caf�.txt.
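To see where the replacement character comes from: percent-decoding `caf%E9.txt` yields the single byte 0xE9, which is `é` in ISO-8859-1 but is not a valid UTF-8 sequence on its own, whereas `é` in UTF-8 is the two bytes 0xC3 0xA9. A small standalone check, assuming PHP's iconv extension is available:

```php
<?php
// Percent-decoding alone only restores the raw ISO-8859-1 byte.
$raw = rawurldecode('caf%E9.txt');
echo bin2hex($raw), "\n";            // 636166e92e747874

// preg_match() with the /u modifier returns false on invalid UTF-8 input.
$isUtf8 = preg_match('//u', $raw) === 1;
echo $isUtf8 ? "valid UTF-8\n" : "invalid UTF-8\n";

// Converting from the declared charset gives the expected UTF-8 bytes.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $raw);
echo bin2hex($utf8), "\n";           // 636166c3a92e747874
```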
Desktop / Server (please complete the following information):
- OS: Docker image php:8.1-fpm (Debian, I guess?)
- PHP: 8.1
- Version: 5.5.0
- Provider: GNOME Evolution
Additional context
I was able to spot the issue.
In Attachment::decodeName, you check whether $name contains the string '' and extract the "real" name from it, but you drop the encoding. In my example, ISO-8859-1''caf%E9.txt becomes caf%E9.txt.
A few lines later, you urldecode() the name. Unfortunately, in my case, %E9 is the ISO-8859-1 encoding of the character é, whereas in UTF-8 it would be %C3%A9. This means we still need to convert the string from ISO-8859-1 to UTF-8 with EncodingAliases::convert($name, $encoding) ($encoding being $parts[0], extracted earlier).
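A minimal sketch of the fix suggested above, written as a standalone function rather than a patch to `Attachment::decodeName`: keep the charset, percent-decode the text, then convert it to UTF-8. It uses PHP's `iconv()` in place of `EncodingAliases::convert()` so it runs without the library, and `decodeExtendedName` is a hypothetical name:

```php
<?php
// Hypothetical helper illustrating the proposed fix for names such as
// ISO-8859-1''caf%E9.txt (charset'language'percent-encoded-text).
// The real change would live in Attachment::decodeName and might use
// EncodingAliases::convert($name, $encoding) instead of iconv().
function decodeExtendedName(string $name): string
{
    // Capture charset, optional language and the encoded text.
    if (preg_match('/^([\w.:-]+)\'([\w-]*)\'(.*)$/', $name, $m) !== 1) {
        return $name; // not an extended value: leave it unchanged
    }
    [, $charset, , $encoded] = $m;

    // rawurldecode() keeps "+" literal; urldecode() would turn it into a space.
    $decoded = rawurldecode($encoded);

    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $decoded;
    }

    // Use the declared charset instead of dropping it.
    $converted = iconv($charset, 'UTF-8', $decoded);
    return $converted === false ? $decoded : $converted;
}

echo decodeExtendedName("ISO-8859-1''caf%E9.txt"), "\n";
echo decodeExtendedName("UTF-8''caf%C3%A9.txt"), "\n";
echo decodeExtendedName('plain.txt'), "\n";
```

Falling back to the percent-decoded bytes when `iconv()` fails (for an unknown charset, say) is one possible choice; the library might prefer to apply its configured decoder instead.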