
Search returns [BADCHARSET (US-ASCII)] when using on Outlook IMAP server #808

Closed
michelevirgilio opened this issue Jan 24, 2019 · 35 comments
Labels
question A question about how to do something

@michelevirgilio

Hi,
I'm trying to run this search on a Hotmail IMAP account:

C: C00000014 UID SEARCH CHARSET UTF-8 NOT UID 18017 SUBJECT {59+}
C: Il tuo account OneDrive verrà eliminato in data 23/02/2019
S: C00000014 NO [BADCHARSET (US-ASCII)] The specified charset is not supported.

@jstedfast jstedfast added the question A question about how to do something label Jan 24, 2019
@jstedfast
Owner

What that error is telling you is that Hotmail only supports US-ASCII strings, so you will not be able to search for that string.

@michelevirgilio
Author

It seems that US-ASCII is the offending charset; moreover, I have no way to specify a charset when invoking folder.Search(). It seems to be set automatically to UTF-8 in the BuildQueryExpression() and BuildQuery() methods of ImapFolderSearch.cs.

I think the user should be able to search for arbitrary text.

@jstedfast
Owner

No, you are misreading the error. The BADCHARSET response-code provides a list of supported charsets, not the charset that is invalid.

The charset that is invalid is the charset used in the SEARCH command (which was UTF-8).

You CANNOT search for the string you are trying to search for because the server does not support any charset other than US-ASCII, and the string you are trying to search for is not US-ASCII.
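To make the distinction concrete, here is a minimal Python sketch (Python is used only for illustration; `parse_badcharset` is a hypothetical helper, not a MailKit API) that extracts the supported-charset list from a tagged response like the one above. It assumes the charset names are plain atoms and does not handle quoted charset names.

```python
import re

# BADCHARSET may be followed by a parenthesized list of supported charsets (RFC 3501, 7.1).
# Assumption: charset names are simple atoms; quoted names are not handled.
_BADCHARSET = re.compile(r"\[BADCHARSET(?: \(([^)]*)\))?\]", re.IGNORECASE)


def parse_badcharset(response: str):
    """Return the charsets the server says it DOES support.

    Returns None if the response carries no BADCHARSET code, and an empty
    list if the code is present without a charset list.
    """
    match = _BADCHARSET.search(response)
    if match is None:
        return None
    listed = match.group(1)
    return listed.split() if listed else []


line = "C00000014 NO [BADCHARSET (US-ASCII)] The specified charset is not supported."
print(parse_badcharset(line))  # ['US-ASCII']
```

The list names what the server accepts, so UTF-8 (the charset actually sent) is the one being rejected.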

@michelevirgilio
Author

michelevirgilio commented Jan 24, 2019

OK, you are right, but this requires me to change the text passed to the Search() method beforehand; isn't there a way to force the charset parameter?

@jstedfast
Owner

What would you force it to?

@michelevirgilio
Author

Omitting CHARSET UTF-8 from:
C: C00000014 UID SEARCH CHARSET UTF-8 NOT UID 18017 SUBJECT {59+}

@jstedfast
Owner

That won't work.

@wartab

wartab commented Jan 28, 2019

Funnily enough I encountered the same issue a few days prior to this. Seems like Exchange servers don't like UTF-8 at all.

It is also worth noting that some servers that are able to interpret UTF-8 queries will still fail when calling imap.EnableUTF8(). I'm not sure what the purpose of that method is, so checking whether it fails isn't a good way to detect UTF-8 support.

My solution was to build a more complex search expression using And and Or operators, searching for the substrings that sit between non-ASCII characters.

Microsoft ftw
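One possible reading of that workaround, sketched in Python under stated assumptions: split the subject at every run of non-ASCII characters, keep the ASCII fragments, and emit one SUBJECT key per fragment. IMAP ANDs consecutive search keys, so every fragment has to appear in the subject. `ascii_subject_criteria` and its `min_len` cutoff are hypothetical, and whether a given server matches such fragments is not guaranteed.

```python
import re


def ascii_subject_criteria(text: str, min_len: int = 3) -> str:
    """Build IMAP SEARCH criteria from the US-ASCII fragments of a subject.

    Hypothetical helper: each fragment becomes its own SUBJECT key, and IMAP
    ANDs consecutive keys, so every fragment must occur in the subject.
    """
    fragments = [f.strip() for f in re.split(r"[^\x00-\x7f]+", text)]
    # Very short fragments (like the trailing "o" of "Comunicação") match almost anything.
    fragments = [f for f in fragments if len(f) >= min_len]

    def quote(s: str) -> str:
        # Escape backslashes and double quotes for an IMAP quoted string.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return " ".join(f"SUBJECT {quote(f)}" for f in fragments)


print(ascii_subject_criteria("Comunicação"))  # SUBJECT "Comunica"
print(ascii_subject_criteria("Il tuo account OneDrive verrà eliminato in data 23/02/2019"))
```

The resulting string could then be passed to a raw-query search such as MailKit's `ImapFolder.Search (string query)`. An empty string means no usable ASCII fragment survived, and the caller has to decide what to do in that case.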

@jstedfast
Owner

@wartab the UTF8=ONLY and UTF8=ACCEPT extensions (which are what EnableUTF8() enables) allow the client and server to do the following things:

  1. use UTF-8 for mailbox names instead of having to encode them in modified UTF-7
  2. FETCH responses, including things like ENVELOPE, will be in UTF-8 as opposed to RFC 2047 encoding
  3. All strings in a SEARCH command are assumed to be in UTF-8 and the client should not issue a CHARSET parameter to the search.

I'll double-check to make sure that MailKit conforms to that last bit (but I think it does).

I don't think Exchange servers advertise the UTF8=ACCEPT or =ONLY extensions, so that's probably why EnableUTF8() failed for you?
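For item 1 of the list above, "modified UTF-7" is the mailbox-name encoding defined in RFC 3501, section 5.1.3: printable ASCII passes through (with `&` written as `&-`), and every other run of characters is UTF-16BE, base64-encoded with `,` instead of `/`, no padding, and wrapped in `&...-`. A minimal Python sketch for illustration only (`encode_modified_utf7` is a hypothetical helper, not a MailKit API); with UTF8=ACCEPT enabled, the same names would simply be sent as UTF-8.

```python
import base64


def encode_modified_utf7(name: str) -> str:
    """Encode a mailbox name using IMAP modified UTF-7 (RFC 3501, section 5.1.3)."""
    out = []
    pending = []  # current run of characters that must be base64-encoded

    def flush():
        if pending:
            utf16 = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(utf16).decode("ascii").rstrip("=").replace("/", ",")
            out.append("&" + b64 + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append("&-" if ch == "&" else ch)  # '&' is the shift character
        else:
            pending.append(ch)
    flush()
    return "".join(out)


print(encode_modified_utf7("~peter/mail/台北/日本語"))  # ~peter/mail/&U,BTFw-/&ZeVnLIqe-
```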

@wartab

wartab commented Jan 28, 2019

It was not an Exchange server. If I remember correctly, it happened on an OVH mail server (the Exchange server threw both on EnableUTF8() and when the search query was sent). EnableUTF8() threw an exception, but the search didn't fail and actually returned the correct result. If you want more details, I can provide them.

@jstedfast
Owner

EnableUTF8() should only throw if the server doesn't support the UTF8 extension, which is different from supporting UTF-8 in SEARCH: a server can support UTF-8 in SEARCH without supporting the ENABLE UTF8 command.
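In other words, the capability list only tells a client whether ENABLE UTF8=ACCEPT can succeed; it says nothing about whether SEARCH accepts CHARSET UTF-8. A small Python sketch of that capability check, run against the CAPABILITY line Exchange sends later in this thread (`advertises_utf8_accept` is a hypothetical helper):

```python
def advertises_utf8_accept(capability_line: str) -> bool:
    """True if an untagged CAPABILITY response lists UTF8=ACCEPT or UTF8=ONLY (RFC 6855)."""
    tokens = capability_line.split()
    if len(tokens) < 2 or tokens[0] != "*" or tokens[1].upper() != "CAPABILITY":
        raise ValueError("expected an untagged CAPABILITY response")
    caps = {token.upper() for token in tokens[2:]}  # capability names are case-insensitive
    return "UTF8=ACCEPT" in caps or "UTF8=ONLY" in caps


exchange = ("* CAPABILITY IMAP4 IMAP4rev1 AUTH=PLAIN AUTH=XOAUTH2 SASL-IR UIDPLUS "
            "MOVE ID UNSELECT CHILDREN IDLE NAMESPACE LITERAL+")
print(advertises_utf8_accept(exchange))  # False
```

A `False` here explains a failing EnableUTF8(), but a server in that state may still accept CHARSET UTF-8 in SEARCH, which matches the OVH behavior described above.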

@wartab

wartab commented Jan 28, 2019

Fair enough, thanks for the info :)

@JobaDiniz

Well, sorry, but I don't understand how one can search via IMAP in Outlook for e-mails with subjects like Homologação, which is a Portuguese word.

The Outlook server returns BADCHARSET, OK, but how can I search using such a word? Do I convert the string into the charset returned by the server?

@jstedfast
Owner

@JobaDiniz what is the full text of the response from the Exchange server? Does it only list US-ASCII as an available charset? What is the SEARCH command that Outlook uses?

@jstedfast
Owner

When you search for "Homologação", maybe Outlook flattens that text into ASCII, e.g. "Homologacao"?

If so, you could modify your code to search for that.
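If the server does flatten accented text, the client-side equivalent is a diacritic-stripping pass. A minimal Python sketch, assuming "flattening" means Unicode NFKD decomposition followed by dropping combining marks (`flatten_to_ascii` is a hypothetical helper): anything that still isn't ASCII afterwards is discarded.

```python
import unicodedata


def flatten_to_ascii(text: str) -> str:
    """Strip diacritics ('ç' -> 'c', 'ã' -> 'a') and drop anything still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)  # e.g. 'ã' -> 'a' + U+0303
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.encode("ascii", "ignore").decode("ascii")


print(flatten_to_ascii("Homologação"))  # Homologacao
print(flatten_to_ascii("Comunicação"))  # Comunicacao
```

Characters with no ASCII decomposition (CJK, for example) are dropped entirely, so this only makes sense for Latin-script text.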

@JobaDiniz

JobaDiniz commented Sep 19, 2019

Searching for Comunicação ("communication" in English):

Connected to imaps://outlook.office365.com:993/
S: * OK The Microsoft Exchange IMAP4 service is ready. 
C: A00000000 CAPABILITY
S: * CAPABILITY IMAP4 IMAP4rev1 AUTH=PLAIN AUTH=XOAUTH2 SASL-IR UIDPLUS MOVE ID UNSELECT CHILDREN IDLE NAMESPACE LITERAL+
S: A00000000 OK CAPABILITY completed.
S: A00000001 OK AUTHENTICATE completed.
C: A00000002 CAPABILITY
S: * CAPABILITY IMAP4 IMAP4rev1 AUTH=PLAIN AUTH=XOAUTH2 SASL-IR UIDPLUS MOVE ID UNSELECT CLIENTACCESSRULES CLIENTNETWORKPRESENCELOCATION BACKENDAUTHENTICATE CHILDREN IDLE NAMESPACE LITERAL+
S: A00000002 OK CAPABILITY completed.
C: A00000003 NAMESPACE
S: * NAMESPACE (("" "/")) NIL NIL
S: A00000003 OK NAMESPACE completed.
C: A00000004 LIST "" "INBOX"
S: * LIST (\Marked \HasNoChildren) "/" INBOX
S: A00000004 OK LIST completed.
C: A00000005 LIST "" "%"
S: * LIST (\HasNoChildren) "/" Archive
S: * LIST (\HasNoChildren) "/" Archived
S: * LIST (\HasChildren) "/" Calendar
S: * LIST (\HasChildren) "/" Contacts
S: * LIST (\HasChildren) "/" "Conversation History"
S: * LIST (\HasNoChildren \Trash) "/" "Deleted Items"
S: * LIST (\HasNoChildren \Drafts) "/" Drafts
S: * LIST (\Marked \HasNoChildren) "/" INBOX
S: * LIST (\HasNoChildren) "/" Journal
S: * LIST (\HasNoChildren \Junk) "/" "Junk Email"
S: * LIST (\HasNoChildren) "/" Notes
S: * LIST (\HasNoChildren) "/" Outbox
S: * LIST (\HasNoChildren) "/" Rascunhos
S: * LIST (\HasNoChildren \Sent) "/" "Sent Items"
S: * LIST (\HasNoChildren) "/" Tasks
S: A00000005 OK LIST completed.
C: A00000006 LIST "" "%"
S: * LIST (\HasNoChildren) "/" Archive
S: * LIST (\HasNoChildren) "/" Archived
S: * LIST (\HasChildren) "/" Calendar
S: * LIST (\HasChildren) "/" Contacts
S: * LIST (\HasChildren) "/" "Conversation History"
S: * LIST (\HasNoChildren \Trash) "/" "Deleted Items"
S: * LIST (\HasNoChildren \Drafts) "/" Drafts
S: * LIST (\Marked \HasNoChildren) "/" INBOX
S: * LIST (\HasNoChildren) "/" Journal
S: * LIST (\HasNoChildren \Junk) "/" "Junk Email"
S: * LIST (\HasNoChildren) "/" Notes
S: * LIST (\HasNoChildren) "/" Outbox
S: * LIST (\HasNoChildren) "/" Rascunhos
S: * LIST (\HasNoChildren \Sent) "/" "Sent Items"
S: * LIST (\HasNoChildren) "/" Tasks
S: A00000006 OK LIST completed.
C: A00000007 NOOP
S: A00000007 OK NOOP completed.
C: A00000008 SELECT INBOX
S: * 24 EXISTS
S: * 24 RECENT
S: * FLAGS (\Seen \Answered \Flagged \Deleted \Draft $MDNSent)
S: * OK [PERMANENTFLAGS (\Seen \Answered \Flagged \Deleted \Draft $MDNSent)] Permanent flags
S: * OK [UIDVALIDITY 14] UIDVALIDITY value
S: * OK [UIDNEXT 1705] The next unique identifier value
S: A00000008 OK [READ-WRITE] SELECT completed.
C: A00000009 UID SEARCH CHARSET UTF-8 SUBJECT {13+}
C: Comunicação
S: A00000009 NO [BADCHARSET (US-ASCII)] The specified charset is not supported.
C: A00000010 LOGOUT

Searching for Comunicacao throws no error, but nothing is returned.
So, from my understanding, there is no solution for this issue and the problem is not "ours": it is a limitation of the server we are connecting to.

Is that it?

@jstedfast
Owner

jstedfast commented Sep 19, 2019

Try this:

var inbox = (ImapFolder) client.Inbox;
var results = inbox.Search ("CHARSET US-ASCII SUBJECT \"Comunicação\"");

If that doesn't work, try:

var inbox = (ImapFolder) client.Inbox;
var results = inbox.Search ("CHARSET US-ASCII SUBJECT {13+}\r\nComunicação");

Edit:

and if those fail, try without the CHARSET US-ASCII?

If you can find a workaround by constructing the query string manually, let me know what the final solution is and I'll look into adding a workaround for Exchange SEARCH.

A quick hack to see if any of these work would be:

var inbox = (ImapFolder) client.Inbox;

inbox.Search ("CHARSET US-ASCII SUBJECT \"Comunicação\"");
inbox.Search ("CHARSET US-ASCII SUBJECT {13+}\r\nComunicação");
inbox.Search ("SUBJECT \"Comunicação\"");
inbox.Search ("SUBJECT {13+}\r\nComunicação");

Then just look at the resulting log to see which (if any) query worked successfully.

If none of them work, then yes, it would seem that it is a limitation of the server that you are connecting to.

@JobaDiniz

using System;
using System.IO;
using MailKit;
using MailKit.Net.Imap;

class Program
{
    static void Main(string[] args)
    {
        while (true)
        {
            Console.WriteLine("1 - CHARSET US-ASCII SUBJECT \"Comunicação\"");
            Console.WriteLine("2 - CHARSET US-ASCII SUBJECT {13+}\r\nComunicação");
            Console.WriteLine("3 - SUBJECT \"Comunicação\"");
            Console.WriteLine("4 - SUBJECT {13+}\r\nComunicação");
            Console.WriteLine("5 - EXIT");

            var option = Console.ReadLine();
            if (option == "5")
                return;

            try
            {
                // Write a separate protocol log for each option (imap1.log ... imap4.log).
                using (var client = new ImapClient(CreateProtocolLogger($"imap{option}.log")))
                {
                    client.Connect("outlook.office365.com", 993, true);
                    client.Authenticate("xxxxx@xxxxx", "xxxxxxxx");
                    var inbox = (ImapFolder)client.Inbox;
                    inbox.Open(FolderAccess.ReadOnly);

                    // Each option sends one raw query string via ImapFolder.Search (string).
                    if (option == "1")
                        inbox.Search("CHARSET US-ASCII SUBJECT \"Comunicação\"");
                    else if (option == "2")
                        inbox.Search("CHARSET US-ASCII SUBJECT {13+}\r\nComunicação");
                    else if (option == "3")
                        inbox.Search("SUBJECT \"Comunicação\"");
                    else if (option == "4")
                        inbox.Search("SUBJECT {13+}\r\nComunicação");
                }
            }
            catch (Exception ex)
            {
                // Print the failure and show the menu again.
                Console.WriteLine(ex);
            }
        }
    }

    static IProtocolLogger CreateProtocolLogger(string fileName)
    {
        var folder = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData), "XXXXX");
        var stream = new FileStream(Path.Combine(folder, fileName), FileMode.Create, FileAccess.ReadWrite, FileShare.Read);
        return new ProtocolLogger(stream, leaveOpen: false);
    }
}

First log: BAD Command Error

S: * OK [UNSEEN 27] Is the first unseen message
S: * OK [UIDVALIDITY 14] UIDVALIDITY value
S: * OK [UIDNEXT 1723] The next unique identifier value
S: A00000006 OK [READ-ONLY] EXAMINE completed.
C: A00000007 UID SEARCH CHARSET US-ASCII SUBJECT "Comunicação"
S: A00000007 BAD Command Error. 11

Second log: SocketException / System.IO.IOException: Unable to read data from the transport connection

S: * OK [UIDVALIDITY 14] UIDVALIDITY value
S: * OK [UIDNEXT 1724] The next unique identifier value
S: A00000006 OK [READ-ONLY] EXAMINE completed.
C: A00000007 UID SEARCH CHARSET US-ASCII SUBJECT {13+}
C: Comunicação

Third log: BAD Command Error. 11

S: * OK [UIDVALIDITY 14] UIDVALIDITY value
S: * OK [UIDNEXT 1723] The next unique identifier value
S: A00000006 OK [READ-ONLY] EXAMINE completed.
C: A00000007 UID SEARCH SUBJECT "Comunicação"
S: A00000007 BAD Command Error. 11

Fourth log: SocketException / System.IO.IOException: Unable to read data from the transport connection

S: * OK [UIDVALIDITY 14] UIDVALIDITY value
S: * OK [UIDNEXT 1723] The next unique identifier value
S: B00000006 OK [READ-ONLY] EXAMINE completed.
C: B00000007 UID SEARCH SUBJECT {13+}
C: Comunicação

@jstedfast
Owner

OK, so it turns out there was a bug in the ImapFolder.Search (string query, ...) logic that broke when the query string contained Unicode characters.

This is what caused the IOExceptions. It may have also affected the other 2 commands.

Basically, the query string was assumed to be ASCII, so Comunicação was being encoded as 11 bytes instead of 13. The server therefore waited for 2 more bytes that MailKit never sent, while MailKit thought it had sent everything and waited for the server to reply.

If you could grab the latest (2.3.1.8) nuget package from https://www.myget.org/feed/mimekit/package/nuget/MailKit and try again, that would be appreciated.

Thanks!
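The rule behind that bug is that the number in an IMAP literal's `{N}` prefix counts bytes, not characters. A minimal Python sketch (the `imap_literal` helper is hypothetical, not MailKit code) showing why `Comunicação` needs `{13+}` once it is encoded as UTF-8; the `+` marks a LITERAL+ (RFC 7888) non-synchronizing literal, which this server advertises.

```python
def imap_literal(text: str, non_sync: bool = True) -> bytes:
    """Build an IMAP literal whose {N} prefix counts UTF-8 bytes, not characters."""
    data = text.encode("utf-8")
    plus = "+" if non_sync else ""  # "{N+}" = non-synchronizing literal (LITERAL+)
    return f"{{{len(data)}{plus}}}\r\n".encode("ascii") + data


word = "Comunicação"
print(len(word))                  # 11 (characters)
print(len(word.encode("utf-8")))  # 13 (bytes: 'ç' and 'ã' take 2 bytes each)
print(imap_literal(word))         # b'{13+}\r\nComunica\xc3\xa7\xc3\xa3o'
```

Announcing 13 bytes but sending a shorter encoding leaves the server waiting for the rest, which is consistent with the IOExceptions in the second and fourth logs.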

@JobaDiniz

I updated, and no exception was thrown, but the search returned no results; and I assure you, there is an e-mail with Comunicação in the subject.

S: * OK [UNSEEN 28] Is the first unseen message
S: * OK [UIDVALIDITY 14] UIDVALIDITY value
S: * OK [UIDNEXT 1750] The next unique identifier value
S: A00000005 OK [READ-ONLY] EXAMINE completed.
C: A00000006 UID SEARCH CHARSET US-ASCII SUBJECT {13+}
C: Comunicação
S: * SEARCH
S: A00000006 OK SEARCH completed.

Another thing:
My application was built "ages" ago using 1.8.1.1 version of MailKit.
Is there a document where I can find what breaking changes were introduced from 1.8.1.1 to 2.3.x?

@jstedfast
Owner

I believe you that there is an e-mail with Comunicação in the subject.

Yes, there's a ReleaseNotes.md file you can check.

Are you saying that 1.8.1.1 was able to search for this string successfully? Can you get a log of the command that it sent? I have a hard time believing that 1.8.1.1 worked.

@jstedfast
Owner

Sorry, I just realized that you are probably saying that your app (not the little test app above) is using MailKit 1.8.1.1 and that you are planning to update it to use the latest MailKit (2.3.1.+).

In that case, the ReleaseNotes.md is still a good place to look, but here's a little blurb that I've been including in the NuGet.org release notes that sums up the things to look out for:

MailKit API Changes Since 2.0.x:

  • Obsoleted SearchQuery.HasCustomFlags() and SearchQuery.DoesNotHaveCustomFlags(). These are now SearchQuery.HasKeywords() and SearchQuery.NotKeywords(), respectively.
  • Obsoleted SearchQuery.DoesNotHaveFlags() in favor of SearchQuery.NotFlags().
  • Obsoleted the IMessageSummary.UserFlags property in favor of IMessageSummary.Keywords.
  • Obsoleted the MessageFlagsChangedEventArgs.UserFlags property in favor of MessageFlagsChangedEventArgs.Keywords.
  • All IMailFolder.Fetch and IMailFolder.FetchAsync methods that took a HashSet<string> userFlags argument now take an IEnumerable<string> keywords argument. Note: this only affects you if your code used named method parameters (e.g. userFlags: myUserFlags).

Note to users upgrading from MailKit 1.x:

In order to authenticate using the XOAUTH2 SASL mechanism, you must now use the following approach:

client.Authenticate (new SaslMechanismOAuth2 (username, auth_token));

As far as MimeKit goes, MimeKit has its own ReleaseNotes.md file. Pay closest attention to the notes for 2.0.0 (same for MailKit, really).

jstedfast added a commit that referenced this issue Sep 21, 2019
This change will also *force* unicode strings into US-ASCII
if that is the only supported charset, in an effort to work around
the limitations of Exchange IMAP.

See https://stackoverflow.com/questions/12691913/imap-search-charset-with-iso-8859-1
for an example of what Thunderbird does when forcing unicode into iso-8859-1.

*May* fix #808
@jstedfast
Owner

jstedfast commented Sep 21, 2019

@JobaDiniz I've committed a potential fix based on https://stackoverflow.com/questions/12691913/imap-search-charset-with-iso-8859-1

If you could grab the MailKit v2.3.1.11 (or later) package from https://www.myget.org/feed/mimekit/package/nuget/MailKit and test it out, that would be fantastic.

@jstedfast
Owner

Essentially, it works in three steps:

  1. The very first time you do a SEARCH, it will attempt to search using UTF-8 because it has no way of knowing that Exchange doesn't support UTF-8 in SEARCH... I do have a "QuirksMode" state in the ImapEngine so I could hard-code the fact that Exchange doesn't support UTF-8, but I don't want to do that because a future version of Exchange might add support for it.
  2. Once a SEARCH fails with BADCHARSET, the internal list of supported charsets will be updated and, as long as the CHARSET parameter in that SEARCH wasn't US-ASCII, it will retry.
  3. When retrying with US-ASCII, Unicode characters will be flattened to US-ASCII like the example in the Stack Overflow question.
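The retry flow described in those steps can be sketched in Python as follows. This illustrates the idea and is not MailKit's implementation: `send` stands in for a hypothetical transport that issues `UID SEARCH [CHARSET x] key {N}payload` and returns the tagged response line, and flattening is assumed to mean NFKD diacritic stripping. As the rest of this thread shows, flattening only avoids the BADCHARSET error; it does not guarantee that Exchange will match the accented subject.

```python
import re
import unicodedata


def flatten_to_ascii(text: str) -> str:
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.encode("ascii", "ignore").decode("ascii")


def search_with_fallback(send, key: str, text: str) -> str:
    """Try CHARSET UTF-8 first; on BADCHARSET listing US-ASCII, retry with flattened text."""
    response = send("UTF-8", key, text.encode("utf-8"))
    match = re.search(r"\[BADCHARSET(?: \(([^)]*)\))?\]", response)
    if match is None:
        return response  # success, or a failure unrelated to the charset
    supported = (match.group(1) or "").upper().split()
    if "US-ASCII" not in supported:
        return response  # nothing to fall back to
    # US-ASCII is the IMAP default charset, so the retry can omit CHARSET entirely.
    return send(None, key, flatten_to_ascii(text).encode("ascii"))


# Usage with a fake server that rejects every charset except US-ASCII.
calls = []

def fake_exchange(charset, key, payload):
    calls.append((charset, payload))
    if charset not in (None, "US-ASCII"):
        return "A1 NO [BADCHARSET (US-ASCII)] The specified charset is not supported."
    return "A2 OK SEARCH completed."

print(search_with_fallback(fake_exchange, "SUBJECT", "Comunicação"))  # A2 OK SEARCH completed.
print(calls)
```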

jstedfast added a commit that referenced this issue Sep 21, 2019
This change will also *force* unicode strings into US-ASCII
if that is the only supported charset, in an effort to work around
the limitations of Exchange IMAP.

See https://stackoverflow.com/questions/12691913/imap-search-charset-with-iso-8859-1
for an example of what Thunderbird does when forcing unicode into iso-8859-1.

*May* fix #808
@jstedfast
Owner

@JobaDiniz did my above fix work for you?

@JobaDiniz

JobaDiniz commented Sep 23, 2019 via email

@JobaDiniz

JobaDiniz commented Sep 23, 2019

It didn't work with version 2.3.1.15:

S: * NAMESPACE (("" "/")) NIL NIL
S: B00000003 OK NAMESPACE completed.
C: B00000004 LIST "" "INBOX"
S: * LIST (\Marked \HasNoChildren) "/" INBOX
S: B00000004 OK LIST completed.
C: B00000005 EXAMINE INBOX
S: * 38 EXISTS
S: * 14 RECENT
S: * FLAGS (\Seen \Answered \Flagged \Deleted \Draft $MDNSent)
S: * OK [PERMANENTFLAGS ()] Permanent flags
S: * OK [UIDVALIDITY 14] UIDVALIDITY value
S: * OK [UIDNEXT 1816] The next unique identifier value
S: B00000005 OK [READ-ONLY] EXAMINE completed.
C: B00000006 UID SEARCH CHARSET US-ASCII SUBJECT {13+}
C: Comunicação
S: * SEARCH
S: B00000006 OK SEARCH completed.

@jstedfast
Owner

jstedfast commented Sep 23, 2019

Sorry, I meant that you should try using inbox.Search (SearchQuery.SubjectContains ("Comunicação"));

@JobaDiniz

JobaDiniz commented Sep 23, 2019

Using the SearchQuery class instead of a raw query didn't work either; no e-mails were found:

S: * 38 EXISTS
S: * 14 RECENT
S: * FLAGS (\Seen \Answered \Flagged \Deleted \Draft $MDNSent)
S: * OK [PERMANENTFLAGS ()] Permanent flags
S: * OK [UIDVALIDITY 14] UIDVALIDITY value
S: * OK [UIDNEXT 1816] The next unique identifier value
S: A00000005 OK [READ-ONLY] EXAMINE completed.
C: A00000006 UID SEARCH CHARSET UTF-8 SUBJECT {13+}
C: Comunicação
S: A00000006 NO [BADCHARSET (US-ASCII)] The specified charset is not supported.
C: A00000007 UID SEARCH SUBJECT {11+}
C: Comunica褯
S: * SEARCH
S: A00000007 OK SEARCH completed.

@jstedfast
Owner

I'm going to have to throw in the towel and conclude that there's just no way to do this with Exchange :-\

@JobaDiniz

Yeah... that's a bummer... and it isn't only an issue for C# developers: https://stackoverflow.com/a/55977511/1830639

@jmehrens

jmehrens commented Apr 1, 2024

Per RFC 6855:

The IMAP base specification [RFC3501] forbids the use of 8-bit
characters in atoms or quoted-strings. Thus, a UTF-8 string can only
be sent as a literal.

and

Once an IMAP client has enabled UTF-8 support with the "ENABLE
UTF8=ACCEPT" command, it MUST NOT issue a "SEARCH" command that
contains a charset specification. If an IMAP server receives such a
"SEARCH" command in that situation, it SHOULD reject the command with
a "BAD" response (due to the conflicting charset labels).

Looking at the debug output from Sep 23, 2019, there was no attempt that showed up as:

C: A00000006 UID SEARCH SUBJECT {13+}
C: Comunicação

The first attempt sent a charset, which is wrong. The second attempt looks like it used the wrong charset to generate the literal, so it will not find a match. It should have used UTF-8 to generate the bytes and omitted the charset from the search command.
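Put together, the two RFC rules quoted above suggest composing the command roughly like this. A minimal Python sketch under stated assumptions (`build_search` is hypothetical, the command tag is omitted, and `utf8_enabled` means ENABLE UTF8=ACCEPT succeeded): 7-bit text can go in a quoted string, anything 8-bit goes in a literal sized in UTF-8 bytes, and CHARSET is sent only when UTF-8 has not been enabled. As the follow-up below reports, this still does not make Exchange match, since it neither advertises UTF8=ACCEPT nor accepts CHARSET UTF-8.

```python
def build_search(key: str, text: str, utf8_enabled: bool) -> bytes:
    """Compose an (untagged) UID SEARCH command for a single string criterion."""
    is_ascii = text.isascii()
    # Quoted strings must be 7-bit; also avoid them for text needing escapes or CR/LF.
    needs_literal = not is_ascii or any(c in text for c in '"\\\r\n')
    parts = [b"UID SEARCH"]
    if not is_ascii and not utf8_enabled:
        # Without ENABLE UTF8=ACCEPT, non-ASCII search strings need a CHARSET label.
        parts.append(b"CHARSET UTF-8")
    # After ENABLE UTF8=ACCEPT, RFC 6855 says the client MUST NOT send CHARSET.
    parts.append(key.encode("ascii"))
    head = b" ".join(parts)
    if not needs_literal:
        return head + b' "' + text.encode("ascii") + b'"\r\n'
    data = text.encode("utf-8")
    # Literal length counts UTF-8 bytes, not characters.
    return head + b" {%d+}\r\n" % len(data) + data + b"\r\n"


print(build_search("SUBJECT", "Comunicação", utf8_enabled=True))
```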

@mfkvfn

mfkvfn commented Apr 1, 2024

I found that this does not work only when I use Outlook email.
If I switch to another provider (mail.qq.com, mail.163.com, ...), the search works again.
My code is:

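# imaplib sends IMAP4.literal as a {N} literal after the last argument,
# so the UTF-8 bytes never end up inside a quoted string
inbox.literal = title.encode('UTF-8')
_typ, _search_data = inbox.search('UTF-8', 'SUBJECT')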

@jstedfast
Owner

Office365 & Exchange only support US-ASCII as a charset for search.

@jmehrens

jmehrens commented Apr 7, 2024

Thanks for the information. For completeness, I tested a patch implementing what I proposed above, and it simply doesn't find a match because, as you stated, the server only supports US-ASCII in search and does not advertise UTF8=ACCEPT.

6 participants