Explain that from Notepad++ 8.8.8, ANSI is disabled when Windows is set to Use Unicode UTF-8 for worldwide language support. #841

Coises · 2025-11-22T00:26:07Z

Add that beginning with Notepad++ 8.8.8, ANSI is disabled when Windows is set to Use Unicode UTF-8 for worldwide language support. Explain why this was done, what happens when Notepad++ opens what users think of as an “ANSI” file, how to determine the Windows setting from the Debug Info and where to find the setting in Windows.

Coises · 2025-11-22T00:29:44Z

Feel free to make this more concise if you can figure out how to do that. It feels long and wordy to me, and like it shouldn’t really be a whole sub-sub-sub-section of its own, but nothing else I tried was any better.

donho

@pryrt
What's your opinion?

content/docs/preferences.md

pryrt · 2025-11-23T20:20:29Z

@pryrt What's your opinion?

After getting feedback from my two comments above (ie, assuming you agreed that creating a new Encoding section outside of the preferences page was a good idea), I would hijack this PR (or cancel this one and start my own) to do a bigger rework to create a new section for detailed Encoding documentation.

(I've previously created new sections in the manual without "permission", when I thought it was best for the Manual. But since you're involved in this discussion now, I want to make sure we're on the same page before I move forward.)

donho · 2025-11-23T22:27:50Z

@pryrt
OK, it's your PR now.

Coises · 2025-11-24T18:52:45Z

@pryrt If there’s anything I can do that would help, let me know.

pryrt · 2025-11-25T03:02:46Z

I've split out most of the Encoding docs into the new encoding.md. I also clarified that the ANSI setting isn't always 8-bit, as per @Coises's side note.

@Coises, please make sure I didn't miss anything that was discussed earlier, or mis-explain things in any of my rewording of your text. Thanks.

Coises · 2025-11-25T04:16:46Z

@pryrt

If MISC > Autodetect character encoding is enabled, Notepad++ will attempt to algorithmically determine the encoding of the file. If the file you open is encoded in UTF-16 (which always has the BOM character), or in UTF-8 with the BOM, then Notepad++ will use the encoding based on the BOM. If the file is an XML file, then if the encoding is defined in the declaration/prolog, Notepad++ will use that encoding for the file. Failing that, Notepad++ will also analyze some of the byte sequences in the file, and if they match patterns common to UTF-8 or one of the character sets, then Notepad++ will use that encoding.

If autodetection is not enabled, or if autodetection does not yield a positive result, Notepad++ will choose the encoding based on the system locale.

This is not quite correct. I think only @donho knows exactly how this works, so hopefully he will correct any inaccuracies below.

I understand that this level of detail is inappropriate for the manual. I just do not know how to be both accurate and user-friendly. This is maddeningly complicated.

First, and regardless of the Autodetect setting, Notepad++ checks for a byte order mark. If there is one, it is taken as definitive and no further questions are asked. (If I’m remembering correctly, even though a file in a legacy 8-bit encoding legitimately can include any sequence of bytes, including one that looks like a byte order mark, there is no way at all to get Notepad++ to interpret a file that begins with a byte order mark as anything but the Unicode format corresponding to that byte order mark. Attempts to change it using the Encoding menu will be ignored. Fortunately this is almost never a practical problem.)

I think the test for encoding defined in XML (and HTML?) files occurs next, also regardless of the Autodetect setting. I am uncertain as to whether the user can override this with an Encoding menu selection. And I think there is some logic to make this use ANSI (and not a character set sub-menu entry) if the character set identified is the system code page... but I don’t know the details.

Then, if and only if Autodetect is checked, there is a heuristic test to see if the file is likely to be one of a number of specific code page encodings. (I do not know the scope of that test, except that based on the discussion Don and I had while he worked on Issue #17057, one of the things it cannot successfully recognize is Windows-1252.)

If all that fails to determine an encoding — once again, regardless of the Autodetect setting — a test is made to see if the file appears to be all ASCII, valid UTF-8 (but not all ASCII), or neither.

If it is ASCII, then if the system code page is 65001 or if the New Documents setting Apply to opened ANSI files is checked, it is opened as UTF-8; else it is opened as ANSI.

If it is not pure ASCII and it is valid UTF-8, it is opened as UTF-8.

If it is not pure ASCII and it is not valid UTF-8, then if the system code page is 65001, it is opened using the entry on the Encoding > Character sets sub-menus for the legacy code page corresponding to the system locale; otherwise it is opened as ANSI.
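
The order described above can be summarized in a short sketch. This is only a model of the description in this comment, not Notepad++'s actual code: `choose_encoding`, its parameters, and the returned labels are hypothetical names. The XML-declaration check and the Autodetect heuristic are passed in as already-computed results (`declared`, `guessed`) because their details are not known here, and the possible mapping of a declared encoding back to ANSI is not modeled.

```python
import codecs

# Byte order marks checked first; the label strings are illustrative only.
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8-BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE BOM"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE BOM"),
]

# Placeholder for "the Character sets entry matching the system locale".
LEGACY_CHARSET = "Character sets entry for the locale's legacy code page"


def choose_encoding(data, system_code_page, autodetect=False,
                    apply_to_opened_ansi=False, declared=None, guessed=None):
    """Model of the detection order described in the comment above."""
    # 1. A BOM is definitive, regardless of the Autodetect setting.
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    # 2. An encoding declared in the file (e.g. an XML prolog), if any.
    if declared is not None:
        return declared
    # 3. The heuristic code page test runs only when Autodetect is checked.
    if autodetect and guessed is not None:
        return guessed
    # 4. Fallback: all ASCII, valid UTF-8, or neither.
    utf8_system = system_code_page == 65001
    if data.isascii():
        return "UTF-8" if (utf8_system or apply_to_opened_ansi) else "ANSI"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return LEGACY_CHARSET if utf8_system else "ANSI"
```

For example, `choose_encoding(b"\xe9", 1252)` falls through to ANSI, while the same bytes with `system_code_page=65001` land on the Character sets placeholder.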

Coises · 2025-11-25T05:04:05Z

@pryrt

Since any explanation of how encoding detection works is bound to make most people’s eyes glaze over (unless you can work some magic that’s beyond me), I wonder if something should be added in the “Encoding and Use Unicode UTF-8 for worldwide language support” section to note that:

Notepad++ can still open files in the legacy encoding for your system’s locale (so-called “ANSI”) when Use Unicode UTF-8 for worldwide language support is enabled; but it will open them using a selection from the Encoding > Character sets sub-menus instead of ANSI.

I suspect it isn’t worth going into the small ways in which that makes a difference; e.g.: positions and lengths of highlighted text won’t necessarily correspond to the byte positions and lengths in the file on disk, and the length of the document in the editor will be different (longer) than the length of the file, unless the document contains only ASCII characters; it is possible to paste or otherwise enter into the document characters not in the character set, and they will look like they were inserted successfully until the file is saved and then opened again; searches will be in UTF-8 rather than ANSI, which affects \x values over 7f and character ranges ([x-y]) that include non-ASCII characters; Plugins > Converter > ASCII -> HEX will show UTF-8 bytes rather than the bytes that appear in the file; probably other things that haven’t occurred to me.

If there is a good place to say it, it might be worth clarifying that a file opened with any selection from the Encoding > Character sets sub-menus is always converted to UTF-8 on loading and back to the specified character set when saving; it is never edited directly in the selected character set.
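
A couple of those differences can be reproduced outside Notepad++. The sketch below uses Python's `cp1252` codec as a stand-in for a legacy character set (an assumption; any single-byte code page would do) to show that a UTF-8 editing buffer is longer than the file on disk once non-ASCII characters are present, and that a character outside the character set cannot be written back in it.

```python
# The same text, as stored in a legacy single-byte file and as a UTF-8 buffer.
text = "café"
on_disk = text.encode("cp1252")    # one byte per character
in_editor = text.encode("utf-8")   # "é" takes two bytes in UTF-8
print(len(on_disk), len(in_editor))

# A check mark (U+2713) has no representation in cp1252, so it cannot
# survive a save in that character set.
try:
    "done ✓".encode("cp1252")
    round_trips = True
except UnicodeEncodeError:
    round_trips = False
print(round_trips)
```

Python raises an error at the point of conversion; per the comment above, Notepad++ instead gives no visible sign of the problem until the file is saved and reopened.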

pryrt · 2025-11-25T17:50:58Z

@Coises,

Okay, I moved the "if option" to only apply to the heuristic portion, and made sure my order follows yours, without trying to get too far into the nitty-gritty details. I also made the brief comment about internal representation.

Coises · 2025-11-25T18:39:13Z

@pryrt:

Okay, I moved the "if option" to only apply to the heuristic portion, and made sure my order follows yours, without trying to get too far into the nitty-gritty details. I also made the brief comment about internal representation.

I think the “Encoding Auto-Detection” and “Encoding and Use Unicode UTF-8 for worldwide language support” sections are very good now. Accurate, as far as I can tell, yet readable.

One problem with “Encoding During Editing”:

It should be clarified: when Notepad++ reads the file, it actually converts the file from whatever encoding it is on the disk, and internally uses the UTF-8 encoding when doing editing and searching – it’s just during file-read and file-write that the file’s encoding is utilized.

This is true except when the Encoding is ANSI. When a file is recognized as ANSI, it is loaded into Scintilla and edited in that encoding. (Perhaps counter-intuitively, when a file is loaded using an option from the Character sets sub-menus, even if it is the same code page as the system default code page, the file is converted to UTF-8 and loaded that way.)

So:

If encoding detection/selection results in ANSI, the file is loaded, unchanged, into Scintilla and the document is interpreted in the system code page encoding. (As of 8.8.8, this cannot happen when Use Unicode UTF-8 for worldwide language support is enabled. Before 8.8.8 that combination resulted in erratic behavior.)
If encoding detection/selection results in UTF-8, the file is loaded, unchanged, into Scintilla and the document is interpreted as UTF-8.
If encoding detection results in UTF-8 with BOM, the first three bytes (the BOM) are skipped, the remainder of the file is loaded, unchanged, into Scintilla, and the document is interpreted as UTF-8.
In all other cases, the file is converted from the detected or selected encoding to UTF-8, the converted text is loaded into Scintilla, and the document is interpreted as UTF-8.
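
The four cases above can be modeled as one function. This is an illustrative sketch only: `load_for_editing` and its labels are hypothetical names, Python `bytes` stands in for the Scintilla buffer, and `codec` is whatever Python codec corresponds to the detected or selected encoding.

```python
import codecs


def load_for_editing(raw, encoding, codec=None):
    """Return (buffer_bytes, interpreted_as) for a file's raw bytes."""
    if encoding == "ANSI":
        # Loaded unchanged; interpreted in the system code page.
        return raw, "system code page"
    if encoding == "UTF-8":
        # Loaded unchanged; interpreted as UTF-8.
        return raw, "UTF-8"
    if encoding == "UTF-8-BOM":
        # Skip the three BOM bytes; the rest is loaded unchanged.
        return raw[len(codecs.BOM_UTF8):], "UTF-8"
    # Everything else (UTF-16, Character sets entries) is converted to UTF-8.
    return raw.decode(codec).encode("utf-8"), "UTF-8"
```

Calling `load_for_editing(b"\xe9", "Windows-1252", codec="cp1252")` yields a two-byte UTF-8 buffer, whereas the same byte loaded as `"ANSI"` stays a single byte.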

Similar considerations apply when file encoding is changed (i.e., when an Encoding menu Convert option is used, or when an Encoding is selected for a new tab that has never been saved).

I’ll let you figure out how to make that comprehensible to normal human beings, as you are clearly better at that than I am.

pryrt · 2025-11-25T20:45:29Z

I’ll let you figure out how to make that comprehensible to normal human beings

/me hopes he got it this time

Coises · 2025-11-25T21:38:06Z

@pryrt:

If I may, I’d like to submit an alternative to the first paragraph under “Encoding During Editing.” Use, ignore or synthesize as you think best. For:

It should be clarified: when Notepad++ reads the file, it usually converts the file from whatever encoding it is on the disk and may use a different encoding internally – it is just during file-read and file-write that the file’s real encoding is utilized. For the internal encoding, it will use the system encoding internally if Notepad++ determines the file encoding is “ANSI”, but not if one of the specific character sets is chosen; if Notepad++ is not set to “ANSI”, it will use UTF-8 encoding internally. (The same is true when you change the encoding from whatever was originally chosen.)

consider:

Notepad++ does not always let you edit a document in the same encoding used to store it in its file. Most of the time this is a technicality that won’t matter to you, but it is good to be aware of the details. When the encoding (shown in the Encoding menu and in [status bar](#status-bar) area 5) is ANSI or UTF-8, you are editing the document in the same encoding as the file. In all other cases (UTF-16 or anything from the Character sets sub-menus), you are editing the document as UTF-8, and Notepad++ converts from or to the chosen encoding when opening or saving the file.

(I question “usually” in the original paragraph because in practice, most of the files most people open, if they don’t have the new Windows Unicode option enabled, will be edited without any conversion, since most are going to be either ANSI or UTF-8.)

(I know “it is good to be aware of the details” begs the question, “Why is it good? Why would I care?”; but I think the answer to that is too geeky for general consumption.)

I think the next paragraph, about BOMs, is great as it is.

pryrt · 2025-11-25T23:37:02Z

I removed your second sentence and replaced it with a parenthetical showing when it does matter (plugins like HexEdit plugin get confused by the internal representation, though I didn't call out any plugin by name) -- but I've now switched it to mostly your paragraph.

Coises · 2025-11-26T00:01:59Z

@pryrt:

I removed your second sentence and replaced it with a parenthetical showing when it does matter (plugins like HexEdit plugin get confused by the internal representation, though I didn't call out any plugin by name) -- but I've now switched it to mostly your paragraph.

That’s good. I like it.

I mucked up the link for “status bar” and you copied my mistake. It’s in a different file, so I guess it has to be something like ../user-interface/index.html#status-bar — however that gets done in markdown+Hugo. Sorry about that; I wrote without testing.

pryrt · 2025-11-26T00:52:26Z

I should've verified the link before the last commit. Confirmed it's working now.

And since I did that, I audited the other links on the new page, because many needed updating to point to the preferences page. So that's been fixed, too.

I'll probably let things sit, and look it all over again tomorrow. If I don't see anything else, and you have no other comments, I'll probably publish tomorrow.

Coises · 2025-11-26T00:59:15Z

@pryrt:

I'll probably let things sit, and look it all over again tomorrow. If I don't see anything else, and you have no other comments, I'll probably publish tomorrow.

I think I’ve run out of things to complain about. ;-)

Thanks for all your work on this — I think it will be much more helpful to users now than my initial changes would have been.

alankilborn · 2025-11-26T01:08:02Z

content/docs/encoding.md

+
+As of Notepad++ version 8.8.8, the **ANSI** and **Convert to ANSI** entries on the **Encoding** menu are disabled when the Windows setting **Use Unicode UTF-8 for worldwide language support** is enabled. When that setting is in effect, the system default code page, which ordinarily defines “ANSI” in Windows, *is* UTF-8; attempting to treat UTF-8 as an ordinary code page does not work properly, which caused erratic behavior prior to version 8.8.8. Since the traditional concept of “ANSI” has no consistent meaning when that Windows setting is enabled, Notepad++ disables `ANSI` encoding.  (But even with that OS option set, Notepad++ can still choose one of the Character Set encodings; it just manually selects that entry, not setting it to "ANSI".)
+
+Some Windows 11 installations are coming with that option turned on by default.  If you need to be able to use the **Convert to ANSI** action, and you find it's disabled in Notepad++ v8.8.8 or newer (or if that conversion doesn't behave as expected on older versions of Notepad++), you can verify in **?**-menu's **Debug Info**: it will show `Current ANSI codepage: 65001` if that Windows OS option is on.  If you want to chance that Windows OS setting, Microsoft provides multiple paths to that setting, but two of the common ways to find it are:


chance in this para should be change

no, I was trying to indicate the risk involved in using Windows OS settings... ;-)

Update preferences.md

8f9f5c1

donho self-assigned this Nov 22, 2025

donho requested changes Nov 23, 2025

donho assigned pryrt and unassigned donho Nov 23, 2025

Split out most Encoding documentation to new page

148307b

Tweak the new encoding page per Coises feedback

d0e3647

Update 'Encoding During Editing' to include the ANSI exception

ae2e455

Use most of Coises most recent suggestion

fe78a90

pryrt added 2 commits November 25, 2025 16:42

Fix status bar link

8b76a09

Fix other links

bd7b9d1

alankilborn reviewed Nov 26, 2025

fix 'chance' to 'change' per alankilborn

31e42b0

pryrt merged commit df7a75f into notepad-plus-plus:master Nov 26, 2025
1 check passed

