
@ianwjhalliday
Collaborator

Fixes #1429

Chevron characters appear in the source comments, but the compiler on
Chinese Windows 10 believes the code page for the source file cannot
represent these characters.

Fix by converting them to '<<' and '>>'.
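The substitution itself is mechanical. Here is a minimal Python sketch. It assumes the chevrons in question are the guillemets « (U+00AB) and » (U+00BB), which ECMAScript spec text uses for List literals. `replace_chevrons` is a hypothetical helper, not the tool used for this PR:

```python
# Assumption: the "chevrons" are U+00AB and U+00BB. Using \u escapes keeps
# this script itself pure ASCII, so it cannot hit the same code-page problem.
REPLACEMENTS = {"\u00ab": "<<", "\u00bb": ">>"}

def replace_chevrons(text: str) -> str:
    """Return text with each guillemet replaced by its two-character ASCII form."""
    return text.translate(str.maketrans(REPLACEMENTS))
```

For example, `replace_chevrons("// Let list be \u00ab a, b \u00bb.")` gives `"// Let list be << a, b >>."`.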

@ianwjhalliday
Collaborator Author

@dotnet-bot test Windows x64_release please

@bterlson
Contributor

We should probably also document this practice somewhere, since pasting in spec text is fairly common. Is it possible to teach the compiler on Chinese Windows 10 about UTF-8?

@ianwjhalliday
Collaborator Author

@bterlson From what I've read, it sounds like the compiler should handle UTF-8; however, I've also come across some Stack Overflow posts where people still seemed to have trouble with UTF-8 files fed to MSVC.

Here I'm erring on the side of caution, but you're right that this is a common practice. I sent a PSA email to the dev team suggesting we be mindful when pasting text into comments. Maybe we can go one step further and add a validation gate in Jenkins. I'll discuss with @dilijev.
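A gate like this could be as simple as scanning each source file for bytes outside 7-bit ASCII. Below is a minimal Python sketch. The function names are hypothetical, and wiring it into a Jenkins job is left out:

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

def find_non_ascii(data: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (line, byte_column, byte_value) for every byte outside 7-bit ASCII."""
    line, col = 1, 1
    for b in data:
        if b == 0x0A:  # b"\n" ends the current line
            line, col = line + 1, 1
            continue
        if b >= 0x80:  # high bit set, so this byte is not ASCII
            yield line, col, b
        col += 1

def check_files(paths: Iterable[str]) -> int:
    """Print each offending byte as path:line:col and return the total count."""
    violations = 0
    for path in paths:
        for line, col, b in find_non_ascii(Path(path).read_bytes()):
            print(f"{path}:{line}:{col}: non-ASCII byte 0x{b:02X}")
            violations += 1
    return violations

def main(argv: List[str]) -> int:
    """Exit status for a CI step: 0 if every file is clean, 1 otherwise."""
    return 1 if check_files(argv) else 0

# Hypothetical hook-up from a CI script: sys.exit(main(sys.argv[1:]))
```

Columns are counted in bytes, not characters, which is enough to point someone at the offending spot.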

@chakrabot merged commit bb6999d into chakra-core:master Aug 16, 2016
chakrabot pushed a commit that referenced this pull request Aug 16, 2016
…VS 2015

Merge pull request #1440 from ianwjhalliday:fix1429

@dilijev
Contributor

dilijev commented Aug 16, 2016

@ianwjhalliday @bterlson If a file is properly encoded in UTF-8 with a BOM, would MSVC on any language edition of Windows be able to handle it? I ask because our two options here are:

  1. Use a gate to ban non-ASCII characters from the source (i.e. allow only 7-bit characters, whose leading bit is 0)
  2. Convert all source files to UTF-8 with BOM and use a gate to enforce that encoding
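Option 2 could be checked by requiring the three-byte UTF-8 BOM (EF BB BF) at the start of each file and verifying that the rest decodes as UTF-8. Here is a sketch with a hypothetical function name:

```python
import codecs

def is_utf8_with_bom(data: bytes) -> bool:
    """True if data starts with the UTF-8 BOM and the remainder is valid UTF-8."""
    if not data.startswith(codecs.BOM_UTF8):  # codecs.BOM_UTF8 == b"\xef\xbb\xbf"
        return False
    try:
        # bytes.decode uses errors="strict" by default, so bad bytes raise.
        data[len(codecs.BOM_UTF8):].decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Per MSVC's documentation, the compiler uses a BOM to detect Unicode-encoded source, so this variant should sidestep the system code page without extra compiler flags.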

@bterlson
Copy link
Contributor

I would hardly call a file that includes a BOM properly encoded UTF-8. The BOM is useless for UTF-8 (there is no byte order to mark), and the Unicode spec says using a BOM with UTF-8 encoded text is not recommended. But option 2 is clearly the superior option here.
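A gate for UTF-8 without a BOM is also possible, but it has to decode the whole file, because nothing at the start of the file identifies the encoding. Here is a sketch with a hypothetical function name:

```python
import codecs

def is_utf8_without_bom(data: bytes) -> bool:
    """True if data is valid UTF-8 and does not begin with a BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return False
    try:
        data.decode("utf-8")  # strict decoding: any invalid sequence raises
    except UnicodeDecodeError:
        return False
    return True
```

A Windows-1252 guillemet (a lone 0xAB byte) fails this check, which is the case the gate is meant to catch. One caveat is worth confirming against the MSVC documentation. Without a BOM, the compiler assumes the current code page unless `/utf-8` (or `/source-charset:utf-8`) is passed, so this variant likely needs that flag in the build as well.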

@dilijev
Contributor

dilijev commented Aug 16, 2016

@bterlson I should clarify that I meant "properly encoded in UTF-8" (as opposed to extended ASCII) and "file with a BOM" as separate concerns. I just think the BOM is enough of a disambiguator, and a convenient way to check encodings without making the gate scan every byte of the file.
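The cheap check described here only needs the first three bytes of each file. Below is a minimal sketch with a hypothetical name. It trusts whatever wrote the BOM and does not prove that the remaining bytes are valid UTF-8:

```python
import codecs
from typing import BinaryIO

def has_utf8_bom(stream: BinaryIO) -> bool:
    """Read only the first three bytes and compare them with the UTF-8 BOM."""
    return stream.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

In a gate, this would be called as `with open(path, "rb") as f: ok = has_utf8_bom(f)`, so the cost per file stays constant regardless of file size.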

FWIW, I've always preferred UTF-8 without a BOM (BOMs are annoying to have in my source files), and it bugs me when I see files checked in with one. I'm just trying to work out something we could easily enforce.
