Skip to content

Fix -gsource-map with UTF-8 chars in directory and file names in the path. #24560

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 13, 2025

Conversation

juj
Copy link
Collaborator

@juj juj commented Jun 13, 2025

Fix -gsource-map with UTF-8 chars in directory and file names in the path. Fixes #24557.

@juj juj force-pushed the fix_utf8_source_map branch from 4bb8915 to 6606b8e Compare June 13, 2025 17:24
with open(options.output, 'w') as outfile:
json.dump(map, outfile, separators=(',', ':'))
with open(options.output, 'w', encoding='utf-8') as outfile:
json.dump(map, outfile, separators=(',', ':'), ensure_ascii=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So ensure_ascii=False means that non-ascii chars are not escaped as they normally would be. Is that what we want for source maps?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is probably the simplest. JSON readily supports UTF-8 characters in content.

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you verified that the resulting source maps actually work correctly in browsers? (as opposed to just having the test pass).

@juj
Copy link
Collaborator Author

juj commented Jun 13, 2025

Yes, it does look like UTF-8 in source maps work:

Screenshot 2025-06-13 211102

@juj juj enabled auto-merge (squash) June 13, 2025 18:23
@juj juj merged commit 1577417 into emscripten-core:main Jun 13, 2025
30 checks passed
@dschuff
Copy link
Member

dschuff commented Jun 13, 2025

This one seems to be failing on our windows emscripten-releases bot: https://logs.chromium.org/logs/emscripten-releases/buildbucket/cr-buildbucket/8712146499624631425/+/u/Emscripten_testsuite__other_/stdout

======================================================================
ERROR: test_wasm_sourcemap_relative_paths (test_other.other)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\chrome-bot\AppData\Local\vpython-root.0\store\cpython+dunqimq8p43kdruh73ujf8fpl8\contents\bin\Lib\unittest\case.py", line 60, in testPartExecutor
    yield
  File "C:\Users\chrome-bot\AppData\Local\vpython-root.0\store\cpython+dunqimq8p43kdruh73ujf8fpl8\contents\bin\Lib\unittest\case.py", line 676, in run
    self._callTestMethod(testMethod)
  File "C:\Users\chrome-bot\AppData\Local\vpython-root.0\store\cpython+dunqimq8p43kdruh73ujf8fpl8\contents\bin\Lib\unittest\case.py", line 633, in _callTestMethod
    method()
  File "C:\b\s\w\ir\x\w\install\emscripten\test\test_other.py", line 10781, in test_wasm_sourcemap_relative_paths
    test('A �\u2603� Z.cpp')
  File "C:\b\s\w\ir\x\w\install\emscripten\test\test_other.py", line 10763, in test
    print(infile, expected_source_map_path)
  File "C:\Users\chrome-bot\AppData\Local\vpython-root.0\store\python_venv-n0ssd7qim3vjmkh5j9ljrgnplo\contents\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2603' in position 3: character maps to <undefined>

Looks like it's trying to print, expecting a cp1252 encoding but it's actually UTF-8?

@juj
Copy link
Collaborator Author

juj commented Jun 14, 2025

That's interesting. I tried to reproduce with the following setups:

  • Windows 11, keeping Unicode support enabled in system Regions menu, resulting in chcp printing codepage 65001.
  • Windows 11, disabling Unicode support in system Regions menu, resulting in chcp printing codepage 437.
  • Windows 11, disabling Unicode support in system Regions menu, explicitly running chcp 1252 before test.

and also on another Windows 10 PC, repeating the same three cases. I thought for sure setting codepage to 1252 and/or disabling Unicode support would reproduce the problem, but that doesn't seem to help.

Reading the line

EMSDK_PYTHON = C:\Users\chrome-bot\AppData\Local\vpython-root.0\store\python_venv-n0ssd7qim3vjmkh5j9ljrgnplo\contents\Scripts\python3.exe

in the log suggests that Python version might be different on the bot vs what I am running with from Emsdk.

I updated Emsdk recently to have Python 3.13.3.

So I wonder if this might be a Python version thing that older version had problems printing Unicode characters? If an older Python version is used there, I wonder if it would be feasible to update? Otherwise I think skipping that one test on the bot for now could be a simple approach as well.

@dschuff
Copy link
Member

dschuff commented Jun 16, 2025

Thanks for digging into this. It wouldn't surprise me if there's an issue with the particular version and configuration of Python and/or the underlying OS on these builders, as they are fairly customized for Chromium's CI, and we are just along for the ride. That also means that it's probably not worth trying to get them to update it; but since it's working on GitHub, and if you're pretty confident that everything is WAI after your investigation here, then I'm comfortable just disabling this test on the Chromium CI.

@juj
Copy link
Collaborator Author

juj commented Jun 16, 2025

@dschuff Would you be able to run python --version on the failing Chromium CI to verify which version it has? I can try to locally install the same version to see if that reproduces the problem.

@dschuff
Copy link
Member

dschuff commented Jun 16, 2025

According to the top line of the log in https://logs.chromium.org/logs/emscripten-releases/buildbucket/cr-buildbucket/8712146499624631425/+/u/Emscripten_testsuite__other_/stdout
the version is
Python version 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]

@dschuff
Copy link
Member

dschuff commented Jun 16, 2025

which... actually I thought they were planning on updating all the builders to 3.11, so maybe ours got left behind somehow. I'll look into that. But maybe it's worth trying to reproduce on whatever the oldest version of Python we want to support in emscripten is. The reason Chrome infra is updating is because 3.8 is EOL, so maybe we want to update our min supported version as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

-gsource-map does not work on Windows if username contains non-ASCII Unicode characters.
3 participants