Allow certain command line settings to be additive #19938

Open
wants to merge 1 commit into
base: main
5 changes: 5 additions & 0 deletions ChangeLog.md
@@ -20,6 +20,11 @@ See docs/process.md for more on how version tagging works.

 3.1.45 (in development)
 -----------------------
+- Command line settings that accept lists are now added to existing occurrences
+  of the setting rather than replacing them. For example, specifying
+  `-sEXPORTED_FUNCTIONS=foo -sEXPORTED_FUNCTIONS=bar` is now equivalent to
+  `-sEXPORTED_FUNCTIONS=foo,bar`. This is useful in build systems where
+  separate components each want to contribute to this list.
Member

This seems a little risky to me, as we have some settings that we want people to be able to replace/shrink, like INCOMING_MODULE_JS_API. How about this notation instead?

-sEXPORTED_FUNCTIONS+=foo

That is, += appends rather than assigns/replaces.
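
To make the proposed notation concrete, here is a minimal sketch of how a parser could tell `KEY+=VALUE` apart from `KEY=VALUE`. The helper `apply_setting_arg` and the plain-dict settings store are hypothetical and exist only for this illustration; this is not how emcc.py parses `-s` arguments.

```python
def apply_setting_arg(arg, current):
    """Apply one `KEY=VALUE` or `KEY+=VALUE` argument to the dict `current`.

    Hypothetical helper for illustration only. Values are treated as
    comma-separated lists.
    """
    # Split on the first '=' only, so the value may itself contain '='.
    key, value = arg.split('=', 1)
    items = [v for v in value.split(',') if v]
    if key.endswith('+'):
        # `KEY+=VALUE`: append to whatever is already set.
        key = key[:-1]
        current[key] = current.get(key, []) + items
    else:
        # `KEY=VALUE`: replace, which matches the existing behavior.
        current[key] = items
    return current


parsed = {}
for arg in ['EXPORTED_FUNCTIONS=_main',
            'EXPORTED_FUNCTIONS+=_foo,_bar',
            'INCOMING_MODULE_JS_API=print',
            'INCOMING_MODULE_JS_API=printErr']:
    apply_setting_arg(arg, parsed)
print(parsed)
```

Under this scheme a repeated plain `=` still replaces the earlier value, so existing command lines would keep their meaning.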

Collaborator Author

I don't think anyone ever wants one command line argument to replace/shrink the same argument already specified on the same line.

In the case of INCOMING_MODULE_JS_API we want the command line arg to replace the default set, but if one library wants to set -sINCOMING_MODULE_JS_API=foo and another sets -sINCOMING_MODULE_JS_API=bar I'm pretty sure we always want them to be additive with the result being foo,bar.

The way I have set it up is that this happens on a per-setting basis so it doesn't have to apply to all of them.
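
As a rough sketch of the semantics described in this comment (not the PR's actual implementation): the first command-line occurrence replaces the default, later occurrences of an additive setting accumulate, and any other setting stays last-one-wins. The names `ADDITIVE`, `resolve`, and `OTHER_LIST` are assumptions made for this example.

```python
# Illustrative subset only; the PR keeps its real list in APPENDING_SETTINGS.
ADDITIVE = {'EXPORTED_FUNCTIONS'}


def resolve(defaults, cmdline):
    """Return final settings from default lists and (key, values) pairs
    given in command-line order."""
    result = {key: list(values) for key, values in defaults.items()}
    seen = set()
    for key, values in cmdline:
        if key in ADDITIVE and key in seen:
            # Repeated occurrence of an additive setting: accumulate.
            result[key] = result[key] + list(values)
        else:
            # First occurrence (or non-additive setting): replace.
            result[key] = list(values)
        seen.add(key)
    return result


defaults = {'EXPORTED_FUNCTIONS': ['_main'], 'OTHER_LIST': ['a', 'b']}
cmdline = [
    ('EXPORTED_FUNCTIONS', ['_foo']),
    ('EXPORTED_FUNCTIONS', ['_bar']),
    ('OTHER_LIST', ['x']),
    ('OTHER_LIST', ['y']),
]
final = resolve(defaults, cmdline)
print(final)
```

Here `EXPORTED_FUNCTIONS` ends up as `_foo,_bar` (the default `_main` is replaced by the first occurrence), while the hypothetical `OTHER_LIST` keeps only its last value.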

Member

I see what you mean. INCOMING_MODULE_JS_API wouldn't be used from a library anyhow, I guess, that's a global setting for an application, so you wouldn't apply this to that setting. While it does make sense for exported functions, so you'd apply it there. Which I see is what this PR does.

Still, what do you think about using += notation? I think that is less ambiguous. As a side benefit, it would avoid any backwards compatibility risk whatsoever.

Collaborator Author

I just can't think of a situation where you wouldn't want += to be the default, and it adds more complexity to the command line parser. If we ever find an option where we don't want it to be the default, perhaps we could add it then?

Member

Well, what worries me is that this is a breaking change. I do see your point that this is the more natural behavior, but it can break users' builds if they had the same flag twice. Build systems are weird and sometimes that duplication happens (in Binaryen for whatever reason there is more than one -Ox flag, for example...).

Less concerning but still a little odd is that some settings would have this behavior and others wouldn't. And we already have some settings that are additive in their name, like EXTRA_*. So this feels like it could be more consistent with those somehow, but I'm not sure how. Perhaps EXTRA_EXPORTED_FUNCTIONS would be additive, for example?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a source of bugs when using Bazel: a cc_library sets some flags for its functionality (say, requests ccall). Another library / cc_binary wants to access the filesystem, so it adds FS. And for some reason it doesn't work, and there is no warning about why it's happening. To find the problem I literally had to understand how emcc flags and the build system work, and even then there is no solution. (Actually, there is one: abuse EXTRA_EXPORTED_FUNCTIONS because while the flag is deprecated, it's passed through and appended in the end. But my colleague uses it too in his library, so there is no way for me to set custom flags at all).

But I agree that it's a breaking change, so my proposal is to add a warning that overriding is happening and perhaps offer a flag to turn additive functionality on. Something like this:

  for s in settings_changes:
    key, value = s.split('=', 1)
    key, value = normalize_boolean_setting(key, value)
    if key in user_settings and user_settings[key] != value:
      diagnostics.warning('unused-command-line-argument', f'{key} was provided multiple times with different values. Previous value: {user_settings[key]}. New value: {value}.')
    user_settings[key] = value
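
For anyone who wants to try the idea outside emcc.py, a standalone variant of the snippet might look like the following. It is only a sketch: `collect_settings` is a hypothetical name, the call to `normalize_boolean_setting` is left out, and messages are collected in a list instead of going through `diagnostics.warning`.

```python
def collect_settings(settings_changes):
    """Parse `KEY=VALUE` strings. The last value still wins, but every
    override with a different value is recorded."""
    user_settings = {}
    overrides = []
    for s in settings_changes:
        key, value = s.split('=', 1)
        if key in user_settings and user_settings[key] != value:
            overrides.append(
                f'{key} was provided multiple times with different values. '
                f'Previous value: {user_settings[key]}. New value: {value}.')
        user_settings[key] = value
    return user_settings, overrides


# The Bazel scenario above: a library requests ccall, the binary requests FS.
flags = ['EXPORTED_RUNTIME_METHODS=ccall', 'EXPORTED_RUNTIME_METHODS=FS']
result, overrides = collect_settings(flags)
for message in overrides:
    print('warning:', message)
```

With the two flags shown, only `FS` survives and one override message is produced, which is the silent loss of `ccall` that the warning is meant to surface.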

Collaborator Author

Adding in #21464

Collaborator

> I see what you mean. INCOMING_MODULE_JS_API wouldn't be used from a library anyhow, I guess, that's a global setting for an application, so you wouldn't apply this to that setting. While it does make sense for exported functions, so you'd apply it there. Which I see is what this PR does.

Why not? It seems quite natural to have a library (say using inline JS) declare what module.js API functions it uses... and then to propagate that up. In a Bazel world, libraries are responsible for declaring their dependencies, and a used API feels like a dependency.

> Still, what do you think about using += notation? I think that is less ambiguous. As a side benefit, it would avoid any backwards compatibility risk whatsoever.

I don't know of any prior art of command-line flags behaving like this. It definitely does not feel ergonomic.
While I agree some folks may be relying on this behavior, I'd argue that the old behavior was unnatural. Yes, some flags (like -O) only expect a single value, and the last value wins. But anything that takes a list really ought to accumulate. (It would be weird if, e.g., -l didn't accumulate.)

Maybe allow a synthetic, implicit prefix for each list flag, like ADD_ e.g. ADD_EXPORTED_FUNCTIONS.
If it's a list, it also supports this thing. Then at least build systems that want to leverage this can.
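
A minimal sketch of what such an implicit prefix could look like, assuming a hypothetical `apply_flag` helper and an illustrative `LIST_SETTINGS` set (neither exists in emscripten under these names):

```python
# Illustrative subset of list-valued settings; not emscripten's actual list.
LIST_SETTINGS = {'EXPORTED_FUNCTIONS', 'EXPORTED_RUNTIME_METHODS'}
PREFIX = 'ADD_'


def apply_flag(current, key, values):
    """Apply one flag to `current`. `ADD_<NAME>` appends when <NAME> is a
    list setting; anything else replaces."""
    base = key[len(PREFIX):] if key.startswith(PREFIX) else None
    if base in LIST_SETTINGS:
        current[base] = current.get(base, []) + list(values)
    else:
        current[key] = list(values)
    return current


state = {}
apply_flag(state, 'EXPORTED_FUNCTIONS', ['_main'])
apply_flag(state, 'ADD_EXPORTED_FUNCTIONS', ['_foo'])
apply_flag(state, 'ADD_EXPORTED_FUNCTIONS', ['_bar'])
apply_flag(state, 'EXPORTED_RUNTIME_METHODS', ['ccall'])
apply_flag(state, 'EXPORTED_RUNTIME_METHODS', ['FS'])
print(state)
```

The un-prefixed spelling keeps last-one-wins semantics, so only build systems that opt in to the `ADD_` form would see accumulation.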

Collaborator Author

sbc100, Jul 15, 2024

> Why not? It seems quite natural to have a library (say using inline JS) declare what module.js API functions it uses... and then to propagate that up. In a Bazel world, libraries are responsible for declaring their dependencies, and a used API feels like a dependency.

Normally it's not user code that reads things from INCOMING_MODULE_JS_API but emscripten's own JS code. It's not really something that was designed to be extensible.

The usage for INCOMING_MODULE_JS_API is really about what the HTML surrounding the program is going to provide. Normally a library doesn't know that, only the person building the final binary. The person in charge of linking the program would ask themselves "am I ever going to provide such a parameter to the program?" If the answer is no, then there is no need to add that to the INCOMING_MODULE_JS_API list. Adding something to that list but never supplying it at runtime is just a waste of code.

I can imagine there might be exceptions.. but that seems like the rule to me.


3.1.44 - 07/25/23
-----------------
39 changes: 21 additions & 18 deletions emcc.py
@@ -54,7 +54,7 @@
 from tools import webassembly
 from tools import config
 from tools import cache
-from tools.settings import user_settings, settings, MEM_SIZE_SETTINGS, COMPILE_TIME_SETTINGS
+from tools.settings import user_settings, settings, MEM_SIZE_SETTINGS, COMPILE_TIME_SETTINGS, APPENDING_SETTINGS
 from tools.utils import read_file, write_file, read_binary, delete_file, removeprefix

 logger = logging.getLogger('emcc')
@@ -417,15 +417,15 @@ def default_setting(name, new_default):
     setattr(settings, name, new_default)


-def apply_user_settings():
+def apply_setting(cmdline_settings):
   """Take a map of users settings {NAME: VALUE} and apply them to the global
   settings object.
   """

   # Stash a copy of all available incoming APIs before the user can potentially override it
   settings.ALL_INCOMING_MODULE_JS_API = settings.INCOMING_MODULE_JS_API + EXTRA_INCOMING_JS_API

-  for key, value in user_settings.items():
+  for key, value in cmdline_settings:
     if key in settings.internal_settings:
       exit_with_error('%s is an internal setting and cannot be set from command line', key)

@@ -459,6 +459,10 @@ def apply_user_settings():
     except Exception as e:
       exit_with_error('a problem occurred in evaluating the content after a "-s", specifically "%s=%s": %s', key, value, str(e))

+    if key in APPENDING_SETTINGS:
+      value += getattr(settings, key)
+
+    user_settings[user_key] = value
     setattr(settings, user_key, value)

     if key == 'EXPORTED_FUNCTIONS':
@@ -1426,23 +1430,22 @@ def phase_parse_arguments(state):
   explicit_settings_changes, newargs = parse_s_args(newargs)
   settings_changes += explicit_settings_changes

+  cmdline_settings = []
   for s in settings_changes:
     key, value = s.split('=', 1)
     key, value = normalize_boolean_setting(key, value)
-    user_settings[key] = value
-
-  # STRICT is used when applying settings so it needs to be applied first before
-  # calling `apply_user_settings`.
-  strict_cmdline = user_settings.get('STRICT')
-  if strict_cmdline:
-    settings.STRICT = int(strict_cmdline)
+    # STRICT is used when applying settings so it needs to be applied first before
+    # calling `apply_setting`.
+    if key == 'STRICT' and value:
+      settings.STRICT = int(value)
+    cmdline_settings.append((key, value))

   # Apply user -jsD settings
   for s in user_js_defines:
     settings[s[0]] = s[1]

   # Apply -s settings in newargs here (after optimization levels, so they can override them)
-  apply_user_settings()
+  apply_setting(cmdline_settings)

   return options, newargs

@@ -1629,7 +1632,7 @@ def phase_setup(options, state, newargs):
     # If we get here then the user specified both DISABLE_EXCEPTION_CATCHING and EXCEPTION_CATCHING_ALLOWED
     # on the command line. This is no longer valid so report either an error or a warning (for
     # backwards compat with the old `DISABLE_EXCEPTION_CATCHING=2`
-    if user_settings['DISABLE_EXCEPTION_CATCHING'] in ('0', '2'):
+    if user_settings['DISABLE_EXCEPTION_CATCHING'] in (0, 2):
       diagnostics.warning('deprecated', 'DISABLE_EXCEPTION_CATCHING=X is no longer needed when specifying EXCEPTION_CATCHING_ALLOWED')
     else:
       exit_with_error('DISABLE_EXCEPTION_CATCHING and EXCEPTION_CATCHING_ALLOWED are mutually exclusive')
@@ -1638,9 +1641,9 @@ def phase_setup(options, state, newargs):
     settings.DISABLE_EXCEPTION_CATCHING = 0

   if settings.WASM_EXCEPTIONS:
-    if user_settings.get('DISABLE_EXCEPTION_CATCHING') == '0':
+    if user_settings.get('DISABLE_EXCEPTION_CATCHING') == 0:
       exit_with_error('DISABLE_EXCEPTION_CATCHING=0 is not compatible with -fwasm-exceptions')
-    if user_settings.get('DISABLE_EXCEPTION_THROWING') == '0':
+    if user_settings.get('DISABLE_EXCEPTION_THROWING') == 0:
       exit_with_error('DISABLE_EXCEPTION_THROWING=0 is not compatible with -fwasm-exceptions')
     # -fwasm-exceptions takes care of enabling them, so users aren't supposed to
     # pass them explicitly, regardless of their values
@@ -1649,7 +1652,7 @@ def phase_setup(options, state, newargs):
     settings.DISABLE_EXCEPTION_CATCHING = 1
     settings.DISABLE_EXCEPTION_THROWING = 1

-    if user_settings.get('ASYNCIFY') == '1':
+    if user_settings.get('ASYNCIFY') == 1:
       diagnostics.warning('emcc', 'ASYNCIFY=1 is not compatible with -fwasm-exceptions. Parts of the program that mix ASYNCIFY and exceptions will not compile.')

     if user_settings.get('SUPPORT_LONGJMP') == 'emscripten':
@@ -1672,11 +1675,11 @@ def phase_setup(options, state, newargs):
     # Wasm SjLj cannot be used with Emscripten EH. We error out if
     # DISABLE_EXCEPTION_THROWING=0 is explicitly requested by the user;
     # otherwise we disable it here.
-    if user_settings.get('DISABLE_EXCEPTION_THROWING') == '0':
+    if user_settings.get('DISABLE_EXCEPTION_THROWING') == 0:
       exit_with_error('SUPPORT_LONGJMP=wasm cannot be used with DISABLE_EXCEPTION_THROWING=0')
     # We error out for DISABLE_EXCEPTION_CATCHING=0, because it is 1 by default
     # and this can be 0 only if the user specifies so.
-    if user_settings.get('DISABLE_EXCEPTION_CATCHING') == '0':
+    if user_settings.get('DISABLE_EXCEPTION_CATCHING') == 0:
       exit_with_error('SUPPORT_LONGJMP=wasm cannot be used with DISABLE_EXCEPTION_CATCHING=0')
     default_setting('DISABLE_EXCEPTION_THROWING', 1)

@@ -1991,7 +1994,7 @@ def phase_linker_setup(options, state, newargs):

   # For users that opt out of WARN_ON_UNDEFINED_SYMBOLS we assume they also
   # want to opt out of ERROR_ON_UNDEFINED_SYMBOLS.
-  if user_settings.get('WARN_ON_UNDEFINED_SYMBOLS') == '0':
+  if user_settings.get('WARN_ON_UNDEFINED_SYMBOLS') == 0:
     default_setting('ERROR_ON_UNDEFINED_SYMBOLS', 0)

   # It is unlikely that developers targeting "native web" APIs with MINIMAL_RUNTIME need
21 changes: 21 additions & 0 deletions test/test_other.py
@@ -13635,3 +13635,24 @@ def test_memory64_proxies(self):
                       '-Wno-experimental',
                       '--extern-post-js', test_file('other/test_memory64_proxies.js')])
     self.run_js('a.out.js')
+
+  def test_settings_append(self):
+    create_file('pre.js', '''
+    Module.onRuntimeInitialized = () => {
+      _foo();
+    }
+    ''')
+    create_file('test.c', r'''
+    #include <stdio.h>
+
+    void foo() {
+      printf("foo\n");
+    }
+
+    int main() {
+      printf("main\n");
+      return 0;
+    }
+    ''')
+    expected = 'foo\nmain\n'
+    self.do_runf('test.c', expected, emcc_args=['--pre-js=pre.js', '-sEXPORTED_FUNCTIONS=_foo', '-sEXPORTED_FUNCTIONS=_main'])
14 changes: 14 additions & 0 deletions tools/settings.py
@@ -85,6 +85,20 @@
     'RUNTIME_LINKED_LIBS',
 }.union(PORTS_SETTINGS)

+# Settings for which repeated occurrences add to a list rather than replacing
+# the current one.
+APPENDING_SETTINGS = {
+    'EXPORTED_FUNCTIONS',
+    'DEFAULT_LIBRARY_FUNCS_TO_INCLUDE',
+    'EXPORTED_RUNTIME_METHODS',
+    'SIGNATURE_CONVERSIONS',
+    'EXCEPTION_CATCHING_ALLOWED',
+    'ASYNCIFY_IMPORTS',
+    'ASYNCIFY_REMOVE',
+    'ASYNCIFY_ADD',
+    'ASYNCIFY_ONLY',
+    'ASYNCIFY_EXPORTS',
+}

 # Settings that don't need to be externalized when serializing to json because they
 # are not used by the JS compiler.