Fix and reenable bolt -hugify flag

In #586 it was reported that a handful of CPython test cases fail when we use `bolt -hugify`. We resolved it by turning off `hugify` for now.

The underlying cause is that all of these tests are ones that expect _specific file descriptors_, often fd 0, to not be present in a child process, and `bolt -hugify` injects some code at process startup that opens a file descriptor. POSIX behavior is `open()` and friends pick the lowest unused file descriptor, so file 0 is reoccupied, and these tests fail.

In other words, this is not a bug / miscompilation in the BOLT optimization itself. But there are two actual bugs here. First, the CPython test suite is incorrect to assume that other libraries in the child process will not open a file on the lowest free fd and leave it open. Second, the BOLT runtime code should close the file once it's done using it.

The action for this issue is to file both bugs, get at least one of them resolved in the versions we're using, and turn `-hugify` back on.

### CPython test suite bug

The tests in #586 are the following

```
python -m test -v -m test_pass_fds_redirected test_subprocess.py
python -m test -v -m test_no_* test_cmd_line.py
python -m test -v -m test_close_file test_posix.py
```

They all have the same kind of bug. Here's the last of those, which tests that closing a file in a child with `os.posix_spawn(...file_actions=[(os.POSIX_SPAWN_CLOSE, 0)])` works:

```python
    def test_close_file(self):
        closefile = os_helper.TESTFN
        self.addCleanup(os_helper.unlink, closefile)
        script = f"""if 1:
            import os
            try:
                os.fstat(0)
            except OSError as e:
                with open({closefile!r}, 'w', encoding='utf-8') as closefile:
                    closefile.write('is closed %d' % e.errno)
            """
        args = self.python_args('-c', script)
        pid = self.spawn_func(args[0], args, os.environ,
                              file_actions=[(os.POSIX_SPAWN_CLOSE, 0)])

        support.wait_process(pid, exitcode=0)
        with open(closefile, encoding="utf-8") as f:
            self.assertEqual(f.read(), 'is closed %d' % errno.EBADF)
```

That is, it expects `os.fstat(0)` in the child to raise. Because it does not raise, the test fails. (On a side note, the manner in which the test fails is kind of confusing - `closefile` is not created if `os.fstat(0)` succeeds, so the error message from the test failure is not very clear.)

In the general case, the environment surrounding the user code (kernel, libc, etc.) is permitted to open file descriptors on its own. For instance, an NSS module can open a persistent file descriptor to wherever it stores information. When using a binfmt_misc handler, you get file descriptors for the executable and the handler. (This is definitely true when using Rosetta, which I use on my dev VM and made it hard for me to track down this issue. I would guess this is true in general and done by the kernel binfmt_misc implementation.) Because all the interfaces for opening file descriptors use the lowest free one, you can't really test whether a file was successfully closed by seeing if the file descriptor is _still_ unused.

In the specific case, closing and leaving closed fds 0, 1, or 2 (stdin, stdout, and stderr) is a really weird thing to do, and some kernels' `exec` implementation will automatically open /dev/null on these file descriptors to prevent security issues from some random fd showing up where some other code expected stdio to be. For instance, on macOS (xnu), [this happens for setuid binaries](https://github.com/apple-oss-distributions/xnu/blob/xnu-11417.121.6/bsd/kern/kern_exec.c#L6887-L6959). I have a memory that this happens on one of the BSDs unconditionally.

So, I think the tests should be changed to do one of the following:
* Continue to close fd 0, but if fstat succeeds but the info is different, consider that a test success.
* Pick a high-numbered fd, which you can do with `dup2()`, and use that to be _relatively_ confident that the surrounding environment will not open so many files as to reach that fd.

### BOLT runtime bug

`-hugify` means to use 2MB hugepages. If you're using `-hugify`, BOLT injects a runtime helper and a call at startup which looks to see if the system supports hugepages by [reading a file in sysfs](https://github.com/llvm/llvm-project/blob/llvmorg-20.1.6/bolt/runtime/hugify.cpp#L61-L92). Unfortunately, at no point does it close this file.

Unlike the cases above (NSS modules, emulators, etc.), there's no reason that the BOLT runtime needs this file to stick around. In fact, the fd is assigned to a local variable and therefore leaked. There just needs to be a `close()` call, and then I think the test suite failures would go away (for platforms/environments where they are passing without `-hugify`).

(Also, I wonder why it's doing a feature check instead of just trying to madvise and ignoring any failure.)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix and reenable bolt -hugify flag #641

CPython test suite bug

BOLT runtime bug

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Fix and reenable bolt -hugify flag #641

Description

CPython test suite bug

BOLT runtime bug

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions