Description
The Intel macOS builds are taking an increasingly long time to complete because of an expedient hack I settled on to work around a particular MacStadium idiosyncrasy.
Background:
- The builds are dependent on ccache, without which they will take many hours to run.
- The OSX MacStadium Orka VMs are ephemeral: they are provisioned from a base image and deleted when not in use.
- A shared filesystem is attached to the VMs, where we can persist the ccache changes from build to build.
- The shared filesystem uses a different mounting technology on Intel VMs than on Apple Silicon VMs (Plan9 on Intel vs virtiofs on Apple Silicon).
- On Intel, the shared filesystem causes all files written to it to be owned by UID 107/GID 107. The process that runs the builds is UID 501 and is a member of GID 107.
- Write processes typically respect the umask, and creating files with a permissive umask of 002 produces files that are readable by group members.
- ccache compiles to a temporary file that it creates using mkstemp, which creates files with 600 permissions. When compilation is complete, it moves this file into the actual cache on the shared filesystem.
- When ccache moves the file to the shared cache, the file takes on the 107:107 UID/GID. The file now has 600 permissions, and the UID that moved it (501) cannot change the permissions to 664.
- Future builds are then unable to read the cached files, resulting in cache misses and making the cache useless.
- The expedient hack was a recursive chmod that runs as root to open up any files created during the build, so that the next build can use those cached files. This added about 10 minutes to a 30-minute build, but it worked well enough.
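The permission sequence above can be reproduced on an ordinary local filesystem with a short Python sketch. It assumes a POSIX system and uses Python's tempfile.mkstemp, which, like the C mkstemp, creates a file readable and writable only by its owner. It cannot reproduce the Plan9 remapping to 107:107, which is the step that makes the final chmod impossible for UID 501; the helper names mode_of and demo are hypothetical.

```python
import os
import stat
import tempfile


def mode_of(path: str) -> int:
    """Return only the permission bits of a file, e.g. 0o664."""
    return stat.S_IMODE(os.stat(path).st_mode)


def demo(workdir: str) -> dict:
    old_umask = os.umask(0o002)  # permissive umask, as suggested by MacStadium
    try:
        # 1. An ordinary file requested with 0o666 is trimmed by the umask to 0o664.
        plain = os.path.join(workdir, "plain.o")
        fd = os.open(plain, os.O_CREAT | os.O_WRONLY, 0o666)
        os.close(fd)

        # 2. mkstemp requests 0o600, so a umask can only remove bits, never add them.
        fd, tmp = tempfile.mkstemp(dir=workdir)
        os.close(fd)
        tmp_mode = mode_of(tmp)

        # 3. Moving the temp file into the "cache" with a rename keeps its 0o600 mode.
        cached = os.path.join(workdir, "cached.o")
        os.replace(tmp, cached)
        return {"plain": mode_of(plain), "tmp": tmp_mode, "cached": mode_of(cached)}
    finally:
        os.umask(old_umask)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, mode in demo(d).items():
            print(f"{name}: {oct(mode)}")
```

Locally the owner could still chmod cached.o afterwards; on the Intel shared mount, the file ends up owned by 107, so UID 501 cannot.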
Problem:
The existing shared cache keeps growing. Newer versions of V8 and other large quantities of compiled files have made the cache considerably larger, so after about 6 weeks of this pattern the recursive chmod now takes more than 25 minutes. The problem is compounded when more than one build is running: several recursive chmods may hit the same shared filesystem at the same time, causing contention (some builds have spent up to 45 minutes chmod'ing).
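Why the hack slows down as the cache grows can be sketched with a hypothetical Python equivalent of a recursive group-read/write chmod (the exact command used in the job is not reproduced here). For brevity, it only touches regular files and skips directories and symlinks. It must stat and chmod every file in the cache, not just the files the current build produced, so its cost scales with the total size of the cache.

```python
import os
import stat
import tempfile

GROUP_RW = stat.S_IRGRP | stat.S_IWGRP


def open_up_cache(cache_dir: str) -> int:
    """Hypothetical stand-in for a recursive 'chmod g+rw' over the cache.

    Adds group read/write to every regular file under cache_dir and returns
    how many files were visited, which grows with the whole cache.
    """
    touched = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # leave symlinks and special files alone
            os.chmod(path, stat.S_IMODE(st.st_mode) | GROUP_RW)
            touched += 1
    return touched
```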
Potential Solutions:
- Contact MacStadium Support in the hope that they have some mechanism for mounting the filesystem as a UID that matches the images they provide (501 in this case), or some other way to solve this conundrum. -> I tried this. They suggested a umask, and also hinted that they were highly unlikely to put much effort into the Intel VM architecture, as it's the "on the way out the door" architecture for OSX.
- Run ccache with sudo -> when ccache runs as root, it has the permissions needed to chmod the cached file from 600 to 660. I attempted that here: https://ci.nodejs.org/job/raslett-node-test-commit-osx/nodes=osx13-x64/4/console. This strategy will likely not work, because later steps in the Makefile need to access directories created by ccache that they do not have permission to access (e.g. the touch permission denied issues).
- Run make with sudo -> perhaps if the whole build runs with elevated permissions, there will be no permission issues. I attempted this here: https://ci.nodejs.org/job/raslett-node-test-commit-osx/nodes=osx13-x64/6/consoleFull
This works and populates the cache properly; however, a pair of tests then fail:
03:28:01 out/Release/node --test-reporter=./test/common/test-error-reporter.js --test-reporter-destination=stdout /Users/admin/build/workspace/raslett-node-test-commit-osx/nodes/osx13-x64/test/parallel/test-config-file.js
03:28:01 out/Release/node --test-reporter=./test/common/test-error-reporter.js --test-reporter-destination=stdout /Users/admin/build/workspace/raslett-node-test-commit-osx/nodes/osx13-x64/test/parallel/test-sqlite-backup.mjs
These tests fail because they check for a permission-denied scenario, which no longer occurs when make runs as root. So this strategy will not work.
- Change the UID of the account that runs the build to 107 to match the underlying filesystem, i.e. create an iojs user with that UID to run these builds. -> This is the next step I will pursue.
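The options above differ mainly in which user ends up acting on a 600-mode file owned by 107:107. A simplified Python model of the standard POSIX discretionary permission checks shows why only root or a UID-107 user can read or chmod such a file, and why root also bypasses the permission-denied conditions the two failing tests expect. The model ignores ACLs, macOS sandboxing, and any special behaviour of the Plan9 mount, and all names in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    uid: int
    gid: int
    mode: int  # permission bits only, e.g. 0o600


@dataclass(frozen=True)
class Process:
    uid: int
    gids: frozenset  # all groups the process belongs to


def can_read(f: FileInfo, p: Process) -> bool:
    """Simplified POSIX read check: owner, then group, then other bits."""
    if p.uid == 0:
        return True  # the superuser bypasses mode bits for reads
    if p.uid == f.uid:
        return bool(f.mode & 0o400)
    if f.gid in p.gids:
        return bool(f.mode & 0o040)
    return bool(f.mode & 0o004)


def can_chmod(f: FileInfo, p: Process) -> bool:
    """Only the file's owner or the superuser may change its mode."""
    return p.uid == 0 or p.uid == f.uid


cached = FileInfo(uid=107, gid=107, mode=0o600)
scenarios = {
    "build user (UID 501, in group 107)": Process(501, frozenset({107})),
    "sudo / root": Process(0, frozenset({0})),
    "iojs user with UID 107": Process(107, frozenset({107})),
}

if __name__ == "__main__":
    for label, proc in scenarios.items():
        print(f"{label}: read={can_read(cached, proc)} chmod={can_chmod(cached, proc)}")
```

Under this model, the UID-107 user gets the access it needs as the file's owner, without the superuser bypass that breaks the permission-denied tests.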