Description
Some developers working in the Elasticsearch repository have reported intermittent crashes of the machine learning controller process when running a locally built Elasticsearch. The usual symptom is that Elasticsearch fails to start and the log contains this message:
[ERROR][o.e.b.Elasticsearch ] [runTask-0] fatal exception while booting Elasticsearch org.elasticsearch.ElasticsearchException: Failure running machine learning native code. This could be due to running on an unsupported OS or distribution, missing OS libraries, or a problem with the temp directory. To bypass this problem by running Elasticsearch without machine learning functionality set [xpack.ml.enabled: false].
A crash report can be found in the macOS Console app.
Path: /Users/USER/*/controller.app/Contents/MacOS/controller
Identifier: co.elastic.ml-cpp.controller
Version: 8.7.0
Code Type: ARM-64 (Native)
Exception Type: EXC_BAD_ACCESS (SIGKILL (Code Signature Invalid))
Exception Subtype: UNKNOWN_0x32 at 0x000000010249c000
Exception Codes: 0x0000000000000032, 0x000000010249c000
VM Region Info: 0x10249c000 is in 0x10249c000-0x1024b0000; bytes after start: 0 bytes before end: 81919
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
UNUSED SPACE AT START
---> mapped file 10249c000-1024b0000 [ 80K] r-x/r-x SM=COW ...t_id=787f689f
mapped file 1024b0000-1024b4000 [ 16K] rw-/rw- SM=COW ...t_id=787f689f
Exception Note: EXC_CORPSE_NOTIFY
Termination Reason: CODESIGNING 2
The error has been observed on Apple silicon only (so far).
Reproducing
It has not been possible to reproduce the problem reliably, but once it occurs a crash report can be generated by running controller --help
from the build output in the Elasticsearch repository:
<ES_REPO>/distribution/archives/darwin-aarch64-tar/build/install/elasticsearch-8.7.0-SNAPSHOT/modules/x-pack-ml/platform/darwin-aarch64/controller.app/Contents/MacOS/controller --help
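Because the path is long and version-specific, it may help to build it in one place. The following is a minimal sketch for a POSIX shell; the function name `controller_path` and its two arguments are hypothetical helpers for illustration, not part of the Elasticsearch build.

```shell
# Hypothetical helper: print the path of the locally built controller binary.
# $1 = path to the Elasticsearch checkout (<ES_REPO>)
# $2 = version string, e.g. 8.7.0-SNAPSHOT
controller_path() {
    printf '%s/distribution/archives/darwin-aarch64-tar/build/install/elasticsearch-%s/modules/x-pack-ml/platform/darwin-aarch64/controller.app/Contents/MacOS/controller\n' "$1" "$2"
}

# Usage (this only produces a crash report once the problem has occurred):
# "$(controller_path "$HOME/elasticsearch" 8.7.0-SNAPSHOT)" --help
```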
Running the app from a different location works ?!
Copying the app to a folder in the home directory and running the copy does not result in a crash:
cd ~/Desktop
cp -r <ES_REPO>/distribution/archives/darwin-aarch64-tar/build/install/elasticsearch-8.7.0-SNAPSHOT/modules/x-pack-ml/platform/darwin-aarch64/controller.app .
controller.app/Contents/MacOS/controller --help
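One way to check whether the working copy differs from the crashing original is a byte-for-byte comparison of the two binaries. This is a minimal sketch using `cmp`; the function name `same_binary` is hypothetical. An "identical" result would only suggest, not prove, that the crash depends on something other than the file contents.

```shell
# Hypothetical helper: report whether two files are byte-for-byte identical.
# Prints "different" both when the contents differ and when either file
# cannot be read, because cmp exits non-zero in both cases.
same_binary() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Usage, comparing the original binary with the copy on the Desktop:
# same_binary <ES_REPO>/.../controller.app/Contents/MacOS/controller \
#     ~/Desktop/controller.app/Contents/MacOS/controller
```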
Possible Causes
macOS Quarantine
No.
The downloaded controller.app
does not have the quarantine attribute set.
find . -xattrname com.apple.quarantine
returns nothing.
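The same check can be made for a single path with the macOS `xattr` tool, which makes the result explicit rather than relying on empty output. This is a minimal sketch; the function name `is_quarantined` is hypothetical, and it assumes `xattr -p` exits non-zero when the attribute is absent.

```shell
# Hypothetical helper: report whether a path carries the quarantine attribute.
# Uses the macOS xattr tool. Any xattr failure (including a missing file)
# is reported as "not quarantined".
is_quarantined() {
    if xattr -p com.apple.quarantine "$1" >/dev/null 2>&1; then
        echo "quarantined"
    else
        echo "not quarantined"
    fi
}

# Usage:
# is_quarantined <ES_REPO>/.../controller.app
```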
Security Policy
No.
After disabling security with sudo spctl --global-disable
the controller app still crashes.
cd <ES_REPO>
sudo spctl --global-disable
sudo spctl --assess -vv ./distribution/archives/darwin-aarch64-tar/build/install/elasticsearch-8.7.0-SNAPSHOT/modules/x-pack-ml/platform/darwin-aarch64/controller.app/Contents/MacOS/controller
./distribution/archives/darwin-aarch64-tar/build/install/elasticsearch-8.7.0-SNAPSHOT/modules/x-pack-ml/platform/darwin-aarch64/controller.app/Contents/MacOS/controller: accepted
override=security disabled
echo $?
0
When security is enabled, the spctl --assess
command returns the same message as codesign --verify
cd <ES_REPO>
sudo spctl --assess -vv ./distribution/archives/darwin-aarch64-tar/build/install/elasticsearch-8.7.0-SNAPSHOT/modules/x-pack-ml/platform/darwin-aarch64/controller.app/Contents/MacOS/controller
./distribution/archives/darwin-aarch64-tar/build/install/elasticsearch-8.7.0-SNAPSHOT/modules/x-pack-ml/platform/darwin-aarch64/controller.app/Contents/MacOS/controller: code has no resources but signature indicates they must be present
echo $?
1
Code Signing
Maybe.
The crash report indicates code signing is involved
Exception Type: EXC_BAD_ACCESS (SIGKILL (Code Signature Invalid))
and
Termination Reason: CODESIGNING 2
Verifying the signature returns an error message:
cd <ES_REPO>
codesign -d --verify --verbose=4 ./distribution/archives/darwin-aarch64-tar/build/install/elasticsearch-8.7.0-SNAPSHOT/modules/x-pack-ml/platform/darwin-aarch64/controller.app
./distribution/archives/darwin-aarch64-tar/build/install/elasticsearch-8.7.0-SNAPSHOT/modules/x-pack-ml/platform/darwin-aarch64/controller.app/Contents/MacOS/controller: code has no resources but signature indicates they must be present
It is not clear, however, whether that error is fatal.
Workarounds
In the commands below replace elasticsearch-8.7.0-SNAPSHOT
with your version.
- Deleting the bundled app from the local build and rebuilding is the most reliable fix:
cd <ES_REPO>
rm -rf distribution/archives/darwin-aarch64-tar/build/install/elasticsearch-8.7.0-SNAPSHOT/modules/x-pack-ml/platform/darwin-aarch64/controller.app
./gradlew run
- Re-signing the app with an ad-hoc signature works for some developers:
codesign --force --deep --sign - <ES_REPO>/distribution/archives/darwin-aarch64-tar/build/install/elasticsearch-8.7.0-SNAPSHOT/modules/x-pack-ml/platform/darwin-aarch64/controller.app
- If all else fails, restart the machine.
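The re-signing workaround can be wrapped so that it only runs when verification fails. This is a minimal sketch; the function name `resign_if_invalid` is hypothetical. It assumes a failing `codesign --verify` is a useful trigger, which the investigation above has not confirmed, so treat it as a convenience rather than a diagnosis.

```shell
# Hypothetical helper: re-sign the app with an ad-hoc signature,
# but only if signature verification fails.
# $1 = path to controller.app
resign_if_invalid() {
    if codesign --verify "$1" >/dev/null 2>&1; then
        echo "signature ok: $1"
    else
        echo "re-signing: $1"
        # Same command as the re-signing workaround above.
        codesign --force --deep --sign - "$1"
    fi
}

# Usage:
# resign_if_invalid <ES_REPO>/distribution/archives/darwin-aarch64-tar/build/install/elasticsearch-8.7.0-SNAPSHOT/modules/x-pack-ml/platform/darwin-aarch64/controller.app
```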