Skip to content

SIGSEGV when instantiating PyTorch model #5933

@alexpeters1208

Description

@alexpeters1208

I can consistently segfault the server in Docker and Pip by running the following code. It requires the Python packages torch and transformers.

from transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

and then running the last line again:

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

Error:

docker-deephaven-1  | #
docker-deephaven-1  | # A fatal error has been detected by the Java Runtime Environment:
docker-deephaven-1  | #
docker-deephaven-1  | #  SIGSEGV (0xb) at pc=0x0000ffff11b7255c, pid=1, tid=189
docker-deephaven-1  | #
docker-deephaven-1  | # JRE version: OpenJDK Runtime Environment Temurin-21.0.4+7 (21.0.4+7) (build 21.0.4+7-LTS)
docker-deephaven-1  | # Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.4+7 (21.0.4+7-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
docker-deephaven-1  | # Problematic frame:
docker-deephaven-1  | # C  [libpython3.10.so+0x1b255c]
docker-deephaven-1  | #
docker-deephaven-1  | # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
docker-deephaven-1  | #
docker-deephaven-1  | # An error report file with more information is saved as:
docker-deephaven-1  | # //hs_err_pid1.log
docker-deephaven-1  | [45.206s][warning][os] Loading hsdis library failed
docker-deephaven-1  | #
docker-deephaven-1  | # If you would like to submit a bug report, please visit:
docker-deephaven-1  | #   https://github.com/adoptium/adoptium-support/issues
docker-deephaven-1  | # The crash happened outside the Java Virtual Machine in native code.
docker-deephaven-1  | # See problematic frame for where to report the bug.
docker-deephaven-1  | #
docker-deephaven-1  | 
docker-deephaven-1  | [error occurred during error reporting (), id 0x5, SIGTRAP (0x5) at pc=0x0000ffff950771ec]

This does not happen in a plain Python console. It also does not happen if I run the last line twice in the same script:

import time

from transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

time.sleep(5)

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

I'm on latest for Docker, which is 0.35.2. I observe the same behavior on edge. Also, maybe importantly, I'm on Apple Silicon.
@jjbrosnan tried to repro on X86. He could repro on latest, but not on edge. When he attempted on edge, his DH server hung at the same point where mine crashed.

Metadata

Metadata

Assignees

Labels

bugSomething isn't workingcoreCore development taskspython

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions