@andylin-hao andylin-hao commented Oct 11, 2025

This PR fixes LIBERO's compatibility with torch>=2.6 by passing weights_only=False to torch.load. In torch 2.6, the default value of weights_only changed from False to True. This is a breaking change: by default, torch.load now rejects any data structure other than tensors, primitives, and dictionaries. See https://docs.pytorch.org/docs/stable/notes/serialization.html#weights-only

When running LIBERO on torch 2.6, the following error occurs:

File "/opt/libero/libero/libero/benchmark/__init__.py", line 164, in get_task_init_states
    init_states = torch.load(init_states_path)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/openvla/lib/python3.11/site-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
    (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
    (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
    WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_reconstruct])` or the `torch.serialization.safe_globals([_reconstruct])` context manager to allowlist this global if you trust this class/function.

After adding the weights_only argument, the states can be correctly loaded and LIBERO can run on torch >= 2.6. Supposedly, this addresses #99.
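The change itself is a single extra argument to `torch.load`. For code that may also run against loaders that predate the `weights_only` parameter, a minimal sketch of a guarded call could look like the following. The helper name `trusted_load` is hypothetical and not part of this PR or of LIBERO; it only illustrates the idea, and it should be used solely on files from a trusted source, since full unpickling can execute arbitrary code.

```python
import inspect


def trusted_load(load_fn, path, **kwargs):
    """Call load_fn(path), passing weights_only=False if the loader accepts it.

    Hypothetical helper (an assumption, not LIBERO code). Only use it on
    files you trust: full unpickling can run arbitrary code.
    """
    params = inspect.signature(load_fn).parameters
    if "weights_only" in params:
        # Respect an explicit caller choice; otherwise opt out of the
        # weights-only unpickler that torch >= 2.6 enables by default.
        kwargs.setdefault("weights_only", False)
    return load_fn(path, **kwargs)


# Intended usage with PyTorch (requires torch to be installed):
#   import torch
#   init_states = trusted_load(torch.load, init_states_path)
```

The error message above also mentions `torch.serialization.add_safe_globals` as a stricter alternative that keeps `weights_only=True`; this PR takes the simpler route of disabling it.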

Furthermore, when multiple processes run LIBERO for the first time, they all try to create the config directory and file simultaneously, causing contention.
For example, several processes may each find that the config directory does not exist and try to create it at the same time. Only one process succeeds, while the others crash because the folder already exists.

This PR adds a file lock to prevent such contention, thereby enabling LIBERO to be used in large-scale parallel training.
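As an illustration of the pattern only, here is a minimal sketch of lock-guarded first-time config creation using the standard library. It assumes a POSIX system (`fcntl.flock`), and the function name `ensure_config`, the file name `config.yaml`, and the `.lock` suffix are all hypothetical; the locking mechanism actually used in this PR may differ.

```python
import fcntl
import os


def ensure_config(config_dir, config_name="config.yaml", default_text=""):
    """Create the config directory and file once, even under concurrency.

    Hypothetical sketch; names and file layout are assumptions.
    """
    # exist_ok=True avoids the "folder already exists" crash on its own.
    os.makedirs(config_dir, exist_ok=True)
    config_path = os.path.join(config_dir, config_name)
    lock_path = config_path + ".lock"

    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the exclusive lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Only the first process to get the lock writes the file.
            if not os.path.exists(config_path):
                with open(config_path, "w") as f:
                    f.write(default_text)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return config_path
```

Checking for the file only after acquiring the lock is what prevents two processes from both deciding to write it.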

Signed-off-by: Hao Lin <linhaomails@gmail.com>
When multiple libero instances run for the first time, they all try to
create the config directory and file, causing contention.
This commit adds a file lock to prevent contention.

Signed-off-by: Hao Lin <linhaomails@gmail.com>
@andylin-hao andylin-hao changed the title Fix: torch >= 2.6 compatibility Fix: torch >= 2.6 compatibility & multi-instance file contention Oct 11, 2025