Opened on Nov 1, 2024
Updating to transformers versions beyond v4.43.4 breaks the CI tests in legacy mode. The bloom tests fail with:
```
FAILED test_non_persistent_deployment.py::test_single_GPU[None-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query3-non-persistent] - ValueError: not enough values to unpack (expected 2, got 0)
FAILED test_local_deployment.py::test_session[None-local-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query0] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_local_deployment.py::test_multi_GPU[None-local-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query0] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_local_deployment.py::test_single_GPU[None-local-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query3] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_meta_tensor[query0-None-bigscience/bloom-560m-local-50050-False-28080-text-generation-fp16-False-1-True-False-ds_config0-2-True] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_load_to_sys_mem[query0-None-bigscience/bloom-560m-local-50050-False-28080-text-generation-fp16-1-False-1-True-False-ds_config0-True] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_restful_api[query0-28080-None-bigscience/bloom-560m-local-50050-text-generation-fp16-1-False-False-1-True-False-ds_config0-True] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_replicas[query0-None-bigscience/bloom-560m-local-50050-False-28080-text-generation-fp16-1-False-False-True-False-ds_config0-2] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
```
We have isolated the problematic change to this commit: huggingface/transformers#31445
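For context, `not enough values to unpack (expected 2, got 0)` is what Python raises when a two-element unpack receives an empty sequence. A minimal sketch of the suspected failure mode (the variable names and values here are hypothetical, assuming the DeepSpeed attention path unpacks a cached `(key, value)` pair that newer transformers versions now hand over empty):

```python
# Hypothetical illustration: DeepSpeed's inference attention path expects each
# cached layer entry to be a (key, value) pair. After the transformers change,
# the legacy tuple cache for bloom can arrive empty, so the two-element unpack
# raises exactly the error seen in the failing test above.
layer_past = ()  # empty cache entry, as the newer transformers appears to produce

try:
    key_layer, value_layer = layer_past  # expects exactly two tensors
except ValueError as err:
    print(err)  # not enough values to unpack (expected 2, got 0)
```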
Traceback:

```
../../mii/legacy/client.py:144: in query
    return task_methods.run_inference(inference_pipeline, args, query_kwargs)
../../mii/legacy/method_table.py:101: in run_inference
    response = inference_pipeline(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/text_generation.py:262: in __call__
    return super().__call__(text_inputs, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/base.py:1238: in __call__
    outputs = list(final_iterator)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/pt_utils.py:124: in __next__
    item = next(self.iterator)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/pt_utils.py:125: in __next__
    processed = self.infer(item, **self.params)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/base.py:1164: in forward
    model_outputs = self._forward(model_inputs, **forward_params)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/text_generation.py:351: in _forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
../../../venv/lib/python3.12/site-packages/deepspeed/inference/engine.py:631: in _generate
    return self.module.generate(*inputs, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/utils/_contextlib.py:116: in decorate_context
    return func(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/generation/utils.py:2024: in generate
    result = self._sample(
../../../venv/lib/python3.12/site-packages/transformers/generation/utils.py:2982: in _sample
    outputs = self(**model_inputs, return_dict=True)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/models/bloom/modeling_bloom.py:955: in forward
    transformer_outputs = self.transformer(
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/models/bloom/modeling_bloom.py:744: in forward
    outputs = block(
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py:162: in forward
    self.attention(input,
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/deepspeed/ops/transformer/inference/ds_attention.py:168: in forward
    context_layer, key_layer, value_layer = self.compute_attention(qkv_out=qkv_out,
```
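Until the legacy path supports the newer cache format, one possible stopgap (our assumption, not a maintainer-confirmed fix) is to pin transformers to the last known-good release in the CI requirements:

```
# requirements fragment (hypothetical placement in the CI requirements file):
# pin to the last release that passes the legacy bloom tests
transformers<=4.43.4
```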