Updating transformers issue with bloom models #541

Open
@loadams

Description

Updating to transformers versions beyond v4.43.4 causes failures in the legacy-mode CI tests. The BLOOM tests fail with:

FAILED test_non_persistent_deployment.py::test_single_GPU[None-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query3-non-persistent] - ValueError: not enough values to unpack (expected 2, got 0)
FAILED test_local_deployment.py::test_session[None-local-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query0] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_local_deployment.py::test_multi_GPU[None-local-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query0] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_local_deployment.py::test_single_GPU[None-local-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query3] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_meta_tensor[query0-None-bigscience/bloom-560m-local-50050-False-28080-text-generation-fp16-False-1-True-False-ds_config0-2-True] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_load_to_sys_mem[query0-None-bigscience/bloom-560m-local-50050-False-28080-text-generation-fp16-1-False-1-True-False-ds_config0-True] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_restful_api[query0-28080-None-bigscience/bloom-560m-local-50050-text-generation-fp16-1-False-False-1-True-False-ds_config0-True] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_replicas[query0-None-bigscience/bloom-560m-local-50050-False-28080-text-generation-fp16-1-False-False-True-False-ds_config0-2] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
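For context, the ValueError in the first failure is Python's generic tuple-unpacking error, consistent with DeepSpeed's attention path unpacking an empty past-key-values container into two names. A minimal illustration of the failure mode (hypothetical, not the actual DeepSpeed code):

    layer_past = ()                      # e.g. an empty legacy cache handed over by newer transformers
    past_key, past_value = layer_past    # ValueError: not enough values to unpack (expected 2, got 0)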

We have isolated the problem to this change: huggingface/transformers#31445
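Until a fix lands, a possible workaround (our assumption, based on v4.43.4 being the last known-good version) is to pin transformers in the CI environment:

    pip install "transformers<=4.43.4"

The full traceback from the ValueError failure follows: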

../../mii/legacy/client.py:144: in query
    return task_methods.run_inference(inference_pipeline, args, query_kwargs)
../../mii/legacy/method_table.py:101: in run_inference
    response = inference_pipeline(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/text_generation.py:262: in __call__
    return super().__call__(text_inputs, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/base.py:1238: in __call__
    outputs = list(final_iterator)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/pt_utils.py:124: in __next__
    item = next(self.iterator)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/pt_utils.py:125: in __next__
    processed = self.infer(item, **self.params)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/base.py:1164: in forward
    model_outputs = self._forward(model_inputs, **forward_params)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/text_generation.py:351: in _forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
../../../venv/lib/python3.12/site-packages/deepspeed/inference/engine.py:631: in _generate
    return self.module.generate(*inputs, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/utils/_contextlib.py:116: in decorate_context
    return func(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/generation/utils.py:2024: in generate
    result = self._sample(
../../../venv/lib/python3.12/site-packages/transformers/generation/utils.py:2982: in _sample
    outputs = self(**model_inputs, return_dict=True)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/models/bloom/modeling_bloom.py:955: in forward
    transformer_outputs = self.transformer(
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/models/bloom/modeling_bloom.py:744: in forward
    outputs = block(
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py:162: in forward
    self.attention(input,
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/deepspeed/ops/transformer/inference/ds_attention.py:168: in forward
    context_layer, key_layer, value_layer = self.compute_attention(qkv_out=qkv_out,
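For local debugging, the failure should be reproducible without the MII gRPC layer. A minimal sketch, assuming kernel injection is active (the ds_attention.py frame above indicates it is) and using the model and dtype from the failing test parameters; everything else is a plain DeepSpeed inference setup, not the exact MII code path:

    import torch
    import deepspeed
    from transformers import pipeline

    # bigscience/bloom-560m and fp16 come from the failing test ids.
    pipe = pipeline("text-generation", model="bigscience/bloom-560m",
                    torch_dtype=torch.float16, device=0)

    # Swap in DeepSpeed's fused inference kernels, as the legacy MII path does.
    pipe.model = deepspeed.init_inference(pipe.model,
                                          dtype=torch.float16,
                                          replace_with_kernel_inject=True)

    # With transformers > v4.43.4 this call is expected to raise:
    # ValueError: not enough values to unpack (expected 2, got 0)
    print(pipe("DeepSpeed is", max_new_tokens=20))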
