/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc.so.2 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbbind.so.3 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc_proxy.so.2 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_5.so.3 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbb.so.12 is not a symbolic link
2024-08-11T22:11:55.740778Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-08-11T22:11:55.740792Z INFO text_generation_launcher: Sharding model on 5 processes
2024-08-11T22:11:55.740878Z INFO download: text_generation_launcher: Starting download process.
2024-08-11T22:11:58.162183Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-08-11T22:11:58.543915Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-08-11T22:11:58.544105Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-08-11T22:12:03.786121Z INFO text_generation_launcher: CLI SHARDED = True DTYPE = bfloat16
2024-08-11T22:12:03.786323Z INFO text_generation_launcher: CLI SHARDED = 5
2024-08-11T22:12:08.551807Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-08-11T22:12:18.560690Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-08-11T22:12:28.568664Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-08-11T22:12:38.578065Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-08-11T22:12:43.403761Z ERROR text_generation_launcher: deepspeed --num_nodes 1 --num_gpus 5 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 --revision None --sharded True --dtype bfloat16 --trust_remote_code False --uds_path /tmp/text-generation-server exited with status = 1
2024-08-11T22:12:44.082499Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
[WARNING|utils.py:212] 2024-08-11 22:12:02,456 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-08-11 22:12:03,010 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
[WARNING|utils.py:212] 2024-08-11 22:12:19,432 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:212] 2024-08-11 22:12:19,435 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:212] 2024-08-11 22:12:19,437 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:212] 2024-08-11 22:12:19,437 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:212] 2024-08-11 22:12:19,440 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-08-11 22:12:20,124 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-08-11 22:12:20,588 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-08-11 22:12:20,747 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
2024-08-11 22:12:20.920 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:20.921 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
2024-08-11 22:12:21.398 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:21.399 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
2024-08-11 22:12:21.561 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:21.561 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
[WARNING|utils.py:225] 2024-08-11 22:12:21,580 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
[WARNING|utils.py:225] 2024-08-11 22:12:21,759 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
2024-08-11 22:12:22.487 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:22.487 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
2024-08-11 22:12:22.611 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:22.612 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 0
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 160
CPU RAM : 1056375276 KB
Loading 9 checkpoint shards: 0%| | 0/9 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 37, in <module>
main(args)
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 16, in main
server.serve(
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 223, in serve
asyncio.run(
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 189, in serve_inner
model = get_model(
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 101, in get_model
return CausalLM(
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 629, in __init__
model = self.get_deepspeed_model(
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 783, in get_deepspeed_model
model = deepspeed.init_inference(model, **ds_inference_kwargs)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 340, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 165, in __init__
self._apply_injection_policy(config, client_module)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 412, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 345, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 603, in replace_module
replaced_module, _ = _replace_module(model, policy, state_dict=sd)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 663, in _replace_module
_, layer_id = _replace_module(child,
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 663, in _replace_module
_, layer_id = _replace_module(child,
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 639, in _replace_module
replaced_module = policies[child.__class__][0](child,
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 316, in replace_fn
new_module = replace_wo_policy(child, _policy, prefix=prefix, state_dict=state_dict)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 299, in replace_wo_policy
return _autotp._replace_module(module)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/auto_tp.py", line 464, in _replace_module
self._replace_module(child, name, class_name)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/auto_tp.py", line 443, in _replace_module
Loading.load(child, self.state_dict, checking_key, self.mp_group)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/auto_tp.py", line 161, in load
module.weight = mp_replace.copy(module.weight.data, state_dict[prefix + 'weight'])
KeyError: 'model.layers.3.self_attn.q_proj.weight'
Loading 9 checkpoint shards: 0%| | 0/9 [00:00<?, ?it/s] rank=0
2024-08-11T22:12:44.083532Z ERROR text_generation_launcher: Shard 0 failed to start
2024-08-11T22:12:44.083544Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
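A note on the failure mode above: DeepSpeed's AutoTP loader dies on `state_dict[prefix + 'weight']`, i.e. it expects a dense `weight` tensor for `model.layers.3.self_attn.q_proj`. A GPTQ-quantized checkpoint like hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 typically ships packed `qweight`/`qzeros`/`scales` tensors instead, which would explain the KeyError. A minimal sketch of that check (the helper and the sample key names are illustrative, not taken from either repo):

```python
def has_only_quantized_tensors(keys, prefix="model.layers.3.self_attn.q_proj."):
    """Return True when a checkpoint stores GPTQ-style tensors
    (qweight/qzeros/scales) under `prefix` but no dense 'weight' --
    the layout that would make state_dict[prefix + 'weight'] raise
    the KeyError seen in the traceback above."""
    names = {k[len(prefix):] for k in keys if k.startswith(prefix)}
    return "weight" not in names and "qweight" in names

# Hypothetical key list in the usual GPTQ layout:
gptq_keys = [
    "model.layers.3.self_attn.q_proj.qweight",
    "model.layers.3.self_attn.q_proj.qzeros",
    "model.layers.3.self_attn.q_proj.scales",
]
print(has_only_quantized_tensors(gptq_keys))  # True for this layout
```

Listing the actual tensor names from the downloaded `*.safetensors` shards (e.g. via `safetensors.safe_open(path, framework="pt").keys()`) should confirm whether the dense `weight` tensors are really absent from this checkpoint.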
System Info
root@laion-gaudi2-00:/home/sdp# docker run -p 8081:80 -v $volume:/data --runtime=habana -e HUGGING_FACE_HUB_TOKEN=$hf_token -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=2,3,4,5,6,7 -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model --sharded true --num-shard 5 --max-input-tokens 4096 --max-total-tokens 8192 --max-batch-prefill-tokens 8242
2024-08-11T22:11:55.546868Z INFO text_generation_launcher: Args {
model_id: "hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4",
revision: None,
validation_workers: 2,
sharded: Some(
true,
),
num_shard: Some(
5,
),
quantize: None,
speculate: None,
dtype: None,
trust_remote_code: false,
max_concurrent_requests: 128,
max_best_of: 2,
max_stop_sequences: 4,
max_top_n_tokens: 5,
max_input_tokens: Some(
4096,
),
max_input_length: None,
max_total_tokens: Some(
8192,
),
waiting_served_ratio: 0.3,
max_batch_prefill_tokens: Some(
8242,
),
max_batch_total_tokens: None,
max_waiting_tokens: 20,
max_batch_size: None,
cuda_graphs: None,
hostname: "0d708a2172ae",
port: 80,
shard_uds_path: "/tmp/text-generation-server",
master_addr: "localhost",
master_port: 29500,
huggingface_hub_cache: Some(
"/data",
),
weights_cache_override: None,
disable_custom_kernels: false,
cuda_memory_fraction: 1.0,
rope_scaling: None,
rope_factor: None,
json_output: false,
otlp_endpoint: None,
cors_allow_origin: [],
watermark_gamma: None,
watermark_delta: None,
ngrok: false,
ngrok_authtoken: None,
ngrok_edge: None,
tokenizer_config_path: None,
disable_grammar_support: false,
env: false,
max_client_batch_size: 4,
}
2024-08-11T22:11:55.546942Z INFO hf_hub: Token file not found "/root/.cache/huggingface/token"
/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_0.so.3 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc.so.2 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbbind.so.3 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc_proxy.so.2 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_5.so.3 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbb.so.12 is not a symbolic link
2024-08-11T22:11:55.740778Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-08-11T22:11:55.740792Z INFO text_generation_launcher: Sharding model on 5 processes
2024-08-11T22:11:55.740878Z INFO download: text_generation_launcher: Starting download process.
2024-08-11T22:11:58.162183Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-08-11T22:11:58.543915Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-08-11T22:11:58.544105Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-08-11T22:12:03.786121Z INFO text_generation_launcher: CLI SHARDED = True DTYPE = bfloat16
2024-08-11T22:12:03.786323Z INFO text_generation_launcher: CLI SHARDED = 5
2024-08-11T22:12:03.786401Z INFO text_generation_launcher: CLI server start deepspeed =deepspeed --num_nodes 1 --num_gpus 5 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 --revision None --sharded True --dtype bfloat16 --trust_remote_code False --uds_path /tmp/text-generation-server
2024-08-11T22:12:08.551807Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-08-11T22:12:18.560690Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-08-11T22:12:28.568664Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-08-11T22:12:38.578065Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-08-11T22:12:43.403761Z ERROR text_generation_launcher: deepspeed --num_nodes 1 --num_gpus 5 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 --revision None --sharded True --dtype bfloat16 --trust_remote_code False --uds_path /tmp/text-generation-server exited with status = 1
2024-08-11T22:12:44.082499Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
[WARNING|utils.py:212] 2024-08-11 22:12:02,456 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-08-11 22:12:03,010 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
[WARNING|utils.py:212] 2024-08-11 22:12:19,432 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:212] 2024-08-11 22:12:19,435 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:212] 2024-08-11 22:12:19,437 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:212] 2024-08-11 22:12:19,437 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:212] 2024-08-11 22:12:19,440 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but habana-frameworks v1.16.0.526 was found, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-08-11 22:12:20,124 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-08-11 22:12:20,588 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
[WARNING|utils.py:225] 2024-08-11 22:12:20,747 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
2024-08-11 22:12:20.920 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:20.921 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
2024-08-11 22:12:21.398 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:21.399 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
2024-08-11 22:12:21.561 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:21.561 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
[WARNING|utils.py:225] 2024-08-11 22:12:21,580 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
[WARNING|utils.py:225] 2024-08-11 22:12:21,759 >> optimum-habana v1.13.0.dev0 has been validated for SynapseAI v1.17.0 but the driver version is v1.16.2, this could lead to undefined behavior!
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
2024-08-11 22:12:22.487 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:22.487 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
2024-08-11 22:12:22.611 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-08-11 22:12:22.612 | INFO | __main__:main:11 - TGIService: --model_id hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4, --revision None, --sharded True, --speculate None, --dtype bfloat16, --trust_remote_code True, --uds_path /tmp/text-generation-server
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 0
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 160
CPU RAM : 1056375276 KB
Loading 9 checkpoint shards: 0%| | 0/9 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 37, in <module>
main(args)
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 16, in main
server.serve(
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 223, in serve
asyncio.run(
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 189, in serve_inner
model = get_model(
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 101, in get_model
return CausalLM(
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 629, in __init__
model = self.get_deepspeed_model(
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 783, in get_deepspeed_model
model = deepspeed.init_inference(model, **ds_inference_kwargs)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 340, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 165, in __init__
self._apply_injection_policy(config, client_module)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 412, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 345, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 603, in replace_module
replaced_module, _ = _replace_module(model, policy, state_dict=sd)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 663, in _replace_module
_, layer_id = _replace_module(child,
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 663, in _replace_module
_, layer_id = _replace_module(child,
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 639, in _replace_module
replaced_module = policies[child.__class__][0](child,
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 316, in replace_fn
new_module = replace_wo_policy(child, _policy, prefix=prefix, state_dict=state_dict)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 299, in replace_wo_policy
return _autotp._replace_module(module)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/auto_tp.py", line 464, in _replace_module
self._replace_module(child, name, class_name)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/auto_tp.py", line 443, in _replace_module
Loading.load(child, self.state_dict, checking_key, self.mp_group)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/auto_tp.py", line 161, in load
module.weight = mp_replace.copy(module.weight.data, state_dict[prefix + 'weight'])
KeyError: 'model.layers.3.self_attn.q_proj.weight'
[The remaining shard processes printed the same traceback, each ending in KeyError: 'model.layers.3.self_attn.q_proj.weight'; the interleaved duplicate copies are omitted here.]
Loading 9 checkpoint shards: 0%| | 0/9 [00:00<?, ?it/s] rank=0
2024-08-11T22:12:44.083532Z ERROR text_generation_launcher: Shard 0 failed to start
2024-08-11T22:12:44.083544Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
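For context on the repeated KeyError: GPTQ-quantized checkpoints such as this model typically store each quantized linear layer as packed `qweight`/`qzeros`/`scales` tensors rather than a plain `weight` tensor, so a state-dict lookup of `prefix + 'weight'` (which is what DeepSpeed's auto-TP loader does in `auto_tp.py`) cannot succeed. A minimal sketch, using a hypothetical toy state_dict (not the real checkpoint contents), that reproduces the failing lookup:

```python
# Hypothetical state_dict mimicking the tensor layout of a GPTQ checkpoint:
# the quantized q_proj layer carries packed tensors, but no plain 'weight'.
state_dict = {
    "model.layers.3.self_attn.q_proj.qweight": "packed int32 weights",
    "model.layers.3.self_attn.q_proj.qzeros": "packed zero-points",
    "model.layers.3.self_attn.q_proj.scales": "fp16 scales",
}

prefix = "model.layers.3.self_attn.q_proj."
try:
    state_dict[prefix + "weight"]  # the lookup auto_tp.py's Loading.load performs
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'model.layers.3.self_attn.q_proj.weight'
```

This matches the exact key reported in the traceback, which suggests the DeepSpeed injection path is treating the GPTQ checkpoint as if it held unquantized weights.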
Information
Tasks
Reproduction
docker run -p 8081:80 -v $volume:/data --runtime=habana -e HUGGING_FACE_HUB_TOKEN=$hf_token -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=2,3,4,5,6,7 -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model --sharded true --num-shard 5 --max-input-tokens 4096 --max-total-tokens 8192 --max-batch-prefill-tokens 8242
Expected behavior
The server should start successfully and serve hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 sharded across the 5 Gaudi devices.