Hotfix: fix of use of unquantized weights in Mixtral GQA loading #2269

icyxp · 2024-07-22T07:26:33Z

2024-07-22T07:12:15.783380Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 118, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 908, in get_model
    return FlashCausalLM(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 898, in __init__
    model = model_class(prefix, config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_mixtral_modeling.py", line 615, in __init__
    self.model = MixtralModel(prefix, config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_mixtral_modeling.py", line 549, in __init__
    [
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_mixtral_modeling.py", line 550, in <listcomp>
    MixtralLayer(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_mixtral_modeling.py", line 481, in __init__
    self.self_attn = MixtralAttention(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_mixtral_modeling.py", line 227, in __init__
    self.query_key_value = load_attention(config, prefix, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_mixtral_modeling.py", line 120, in load_attention
    return _load_gqa(config, prefix, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_mixtral_modeling.py", line 149, in _load_gqa
    assert list(weight.shape) == [
AttributeError: 'UnquantizedWeight' object has no attribute 'shape'
2024-07-22T07:12:17.111690Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
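The traceback shows `_load_gqa` reading `.shape` from an `UnquantizedWeight`. That object wraps a tensor rather than being one. Below is a minimal sketch of the kind of guard the fix needs. It is not the merged patch. It assumes the wrapper exposes its tensor as a `.weight` attribute. `FakeTensor`, `weight_shape`, and `check_qkv_shape` are hypothetical names. A tuple-valued `shape` stands in for a real `torch.Tensor` so the example runs without PyTorch.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class FakeTensor:
    """Stand-in for torch.Tensor; only the .shape attribute is modeled."""
    shape: Tuple[int, ...]


@dataclass
class UnquantizedWeight:
    """Hypothetical mirror of the loader's wrapper (assumption): the tensor
    lives in `.weight`, and the wrapper itself has no `.shape`."""
    weight: FakeTensor


def weight_shape(weight: Any) -> List[int]:
    """Return a loaded weight's shape, unwrapping UnquantizedWeight first."""
    if isinstance(weight, UnquantizedWeight):
        # The wrapper has no .shape, so read it from the wrapped tensor.
        return list(weight.weight.shape)
    # Plain tensors (or anything else exposing .shape) are used directly.
    return list(weight.shape)


def check_qkv_shape(weight: Any, expected: List[int]) -> None:
    """Assert that the fused QKV weight has the expected [rows, cols] shape."""
    shape = weight_shape(weight)
    assert shape == expected, f"{shape} != {expected}"


if __name__ == "__main__":
    w = UnquantizedWeight(FakeTensor((6144, 4096)))
    check_qkv_shape(w, [6144, 4096])  # passes: shape read via w.weight
```

Calling `list(w.shape)` directly on the wrapper raises the same `AttributeError` seen in the log above. Routing the check through `weight_shape` avoids it. Unwrapping at the check site keeps the assertion itself unchanged for other weight types.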

Fix syntax issues

danieldk

Awesome, thanks!

* Update idefics_causal_lm.py

  Fix syntax issues

* fix dbrx & opt model prefix bug
* Hotfix: fix of use of unquantized weights in Mixtral GQA loading
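The expected value in the `_load_gqa` shape assertion is cut off in the traceback. Suppose it compares against the usual fused GQA layout: query heads, plus one set each of key and value heads, all of width `head_size`, split evenly across tensor-parallel shards. Under that assumption, the per-shard shape can be computed as below. `fused_qkv_shape` is a hypothetical helper. The Mixtral-8x7B values (hidden size 4096, 32 attention heads, 8 key/value heads) come from its published config.

```python
from typing import List


def fused_qkv_shape(
    hidden_size: int,
    num_attention_heads: int,
    num_key_value_heads: int,
    world_size: int = 1,
) -> List[int]:
    """Expected [rows, cols] of one shard of a fused GQA QKV projection.

    Hypothetical helper: assumes Q, K and V rows are concatenated and that
    heads are split evenly across `world_size` tensor-parallel shards.
    """
    if hidden_size % num_attention_heads:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    if num_attention_heads % world_size or num_key_value_heads % world_size:
        raise ValueError("head counts must be divisible by world_size")
    head_size = hidden_size // num_attention_heads
    num_heads = num_attention_heads // world_size
    num_kv_heads = num_key_value_heads // world_size
    # One block of query heads, plus one block each for keys and values;
    # the input dimension (columns) is not sharded.
    return [(num_heads + 2 * num_kv_heads) * head_size, hidden_size]


if __name__ == "__main__":
    # Mixtral-8x7B attention dimensions at several tensor-parallel sizes.
    for ws in (1, 2, 4, 8):
        print(ws, fused_qkv_shape(4096, 32, 8, ws))
```

With a single shard this gives `[6144, 4096]`, since (32 + 2 * 8) * 128 = 6144. With two shards, each shard holds `[3072, 4096]`.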


icyxp and others added 5 commits August 30, 2023 10:39

Update idefics_causal_lm.py

0c7559b

Fix syntax issues

Merge branch 'huggingface:main' into main

28cecb6

fix dbrx & opt model prefix bug

39944e1

Merge branch 'huggingface:main' into main

6111e9e

Hotfix: fix of use of unquantized weights in Mixtral GQA loading

dc70cf9

danieldk approved these changes Jul 22, 2024


danieldk merged commit 4e42072 into huggingface:main Jul 22, 2024

