
Fix LoRA failure on platforms that do not contain punica_kernels #2784

Closed · wants to merge 2 commits

Conversation

sywangyi
Contributor

The server fails with the following error:

    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/cli.py", line 117, in serve
        server.serve(
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/server.py", line 315, in serve
        asyncio.run(
    File "/opt/conda/lib/python3.11/asyncio/runners.py", line 190, in run
        return runner.run(main)
    File "/opt/conda/lib/python3.11/asyncio/runners.py", line 118, in run
        return self._loop.run_until_complete(task)
    File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
        self.run_forever()
    File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
        self._run_once()
    File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
        handle._run()
    File "/opt/conda/lib/python3.11/asyncio/events.py", line 84, in _run
        self._context.run(self._callback, *self._args)
    File "/opt/conda/lib/python3.11/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
        return await self.intercept(
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/interceptor.py", line 24, in intercept
        return await response
    File "/opt/conda/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
        raise error
    File "/opt/conda/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
        return await behavior(request_or_iterator, context)
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/server.py", line 183, in Prefill
        generations, next_batch, timings = self.model.generate_token(batch)
    File "/opt/conda/lib/python3.11/contextlib.py", line 81, in inner
        return func(*args, **kwds)
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/models/flash_causal_lm.py", line 1902, in generate_token
        adapter_data = AdapterBatchData.from_meta(
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/adapters/weights.py", line 117, in from_meta
        data[k] = v.get_data(
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/adapters/weights.py", line 89, in get_data
        batched_weights = batch_type.load(
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/adapters/lora.py", line 426, in load
        tmp_shrink, tmp_expand = get_tmp_tensors(
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/utils/sgmv.py", line 163, in get_tmp_tensors
        tmp = get_tmp_tensor_for_size(nsegments, device)
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/utils/sgmv.py", line 139, in get_tmp_tensor_for_size
        tmp_size = _kernels.sgmv_cutlass_tmp_size(size)
    AttributeError: 'NoneType' object has no attribute 'sgmv_cutlass_tmp_size'

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
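
For context, the `AttributeError` arises because the punica/SGMV kernels are imported optionally: on platforms where the import fails, the module-level kernel handle is left as `None`, and any unguarded call into it crashes. Below is a minimal sketch of that failure mode, reconstructed from names visible in the traceback; the real `text_generation_server/utils/sgmv.py` may differ in detail.

```python
# Minimal sketch of the failure mode, reconstructed from the traceback;
# the actual text_generation_server/utils/sgmv.py may differ in detail.
import torch

try:
    # The compiled punica/SGMV kernels are only available on some platforms.
    import punica_kernels as _kernels
except ImportError:
    # Platforms without punica (e.g. the Intel setup here) land in this branch.
    _kernels = None


def has_sgmv() -> bool:
    # Callers should check this before entering any SGMV code path.
    return _kernels is not None


def get_tmp_tensor_for_size(size: int, device: torch.device) -> torch.Tensor:
    # Unguarded kernel call: when _kernels is None this raises
    # AttributeError: 'NoneType' object has no attribute 'sgmv_cutlass_tmp_size'
    tmp_size = _kernels.sgmv_cutlass_tmp_size(size)
    return torch.empty((tmp_size,), dtype=torch.uint8, device=device)
```

Guarding every kernel-dependent path behind `has_sgmv()` avoids the crash on platforms where punica is unavailable.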
@sywangyi
Contributor Author

@danieldk @Narsil, please help review this change.

@sywangyi
Contributor Author

sywangyi commented Dec 2, 2024

Changes are OK for me.

@drbh
Collaborator

drbh commented Dec 3, 2024

Hi @sywangyi, thank you for the contribution. Can you describe when this error occurs (which model and adapters were used)?

In the proposed changes we lose the `rank_data` when the SGMV kernels are not present, which will likely lead to issues in some generations.

Thoughts:
I believe the only call in this function that uses the kernels is `get_tmp_tensors`, which shouldn't be called if `use_sgmv` is False. There may be a case where `if prefill or max_rank > BGMV_MAX_RANK:` evaluates to True, which sets `use_sgmv` to True, subsequently calls `get_tmp_tensors`, and errors when the kernels are not present. We may need to check `has_sgmv` before setting `use_sgmv = True`; a sketch of that guard follows below. However, this may not be the issue causing the error above.

**Update**: linking a small PR that avoids the bug mentioned: #2796.
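
A sketch of the guard suggested above. The actual fix landed in #2796 and may differ; the helper names (`has_sgmv`, `get_tmp_tensors`, `BGMV_MAX_RANK`) are taken from this thread and the traceback, and their exact locations and signatures in `text_generation_server` are assumptions.

```python
# Sketch only: gate the SGMV path on kernel availability before allocating
# tmp tensors. Import locations and signatures are assumed, not verified.
from typing import Optional, Tuple

import torch

from text_generation_server.utils.sgmv import (
    BGMV_MAX_RANK,    # assumed to live here; named in the discussion above
    get_tmp_tensors,  # allocates SGMV tmp buffers (appears in the traceback)
    has_sgmv,         # True only when the punica kernels imported cleanly
)


def select_lora_path(
    prefill: bool,
    max_rank: int,
    nsegments: int,
    device: torch.device,
) -> Tuple[bool, Optional[torch.Tensor], Optional[torch.Tensor]]:
    # Take the SGMV path only when the kernels are actually present;
    # otherwise fall back to BGMV, which needs no tmp tensors.
    use_sgmv = has_sgmv() and (prefill or max_rank > BGMV_MAX_RANK)
    if use_sgmv:
        # Argument order assumed from the call sites in the traceback.
        tmp_shrink, tmp_expand = get_tmp_tensors(nsegments, max_rank, device)
    else:
        tmp_shrink, tmp_expand = None, None
    return use_sgmv, tmp_shrink, tmp_expand
```

The key design point is that the kernel check happens before `use_sgmv` is set, so platforms without punica never reach the `_kernels` call that raised the `AttributeError`.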

@sywangyi
Contributor Author

sywangyi commented Dec 3, 2024

#2796 is also OK for me. I just followed https://github.com/huggingface/text-generation-inference/blob/main/docs/source/conceptual/lora.md on an Intel platform, which does not include the punica kernels, and hit the crash. @drbh

@drbh
Collaborator

drbh commented Dec 4, 2024

Thank you @sywangyi for the contribution and the information. Closing this PR, as #2796 was merged.

drbh closed this on Dec 4, 2024