
Fix LoRA failure on platforms that do not contain punica_kernels #2784

Closed · wants to merge 2 commits

Conversation

sywangyi
Contributor

The server fails with the following error:

    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/cli.py", line 117, in serve
        server.serve(
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/server.py", line 315, in serve
        asyncio.run(
    File "/opt/conda/lib/python3.11/asyncio/runners.py", line 190, in run
        return runner.run(main)
    File "/opt/conda/lib/python3.11/asyncio/runners.py", line 118, in run
        return self._loop.run_until_complete(task)
    File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
        self.run_forever()
    File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
        self._run_once()
    File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
        handle._run()
    File "/opt/conda/lib/python3.11/asyncio/events.py", line 84, in _run
        self._context.run(self._callback, *self._args)
    File "/opt/conda/lib/python3.11/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
        return await self.intercept(
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/interceptor.py", line 24, in intercept
        return await response
    File "/opt/conda/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
        raise error
    File "/opt/conda/lib/python3.11/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
        return await behavior(request_or_iterator, context)
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/server.py", line 183, in Prefill
        generations, next_batch, timings = self.model.generate_token(batch)
    File "/opt/conda/lib/python3.11/contextlib.py", line 81, in inner
        return func(*args, **kwds)
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/models/flash_causal_lm.py", line 1902, in generate_token
        adapter_data = AdapterBatchData.from_meta(
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/adapters/weights.py", line 117, in from_meta
        data[k] = v.get_data(
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/adapters/weights.py", line 89, in get_data
        batched_weights = batch_type.load(
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/adapters/lora.py", line 426, in load
        tmp_shrink, tmp_expand = get_tmp_tensors(
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/utils/sgmv.py", line 163, in get_tmp_tensors
        tmp = get_tmp_tensor_for_size(nsegments, device)
    File "/opt/conda/lib/python3.11/site-packages/text_generation_server/utils/sgmv.py", line 139, in get_tmp_tensor_for_size
        tmp_size = _kernels.sgmv_cutlass_tmp_size(size)
    AttributeError: 'NoneType' object has no attribute 'sgmv_cutlass_tmp_size'

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
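
For context, the `AttributeError` arises because the punica/SGMV kernels are imported optionally: on platforms where the import fails, the module-level kernel handle is left as `None`, and any unguarded call into it crashes. Below is a minimal sketch of that failure mode, reconstructed from names visible in the traceback; the real `text_generation_server/utils/sgmv.py` may differ in detail.

```python
# Minimal sketch of the failure mode, reconstructed from the traceback;
# the actual text_generation_server/utils/sgmv.py may differ in detail.
import torch

try:
    # The compiled punica/SGMV kernels are only available on some platforms.
    import punica_kernels as _kernels
except ImportError:
    # Platforms without punica (e.g. the Intel setup here) land in this branch.
    _kernels = None


def has_sgmv() -> bool:
    # Callers should check this before entering any SGMV code path.
    return _kernels is not None


def get_tmp_tensor_for_size(size: int, device: torch.device) -> torch.Tensor:
    # Unguarded kernel call: when _kernels is None this raises
    # AttributeError: 'NoneType' object has no attribute 'sgmv_cutlass_tmp_size'
    tmp_size = _kernels.sgmv_cutlass_tmp_size(size)
    return torch.empty((tmp_size,), dtype=torch.uint8, device=device)
```

Guarding every kernel-dependent path behind `has_sgmv()` avoids the crash on platforms where punica is unavailable.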
@sywangyi
Contributor Author

@danieldk @Narsil, please help review this change.

@sywangyi
Contributor Author

sywangyi commented Dec 2, 2024

Changes are OK for me.

@drbh
Collaborator

drbh commented Dec 3, 2024

Hi @sywangyi, thank you for the contribution. Can you describe when this error occurs (which model and adapters were used)?

In the proposed changes we lose the `rank_data` when the SGMV kernels are not present, which will likely lead to issues in some generations.

Thoughts:
I believe the only call in this function that uses the kernels is `get_tmp_tensors`, which shouldn't be called if `use_sgmv` is False. There may be a case where `if prefill or max_rank > BGMV_MAX_RANK:` evaluates to True, which sets `use_sgmv` to True, subsequently calls `get_tmp_tensors`, and errors when the kernels are not present. We may need to check `has_sgmv` before setting `use_sgmv = True`; a sketch of that guard follows below. However, this may not be the issue causing the error above.

**Update**: linking a small PR that avoids the bug mentioned: #2796.
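
A sketch of the guard suggested above. The actual fix landed in #2796 and may differ; the helper names (`has_sgmv`, `get_tmp_tensors`, `BGMV_MAX_RANK`) are taken from this thread and the traceback, and their exact locations and signatures in `text_generation_server` are assumptions.

```python
# Sketch only: gate the SGMV path on kernel availability before allocating
# tmp tensors. Import locations and signatures are assumed, not verified.
from typing import Optional, Tuple

import torch

from text_generation_server.utils.sgmv import (
    BGMV_MAX_RANK,    # assumed to live here; named in the discussion above
    get_tmp_tensors,  # allocates SGMV tmp buffers (appears in the traceback)
    has_sgmv,         # True only when the punica kernels imported cleanly
)


def select_lora_path(
    prefill: bool,
    max_rank: int,
    nsegments: int,
    device: torch.device,
) -> Tuple[bool, Optional[torch.Tensor], Optional[torch.Tensor]]:
    # Take the SGMV path only when the kernels are actually present;
    # otherwise fall back to BGMV, which needs no tmp tensors.
    use_sgmv = has_sgmv() and (prefill or max_rank > BGMV_MAX_RANK)
    if use_sgmv:
        # Argument order assumed from the call sites in the traceback.
        tmp_shrink, tmp_expand = get_tmp_tensors(nsegments, max_rank, device)
    else:
        tmp_shrink, tmp_expand = None, None
    return use_sgmv, tmp_shrink, tmp_expand
```

The key design point is that the kernel check happens before `use_sgmv` is set, so platforms without punica never reach the `_kernels` call that raised the `AttributeError`.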

@sywangyi
Contributor Author

sywangyi commented Dec 3, 2024

#2796 is also OK for me. I just followed https://github.com/huggingface/text-generation-inference/blob/main/docs/source/conceptual/lora.md on an Intel platform, which does not include the punica kernels, and hit the crash. @drbh

@drbh
Collaborator

drbh commented Dec 4, 2024

Thank you @sywangyi for the contribution and the information. Closing this PR, as #2796 was merged.

drbh closed this on Dec 4, 2024