fix: include create_exllama_buffers and set_device for exllama #2407

Merged
merged 1 commit into from
Aug 12, 2024

drbh
Collaborator

@drbh drbh commented Aug 12, 2024

This PR re-imports create_exllama_buffers and set_device, which are later used in Warmup:

create_exllama_buffers(request.max_prefill_tokens)

The buffer is then used here:

output = ext_gemm_half_q_half(x, self.q_handle, self.outfeatures, force_cuda)

This should resolve the ex2 test cases that are receiving a None instead of the pointer to the buffer
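
To illustrate the failure mode described above, here is a minimal pure-Python sketch. It assumes a module-level scratch buffer that stays None until a warmup step allocates it. Only the names create_exllama_buffers and set_device come from this PR; their bodies, the gemm and warmup helpers, the bytearray buffer, and the "cuda:0" default are hypothetical stand-ins, and no real kernel is called.

```python
from typing import Optional

# Hypothetical module-level state: both stay None until warmup runs.
_scratch: Optional[bytearray] = None
_device: Optional[str] = None


def set_device(device: str) -> None:
    """Record which device the buffers should be allocated on."""
    global _device
    _device = device


def create_exllama_buffers(max_prefill_tokens: int) -> None:
    """Allocate the shared scratch buffer, sized from the warmup request."""
    global _scratch
    if _device is None:
        raise RuntimeError("set_device must be called before creating buffers")
    _scratch = bytearray(max_prefill_tokens)


def gemm(x: list) -> int:
    """Stand-in for the quantized matmul, which needs the scratch buffer."""
    if _scratch is None:
        raise RuntimeError("scratch buffer is None; warmup did not create it")
    return len(x)


def warmup(max_prefill_tokens: int, device: str = "cuda:0") -> None:
    """Hypothetical warmup step: select the device, then allocate buffers."""
    set_device(device)
    create_exllama_buffers(max_prefill_tokens)
```

In this sketch, calling gemm before warmup fails because the buffer was never allocated, which mirrors the symptom reported for the tests. Once warmup has called both helpers, the same call succeeds.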

@drbh
Collaborator Author

drbh commented Aug 12, 2024

merging as this fixes imports that were accidentally removed in #2262 and only adds the missing import lines

@drbh drbh merged commit 8a7749b into main Aug 12, 2024
11 checks passed
@drbh drbh deleted the fix-exllama-buffer-imports branch August 12, 2024 21:59
yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this pull request Sep 26, 2024