Conversation

@WoosukKwon
Collaborator

Closes #3044

@WoosukKwon WoosukKwon requested a review from Yard1 February 27, 2024 07:00
@Yard1
Collaborator

Yard1 commented Feb 27, 2024

Looks good, can we extend the test?

@WoosukKwon
Collaborator Author

@Yard1 Added. PTAL!

@WoosukKwon
Collaborator Author

@Yard1 The PR keeps failing in test_llama_lora_warmup, even though this PR doesn't seem related to it. Do you have any idea what's going on? I tried to debug it but didn't succeed.

@Yard1
Collaborator

Yard1 commented Feb 28, 2024

Hmm, running the entire thing as a new process should help with cleanup. Or maybe try --forked?

@WoosukKwon
Collaborator Author

@Yard1 Adding --forked indeed resolved the issue, but it increased the LoRA test time from 13 minutes to 37 minutes.
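
For context, this is roughly what --forked gives us (a minimal sketch assuming the pytest-forked plugin is installed; the test name and body here are illustrative, not the actual vLLM test):

# Run with `pytest --forked tests/lora` or mark tests individually; each
# marked test then executes in its own forked subprocess, so CUDA/global
# state is torn down with the child process instead of leaking into the
# next test.
import pytest

@pytest.mark.forked
def test_lora_warmup_in_own_process():
    # illustrative placeholder: load the model, apply LoRA, run warmup here
    assert True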

@WoosukKwon
Collaborator Author

Let's merge the PR and optimize the testing time later. I think this is OK since the LoRA test is not required for most PRs, and the kernel test (which also uses --forked) takes a similar amount of time.

@WoosukKwon WoosukKwon enabled auto-merge (squash) February 28, 2024 21:03
@WoosukKwon WoosukKwon disabled auto-merge February 28, 2024 21:03
@WoosukKwon WoosukKwon merged commit 929b4f2 into main Feb 28, 2024
@WoosukKwon WoosukKwon deleted the gemma-lora branch February 28, 2024 21:03
@zhaotyer
Contributor

Gemma "vocab_size" is 256000, Will the following restrictions have any impact? Should they be removed?
vllm/lora/layers.py

# Keep this in sync with csrc/punica/bgmv/bgmv_config.h
if 32000 < self.base_layer.vocab_size > 33024:
    raise ValueError(
        "When using LoRA, vocab size must be 32000 >= vocab_size <= 33024"
    )
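
As a side note on the quoted condition (a sketch, not vLLM code; the helper name is made up): in Python, 32000 < v > 33024 chains to (32000 < v) and (v > 33024), so the check only rejects vocab sizes above 33024 rather than enforcing the range stated in the error message, and Gemma's 256000 does trip it:

def check_vocab_size(vocab_size: int) -> None:
    # Same condition as the quoted snippet:
    # (32000 < vocab_size) and (vocab_size > 33024)
    if 32000 < vocab_size > 33024:
        raise ValueError(
            "When using LoRA, vocab size must be 32000 >= vocab_size <= 33024"
        )

for v in (32000, 33024, 256000):
    try:
        check_vocab_size(v)
        print(v, "passes the check")
    except ValueError:
        print(v, "is rejected")
# 32000 and 33024 pass; 256000 (Gemma) is rejected.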

@Yard1
Collaborator

Yard1 commented Feb 29, 2024

We are not supporting lm_head deltas for gemma

@zhaotyer
Contributor

zhaotyer commented Mar 4, 2024

We are not supporting lm_head deltas for gemma

Thank you for your reply @Yard1

new_module = replace_submodule(
    self.model, module_name,
    from_layer(module, self.lora_slots, self.lora_config,
               self.model.config))
# (yard1): TODO make this more robust
if "lm_head" in module_name:
    sampler_module = self.model.get_submodule("sampler")
    new_module = replace_submodule(
        self.model, "sampler",
        from_layer_sampler(sampler_module, module, self.lora_slots,
                           self.lora_config, self.model.config))

The vocab_size check is only performed when lm_head is also wrapped by LoRA, but please take a look at this related question: #3000

xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 4, 2024