This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Fix sparsity arg in Engine/ModelArgs #179

Merged
mgoin merged 2 commits into main from fix-sparsity-engine-arg on Apr 11, 2024

Conversation

@mgoin (Member) commented Apr 10, 2024

The sparsity argument was being ignored because it was switched around with the new quantization_param_path arg.

>>> from vllm import LLM
>>> model = LLM("nm-testing/OpenHermes-2.5-Mistral-7B-pruned50", sparsity="sparse_w16a16", max_model_len=1024)
INFO 04-10 18:02:12 llm_engine.py:81] Initializing an LLM engine (v0.2.0) with config: model='nm-testing/OpenHermes-2.5-Mistral-7B-pruned50', speculative_config=None, tokenizer='nm-testing/OpenHermes-2.5-Mistral-7B-pruned50', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=True, quantization=None, sparsity=None, enforce_eager=8192, kv_cache_dtype=auto, quantization_param_path=False, device_config=cuda, seed=0)

Notice the sparsity=None in the log, which doesn't match the arg we specified.

Because the ModelConfig construction in create_engine_config() doesn't use named arguments, these args got flipped around. See the mass of positional arguments:

    def create_engine_config(self, ) -> EngineConfig:
        device_config = DeviceConfig(self.device)
        model_config = ModelConfig(
            self.model, self.tokenizer, self.tokenizer_mode,
            self.trust_remote_code, self.download_dir, self.load_format,
            self.dtype, self.seed, self.revision, self.code_revision,
            self.tokenizer_revision, self.max_model_len, self.quantization,
            self.quantization_param_path, self.sparsity, self.enforce_eager,
            self.max_context_len_to_capture, self.max_logprobs)
            ...
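A standalone sketch of the failure mode, using a simplified stand-in class rather than the real ModelConfig signature: when a call site passes everything positionally, inserting or reordering a parameter does not raise an error, it just silently shifts every later value onto the wrong attribute.

# Simplified stand-in; not the real ModelConfig signature.
class FakeModelConfig:
    def __init__(self, model, quantization, sparsity, enforce_eager):
        self.model = model
        self.quantization = quantization
        self.sparsity = sparsity
        self.enforce_eager = enforce_eager

# Call site still written against an older order (model, sparsity,
# quantization, enforce_eager): it runs without error, but every value
# after the first mismatched position lands on the wrong attribute.
cfg = FakeModelConfig("mistral-7b", "sparse_w16a16", None, False)
print(cfg.quantization)  # "sparse_w16a16" -- was meant to be sparsity
print(cfg.sparsity)      # None            -- the value we passed is lost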

Comment on lines +435 to +453
self.model,
self.tokenizer,
self.tokenizer_mode,
self.trust_remote_code,
self.download_dir,
self.load_format,
self.dtype,
self.seed,
self.revision,
self.code_revision,
self.tokenizer_revision,
self.max_model_len,
self.quantization,
self.quantization_param_path,
# UPSTREAM SYNC: keep sparsity argument
self.sparsity,
self.enforce_eager,
self.max_context_len_to_capture,
self.max_logprobs)
@bnellnm (Member) commented Apr 10, 2024


Considering how this broke, should we be using names for some/most of these arguments?

@mgoin (Member, Author) replied

I do want to, but that would further diverge from upstream, so I didn't go for it.

A collaborator replied

Let's do an upstream PR for this (i.e., for Bill's idea of making these named args).
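A minimal sketch of what that upstream change might look like, assuming ModelConfig's keyword parameter names match the EngineArgs attribute names used at this call site (not the merged code, just an illustration): with keyword arguments, a future reordering of ModelConfig's signature raises a TypeError instead of silently misassigning values.

    def create_engine_config(self) -> EngineConfig:
        device_config = DeviceConfig(self.device)
        # Hypothetical keyword-argument form of the same construction.
        model_config = ModelConfig(
            model=self.model,
            tokenizer=self.tokenizer,
            tokenizer_mode=self.tokenizer_mode,
            trust_remote_code=self.trust_remote_code,
            download_dir=self.download_dir,
            load_format=self.load_format,
            dtype=self.dtype,
            seed=self.seed,
            revision=self.revision,
            code_revision=self.code_revision,
            tokenizer_revision=self.tokenizer_revision,
            max_model_len=self.max_model_len,
            quantization=self.quantization,
            quantization_param_path=self.quantization_param_path,
            # UPSTREAM SYNC: keep sparsity argument
            sparsity=self.sparsity,
            enforce_eager=self.enforce_eager,
            max_context_len_to_capture=self.max_context_len_to_capture,
            max_logprobs=self.max_logprobs)
        ...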

@mgoin merged commit dcd4973 into main on Apr 11, 2024
2 checks passed
@mgoin deleted the fix-sparsity-engine-arg branch on April 11, 2024 at 00:26
3 participants