This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Loudly reject compression when the tensor isn't sparse enough #55

Merged: mgoin merged 2 commits into main from reject-compression on Feb 24, 2024

Conversation

@mgoin (Member) commented Feb 23, 2024

Tested by trying to enable 2:4 sparsity on a dense model:

from vllm import LLM

model = LLM(
    "facebook/opt-125m",
    sparsity="semi_structured_sparse_w16a16",
)

Output:

python compress-vllm.py 
INFO 02-23 23:10:00 llm_engine.py:79] Initializing an LLM engine with config: model='facebook/opt-125m', tokenizer='facebook/opt-125m', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=512, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, sparsity=semi_structured_sparse_w16a16, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
INFO 02-23 23:10:04 weight_utils.py:176] Using model weights format ['*.bin']
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([2304, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([3072, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 3072]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([2304, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([3072, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 3072]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([2304, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([3072, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 3072]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([2304, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([3072, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 3072]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([2304, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([3072, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 3072]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([2304, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([3072, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 3072]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([2304, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([3072, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 3072]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([2304, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([3072, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 3072]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([2304, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([3072, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 3072]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([2304, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([3072, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 3072]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([2304, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([3072, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 3072]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([2304, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([3072, 768]) but does not have 2:4 sparsity, skipping compression
WARNING 02-23 23:10:04 lazy_compressed.py:114] Called compress() on tensor of shape torch.Size([768, 3072]) but does not have 2:4 sparsity, skipping compression
INFO 02-23 23:10:05 llm_engine.py:338] # GPU blocks: 76243, # CPU blocks: 7281
INFO 02-23 23:10:07 model_runner.py:676] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 02-23 23:10:07 model_runner.py:680] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 02-23 23:10:09 model_runner.py:748] Graph capturing finished in 2 secs.
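
For readers unfamiliar with what the warnings above are checking, here is a minimal sketch of a 2:4 sparsity test followed by a "warn and skip" wrapper. It is not the code in lazy_compressed.py: the names `has_2_4_sparsity` and `maybe_compress` are hypothetical, and it uses plain nested lists instead of torch tensors so it runs without a GPU. The assumption is the usual definition of 2:4 sparsity: in every aligned group of 4 values along a row, at least 2 are zero.

```python
import logging

logger = logging.getLogger("sparsity_check_sketch")


def has_2_4_sparsity(rows):
    """Return True if every aligned group of 4 values in each row has >= 2 zeros."""
    for row in rows:
        # A row whose length is not a multiple of 4 cannot be split into full groups.
        if len(row) % 4 != 0:
            return False
        for start in range(0, len(row), 4):
            group = row[start:start + 4]
            zeros = sum(1 for value in group if value == 0)
            if zeros < 2:
                return False
    return True


def maybe_compress(rows):
    """Hypothetical wrapper: returns (data, compressed_flag).

    If the input is not 2:4 sparse, log a warning and return it unchanged.
    The actual compression step is out of scope for this sketch, so the
    sparse case also returns the input as-is, only with the flag set.
    """
    if not has_2_4_sparsity(rows):
        shape = (len(rows), len(rows[0]) if rows else 0)
        logger.warning(
            "Called compress() on tensor of shape %s but does not have "
            "2:4 sparsity, skipping compression",
            shape,
        )
        return rows, False
    return rows, True


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    dense = [[1.0, 2.0, 3.0, 4.0]]
    sparse = [[1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 3.0, 4.0]]
    print(maybe_compress(dense)[1])
    print(maybe_compress(sparse)[1])
```

A dense checkpoint such as facebook/opt-125m would fail a check like this for every linear layer, which is consistent with one warning per weight tensor in the output above.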

@mgoin mgoin merged commit c55248d into main Feb 24, 2024
2 checks passed
@mgoin mgoin deleted the reject-compression branch February 24, 2024 00:51