This repository was archived by the owner on Oct 11, 2024. It is now read-only.

Marlin downstream clean #26

Closed · wants to merge 50 commits
Conversation

@robertgshaw2-redhat (Collaborator) commented Feb 18, 2024

Cleaned up version of Marlin PR that can be merged into main.

Added some E2E tests, which compare the results of the exllama kernels to the Marlin kernels. See tests/models/test_marlin.py for more details.
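The comparison logic can be sketched roughly as follows. This is a minimal illustrative sketch with a hypothetical helper name, not the actual test in tests/models/test_marlin.py (the real test drives both kernels through vLLM and needs a GPU):

```python
# Hypothetical sketch of an E2E kernel-comparison check. The real test in
# tests/models/test_marlin.py generates text with both the exllama and Marlin
# kernels via vLLM; here we only show the comparison step.
from typing import List

def compare_generations(exllama_outputs: List[str],
                        marlin_outputs: List[str]) -> float:
    """Return the fraction of prompts whose greedy generations match exactly."""
    assert len(exllama_outputs) == len(marlin_outputs)
    matches = sum(a == b for a, b in zip(exllama_outputs, marlin_outputs))
    return matches / len(exllama_outputs)

# With temperature=0, both kernels should (almost always) agree.
rate = compare_generations(["Hello, I am", "The answer is 4"],
                           ["Hello, I am", "The answer is 4"])
assert rate == 1.0
```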


afeldman-nm and others added 30 commits February 18, 2024 20:12

- …anch safe_expose_semi_structured_sparse_tensor
- …size by running multiple parallel problems of size 64. (2) Refactor the workspace to be dynamic per layer
- cleanup to undo autoformatting
@robertgshaw2-redhat (Collaborator, Author) commented Feb 18, 2024

To use:

from vllm import LLM, SamplingParams

model = LLM("robertgshaw2/TinyLlama-1.1B-Chat-v1.0-g128-marlin")

sampling_params = SamplingParams(max_tokens=100, temperature=0)
outputs = model.generate("Hello my name is", sampling_params=sampling_params)
print(outputs[0].outputs[0].text)

result in very slight nondeterminism for Marlin. As a result, we re-run the test
up to 3 times to see if we pass.

Run `pytest tests/models/test_marlin.py --forked`.
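The re-run-up-to-3-times behavior described in this test note could be sketched like this. This is an illustrative retry helper with a hypothetical name, not the project's actual code:

```python
# Illustrative retry helper for a slightly nondeterministic test
# (hypothetical sketch; not the actual code in tests/models/test_marlin.py).
def run_with_retries(test_fn, max_attempts: int = 3):
    """Run test_fn, retrying on AssertionError up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return test_fn()
        except AssertionError as e:
            last_error = e
    raise last_error  # all attempts failed: surface the last failure

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 2:          # fail on the first attempt, then pass
        raise AssertionError("nondeterministic mismatch")
    return "ok"

assert run_with_retries(flaky) == "ok"
assert len(calls) == 2
```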
Member

What is the "forked" argument doing here?

Collaborator Author

I'm not sure; all the other model tests have this arg though.

I've just been running pytest tests/models/test_marlin.py

Member

@tlrmchlsmth left a comment

I think we'll need to include a license file for marlin somewhere

Comment on lines +75 to +77
torch::Tensor& a,
torch::Tensor& b_q_weight,
torch::Tensor& b_scales,
Member

Can these be const&?

Collaborator Author

@alexm-nm ?

@robertgshaw2-redhat (Collaborator, Author)

I think we'll need to include a license file for marlin somewhere

@tlrmchlsmth

We have the following in the CUDA code:

/*
 * Copyright (C) Marlin.2024 Elias Frantar (elias.frantar@ist.ac.at)
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *         http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

@robertgshaw2-redhat (Collaborator, Author)

Closing in favor of #43

6 participants