Add support for LoRA adapters in vLLM inference engine #562

Merged: 11 commits from wizeng/vllm into main on Sep 29, 2024

Conversation

wizeng23 (Contributor)

Fixes OPE-395

  • Adds repr functions to Message and Conversation
  • Properly forward model max length to vLLM engine

linear bot commented Sep 27, 2024

@@ -106,6 +106,11 @@ def is_text(self) -> bool:
        """Checks if the message contains text."""
        return self.type == Type.TEXT

    def __repr__(self):
        """Returns a string representation of the message."""
        content = self.content if self.is_text() else "<non-text-content>"
Collaborator

Define a constant for "<non-text-content>" ?

Contributor Author

Replaced with the type of the message instead.

    def __repr__(self):
        """Returns a string representation of the message."""
        content = self.content if self.is_text() else "<non-text-content>"
        return f"{self.role.upper()}: {content}"
Collaborator

Let's also include type, id ? Any other small fields ?

Contributor Author

Comment above modifies message to mention the type if it's not text. Added ID.
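
Putting the two review comments above together, a minimal sketch of what the revised __repr__ might look like: showing the message type for non-text content and prefixing the optional id. The type, id, and role fields are assumptions based on the snippets in this thread, not the merged code:

    def __repr__(self):
        """Returns a string representation of the message."""
        # Show the content for text messages; otherwise fall back to the message type.
        content = self.content if self.is_text() else f"<{self.type}>"
        # Prefix the optional id so a specific message is easy to spot in logs.
        id_str = f"{self.id} - " if self.id else ""
        return f"{id_str}{self.role.upper()}: {content}"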

    def __repr__(self):
        """Returns a string representation of the message."""
        content = self.content if self.is_text() else "<non-text-content>"
        return f"{self.role.upper()}: {content}"
Collaborator

why .upper() for role ?

Contributor Author

Makes it more readable IMO.

ASSISTANT: How are you?
USER: Good

vs

assistant: How are you?
user: Good

    def __repr__(self):
        """Returns a string representation of the message."""
        content = self.content if self.is_text() else "<non-text-content>"
        return f"{self.role.upper()}: {content}"
@nikg4 (Collaborator) commented Sep 27, 2024

Instead of building the string manually, should we use some library to do the formatting for us?

For example: create a temp dict with the fields of interest, then use json, pprint.pformat, or somesuch to convert it to a string?
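
A minimal sketch of the alternative being suggested here, assuming the same role/type/id/content fields as above; message_repr is a hypothetical free function, and pprint is from the standard library, so no extra dependency would be needed:

import pprint

def message_repr(msg) -> str:
    """Formats a message via a temp dict, as suggested above."""
    # Collect only the small fields of interest into a temp dict ...
    fields = {
        "role": msg.role,
        "type": msg.type,
        "id": msg.id,
        "content": msg.content if msg.is_text() else None,
    }
    # ... and let pprint handle quoting, ordering, and line wrapping.
    return pprint.pformat(fields, sort_dicts=False)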

Contributor Author

IMO the logic here is light enough that manually creating the string works fine.

@@ -53,6 +61,8 @@ def __init__(
            quantization=quantization,
            tensor_parallel_size=tensor_parallel_size,
            enable_prefix_caching=enable_prefix_caching,
            enable_lora=self.lora_request is not None,
            max_model_len=model_params.model_max_length,
Collaborator

Should we sanitize this value before passing it to vLLM? Something like:

max_model_len=(model_params.model_max_length if "... is not None and ... > 0" else None)

Contributor Author

IMO all validation should happen during config initialization, and downstream code like vLLM should be able to consume the configs as-is without additional validation. Added a validation check that model_max_length is a positive int if specified.
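
A minimal sketch of the kind of check being described, assuming the config is a dataclass whose __post_init__ runs validation; the class and field names are illustrative, not the actual config code:

from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelParams:
    model_max_length: Optional[int] = None

    def __post_init__(self):
        # Validate at config-initialization time so downstream consumers
        # (e.g. the vLLM engine) can use the value as-is.
        if self.model_max_length is not None and self.model_max_length <= 0:
            raise ValueError("model_max_length must be a positive integer if specified.")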

@taenin (Collaborator) left a comment

Please add a test in tests/inference/test_vllm_inference_engine.py covering this case

@wizeng23 (Contributor Author)

Done, and tested on GCP. Also added a test for turn.py.
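
A rough sketch of the kind of test being asked for, patching out the vLLM constructor and asserting that model_max_length is forwarded. The module paths, class names, and constructor arguments below are assumptions for illustration, not the test that was actually added:

from unittest import mock

# Hypothetical imports; the real module paths in the repo may differ.
from oumi.configs import ModelParams
from oumi.inference import VLLMInferenceEngine

def test_model_max_length_is_forwarded_to_vllm():
    # Patch vLLM's LLM class where the engine looks it up (path is a guess),
    # so the test needs no GPU or model download.
    with mock.patch("oumi.inference.vllm_inference_engine.vllm.LLM") as mock_llm:
        VLLMInferenceEngine(ModelParams(model_name="gpt2", model_max_length=2048))
        # The engine should pass the configured value straight through as max_model_len.
        _, kwargs = mock_llm.call_args
        assert kwargs["max_model_len"] == 2048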

@wizeng23 merged commit e692b0d into main on Sep 29, 2024
1 check passed
@wizeng23 deleted the wizeng/vllm branch on September 29, 2024 at 23:31