[Misc] Add image repeat option to benchmark_serving.py (to test hit/miss of MM cache) #11177
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>

Force-pushed from 9b4df28 to d35febb
This pull request has merge conflicts that must be resolved before it can be merged.

/ready
self.mm_cache_hits = 0
self.mm_cache_total = 0

def cache_hit_ratio(self, steps) -> float:
    if self.mm_cache_total > 0 and self.mm_cache_total % steps == 0:
        logger.debug("MMInputMapper: cache_hit_ratio = %.2f ",
                     self.mm_cache_hits / self.mm_cache_total)
        print("MMInputMapper: cache_hit_ratio = %.2f ",
              self.mm_cache_hits / self.mm_cache_total)
Is this intended?
AFAIK no. This should be reverted before merging the PR.
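For reference, a minimal sketch (not part of the PR) of what the method could look like once the debug print is reverted, assuming the rest of the body stays as in the diff above and that the method returns the ratio to satisfy its `-> float` annotation:

def cache_hit_ratio(self, steps: int) -> float:
    # Periodically log the running hit ratio (once every `steps` inputs).
    if self.mm_cache_total > 0 and self.mm_cache_total % steps == 0:
        logger.debug("MMInputMapper: cache_hit_ratio = %.2f",
                     self.mm_cache_hits / self.mm_cache_total)
    # Assumption: the full method returns the current ratio; guard against
    # division by zero before any inputs have been processed.
    return self.mm_cache_hits / max(self.mm_cache_total, 1)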
if image_repeater is not None:
    image = image_repeater.process(image)
It looks to me like we don't have to make this complicated. For example, suppose we're benchmarking 1k prompts. The original behavior is to take the first 1k entries from the dataset, and all images in those entries are different. If we just want to simulate image caching, we could simply do the following:
# Get the first N unique images.
n_requests = len(sampled_requests)
n_unique_images = int(n_requests * (1 - hit_rate))
unique_images = [data["image"] for data in dataset[:n_unique_images]]

# Repeat the unique images to meet the number of requests.
all_images = (unique_images * ceil(n_requests / n_unique_images))[:n_requests]
for image in all_images:
    ...
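A self-contained sketch of this suggestion (the names `dataset`, `hit_rate`, and `repeat_images` are placeholders from the comment, not identifiers in benchmark_serving.py):

from math import ceil

def repeat_images(dataset, n_requests, hit_rate):
    """Return n_requests images where roughly `hit_rate` of them are
    repeats of earlier images, i.e. expected MM-cache hits."""
    # Keep only enough unique images to produce the desired miss rate.
    n_unique = max(1, int(n_requests * (1 - hit_rate)))
    unique_images = [data["image"] for data in dataset[:n_unique]]
    # Tile the unique images until we have n_requests of them.
    return (unique_images * ceil(n_requests / n_unique))[:n_requests]

# Example: 1000 requests with an expected 50% cache hit rate.
# images = repeat_images(dataset, 1000, hit_rate=0.5)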
benchmark_serving.py: This PR adds an option to repeat images with a given probability, so we can test the HIT/MISS behavior of the MM cache.
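The repeater itself is not shown in this excerpt. A minimal sketch of a probability-based version (the class name `ImageRepeater`, its constructor parameters, and its internals are assumptions; only the `image_repeater.process(image)` call appears in the diff):

import random

class ImageRepeater:
    """With probability `repeat_prob`, replace the incoming image with a
    previously seen one, so the MM cache sees a mix of hits and misses."""

    def __init__(self, repeat_prob: float, pool_size: int = 16):
        self.repeat_prob = repeat_prob
        self.pool_size = pool_size
        self.pool = []  # previously seen images, candidates for repeats

    def process(self, image):
        if self.pool and random.random() < self.repeat_prob:
            # Expected cache HIT: reuse an earlier image.
            return random.choice(self.pool)
        # Expected cache MISS: remember this image for future repeats.
        if len(self.pool) < self.pool_size:
            self.pool.append(image)
        return image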