[Misc] Add image repeat option to benchmark_serving.py (to test hit/miss of MM cache) #11177
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>

Force-pushed from 9b4df28 to d35febb
This pull request has merge conflicts that must be resolved before it can be merged.

/ready
self.mm_cache_hits = 0
self.mm_cache_total = 0

def cache_hit_ratio(self, steps) -> float:
    if self.mm_cache_total > 0 and self.mm_cache_total % steps == 0:
        logger.debug("MMInputMapper: cache_hit_ratio = %.2f ",
                     self.mm_cache_hits / self.mm_cache_total)
        print("MMInputMapper: cache_hit_ratio = %.2f ",
              self.mm_cache_hits / self.mm_cache_total)
Is this intended?
AFAIK no. This should be reverted before merging the PR.
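For reference, a minimal sketch (not part of the PR) of what the method could look like once the debug print is reverted, assuming the rest of the body stays as in the diff above and that the method returns the ratio to satisfy its `-> float` annotation:

def cache_hit_ratio(self, steps: int) -> float:
    # Periodically log the running hit ratio (once every `steps` inputs).
    if self.mm_cache_total > 0 and self.mm_cache_total % steps == 0:
        logger.debug("MMInputMapper: cache_hit_ratio = %.2f",
                     self.mm_cache_hits / self.mm_cache_total)
    # Assumption: the full method returns the current ratio; guard against
    # division by zero before any inputs have been processed.
    return self.mm_cache_hits / max(self.mm_cache_total, 1)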
if image_repeater is not None:
    image = image_repeater.process(image)
It looks to me like we don't have to make this complicated. For example, suppose we're benchmarking 1k prompts. The original behavior is to take the first 1k entries from the dataset, and all images in those entries are different. If we just want to simulate image caching, we could simply do the following:
# Get the first N unique images.
n_requests = len(sampled_requests)
n_unique_images = int(n_requests * (1 - hit_rate))
unique_images = [data["image"] for data in dataset[:n_unique_images]]

# Repeat the unique images to meet the number of requests.
all_images = (unique_images * ceil(n_requests / n_unique_images))[:n_requests]
for image in all_images:
    ...
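A self-contained sketch of this suggestion (the names `dataset`, `hit_rate`, and `repeat_images` are placeholders from the comment, not identifiers in benchmark_serving.py):

from math import ceil

def repeat_images(dataset, n_requests, hit_rate):
    """Return n_requests images where roughly `hit_rate` of them are
    repeats of earlier images, i.e. expected MM-cache hits."""
    # Keep only enough unique images to produce the desired miss rate.
    n_unique = max(1, int(n_requests * (1 - hit_rate)))
    unique_images = [data["image"] for data in dataset[:n_unique]]
    # Tile the unique images until we have n_requests of them.
    return (unique_images * ceil(n_requests / n_unique))[:n_requests]

# Example: 1000 requests with an expected 50% cache hit rate.
# images = repeat_images(dataset, 1000, hit_rate=0.5)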
benchmark_serving.py: This PR adds an option to repeat images with a given probability, so we can test the HIT/MISS behavior of the MM cache.
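The repeater itself is not shown in this excerpt. A minimal sketch of a probability-based version (the class name `ImageRepeater`, its constructor parameters, and its internals are assumptions; only the `image_repeater.process(image)` call appears in the diff):

import random

class ImageRepeater:
    """With probability `repeat_prob`, replace the incoming image with a
    previously seen one, so the MM cache sees a mix of hits and misses."""

    def __init__(self, repeat_prob: float, pool_size: int = 16):
        self.repeat_prob = repeat_prob
        self.pool_size = pool_size
        self.pool = []  # previously seen images, candidates for repeats

    def process(self, image):
        if self.pool and random.random() < self.repeat_prob:
            # Expected cache HIT: reuse an earlier image.
            return random.choice(self.pool)
        # Expected cache MISS: remember this image for future repeats.
        if len(self.pool) < self.pool_size:
            self.pool.append(image)
        return image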