

@christian-pinto
Contributor

@christian-pinto christian-pinto commented Aug 13, 2025

This PR introduces the ability to pre/post-process pooling model inputs/outputs via an IO Processor plugin. This allows users to feed complex data structures to vLLM that are then parsed into prompts by the plugin, and to generate any type of data from the model output (e.g., images).

Main features:

  • A new offline-inference method encode_with_io_processor added to LLM and AsyncLLM, and a new endpoint /io_processor_pooling for online serving.
  • A new plugin group vllm.io_processor_plugins (a registration sketch follows this list).
  • IO Processor plugins are instantiated on a per-instance basis via the EngineArgs argument io_processor_plugin or via a custom io_processor_plugin field in the model configuration (config.json).
  • An example plugin for the Prithvi model is included in the testing pipeline, along with examples for both offline inference and online serving.
  • Documentation updated.
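For illustration, here is a minimal sketch of what a plugin package registering under the vllm.io_processor_plugins group could look like. The MyIOProcessor class, its pre_process/post_process method names, the get_io_processor entry-point target, and the package name are hypothetical placeholders, not the interface defined by this PR:

```python
# pyproject.toml of the plugin package (illustrative):
#
#   [project.entry-points."vllm.io_processor_plugins"]
#   prithvi_io_processor = "my_plugin_pkg.plugin:get_io_processor"

# my_plugin_pkg/plugin.py -- hypothetical plugin skeleton
from typing import Any


class MyIOProcessor:
    """Turns raw request data into prompts and turns pooling outputs
    back into user-facing data (e.g. images)."""

    def pre_process(self, request_data: dict[str, Any]) -> list[str]:
        # One request may expand into one or more model prompts.
        return [f"<prompt built from {request_data['data']}>"]

    def post_process(self, pooling_outputs: list[Any]) -> dict[str, Any]:
        # Fold the per-prompt outputs back into a single result.
        return {"type": "image", "data": pooling_outputs}


def get_io_processor():
    # Entry-point target: hand the plugin class back to vLLM.
    return MyIOProcessor
```

The plugin to use would then be selected per engine instance via the io_processor_plugin engine argument or the io_processor_plugin field in config.json, as listed above.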

This is an alternative implementation to #21621.

@DarkLight1337 @maxdebayser @mgazz

Closes #12249

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small but essential subset of tests to quickly catch errors. You can run other CI tests on top of it by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation (Improvements or additions to documentation) and frontend labels Aug 13, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an alternative implementation for multimodal output generation using plugins, defining a new entry point LLMForImageTiling and a plugin mechanism for data pre- and post-processing. The changes are a good step forward, but I've identified a critical import error due to a typo in a module path that will break the code. Additionally, there are several high-severity issues concerning API consistency in type definitions and a potential for silent failures in the plugin loading mechanism, which could make debugging difficult. I've provided suggestions to address these points.

vllm/__init__.py Outdated
Contributor


critical

There is a typo in the import path. It should be vllm.entrypoints.llm_for_image_tiling instead of vllm.entrypoint.llm_for_image_tiling (missing 's' in entrypoints). This will cause an ImportError at runtime.

Suggested change:
- from vllm.entrypoint.llm_for_image_tiling import LLMForImageTiling
+ from vllm.entrypoints.llm_for_image_tiling import LLMForImageTiling

Comment on lines 23 to 25
Contributor


high

The variable name input shadows the built-in function input(). This is a discouraged practice as it can lead to unexpected behavior and bugs if the built-in function is needed later in the code. It's recommended to use a different name, for example, model_input, especially in an example file that users might copy.

Suggested change:
- input = {"type": "url", "data": image_url}
- output = llm.predict(input)
+ model_input = {"type": "url", "data": image_url}
+ output = llm.predict(model_input)

Contributor


high

The format field in ImagePrompt is defined as a required field. However, the example usage in examples/offline_inference/prithvi_geospatial_mae_multimodal_processor.py does not provide it. This inconsistency can lead to runtime errors or confusion for users of the API. To align with the example and make the API more flexible, consider making the format field optional using NotRequired.

Suggested change:
- format: str
+ format: NotRequired[str]
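As a side note, here is a small sketch of how NotRequired behaves on a TypedDict. The ImagePrompt body below is assumed, with only the field names mirroring the example script and this comment:

```python
from typing import Any

# Both available from typing on Python 3.11+; typing_extensions covers older versions.
from typing_extensions import NotRequired, TypedDict


class ImagePrompt(TypedDict):
    type: str
    data: Any
    format: NotRequired[str]  # callers may omit this key


# Both dictionaries type-check: "format" is now optional.
with_format: ImagePrompt = {"type": "url", "data": "https://example.com/img.tif", "format": "tiff"}
without_format: ImagePrompt = {"type": "url", "data": "https://example.com/img.tif"}
```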

vllm/outputs.py Outdated
Contributor


high

The type hint for the type parameter in ImageRequestOutput.__init__ is str, but it is assigned to self.type which is annotated as Literal["path", "object"]. This is a type inconsistency. The parameter's type hint should be Literal["path", "object"] to match the attribute's type for correctness and clarity.

Suggested change:
- def __init__(self, type: str, format: str, data: Any):
+ def __init__(self, type: Literal["path", "object"], format: str, data: Any):

Comment on lines 44 to 45
Contributor


high

The broad except Exception: pass will silently ignore any errors that occur during plugin loading, including assertion errors or import errors within the plugin's loading function. This can make debugging issues with plugins very difficult. It's better to at least log the exception to provide visibility into why a plugin might have failed to load.

Suggested change:
  except Exception:
-     pass
+     logger.warning("Failed to load plugin %s.", name, exc_info=True)

@christian-pinto christian-pinto changed the title [Miscc] multimodal output generation with plugins [Misc] multimodal output generation with plugins Aug 13, 2025
@maxdebayser
Contributor

I think the plugin approach would make more sense in an online setting, because if you're using a vLLM image or an existing hosted vLLM server that is difficult to customize, then being able to load plugins makes a lot of sense. For the offline use case, since you're writing Python code and are responsible for deploying it anyway, it doesn't make much of a difference: you can just do the pre- and post-processing in the code that calls LLM.encode().
Another caveat is that the vLLM code changes quite quickly. Keeping up with upstream changes is a constant challenge in the vllm-spyre plugin. Since you can't control which version of vLLM users are running, you might have to add code to handle a range of vLLM versions.

@christian-pinto
Contributor Author

> I think the plugin approach would make more sense in an online setting, because if you're using a vLLM image or an existing hosted vLLM server that is difficult to customize, then being able to load plugins makes a lot of sense. For the offline use case, since you're writing Python code and are responsible for deploying it anyway, it doesn't make much of a difference: you can just do the pre- and post-processing in the code that calls LLM.encode(). Another caveat is that the vLLM code changes quite quickly. Keeping up with upstream changes is a constant challenge in the vllm-spyre plugin. Since you can't control which version of vLLM users are running, you might have to add code to handle a range of vLLM versions.

Hey @maxdebayser ,

thanks a lot for your comment. I totally agree that the offline case is not as useful as the online one: if you're writing the code anyway, why not implement everything yourself? I have used it only to establish this plugin mechanism. I have the online version almost ready and will push it soon.

Regarding the plugins, yes, they might be tricky to support. However, I believe the entrypoint part of vLLM is pretty stable so far, at least not as fluid as the core. I like the idea of having this part as plugins because you can come up with new processing methods that can be applied to models without having to hack vLLM. We could perhaps even control it dynamically with an env var, which would be quite nice in an online setting.

Member

@DarkLight1337 DarkLight1337 Aug 14, 2025


I would move this to the example script as well, since this can't be used in online inference and only really makes sense in the context of the example script.

Member


To avoid adding bloat to our library, I much prefer these types of wrappers to be the user's responsibility

Contributor Author


Yes, I see your point. I can move this to the examples.

Member


Same here

Member

@DarkLight1337 DarkLight1337 Aug 14, 2025


Based on the discussion from #14526, I think we can apply the processor inside the encode method specifically, while leaving the other methods unchanged. This would allow it to work in online inference as well.

Member


This is basically why I suggested applying the plugin inside encode. That way you can use the encode endpoint with any plugin instead of having to create a new endpoint.

Contributor Author


TBH I structured it like this after discussing with you in the other PR (#21621 (comment)). You were suggesting avoiding touching the async engine and instead putting everything in the entrypoint.

I started liking the entrypoint idea because, as models show up, we can start offering image-specific endpoints instead of relying on the basic pooling one.

Also, if I do this in encode, I will have to initialize the plugin in the AsyncLLM init and therefore add an extra argument for starting an engine with/without such a data pre/post-processor loaded.

Just making sure we are fully aligned before I try implementing it inside the encode.

Member


Oh, I was mainly concerned about the aggregation part, where multiple inputs correspond to one output. After looking at #14526, I feel that we still need a plugin system to postprocess the outputs (in a one-to-one manner). Sorry for the confusion!

Contributor Author


I see. In my case the plugin is what splits the input into multiple prompts:

pre-process the input with the plugin -> generate 1+ prompts -> inference -> postprocess 1+ outputs with plugin -> 1 final output

Could we make one plugin system serve both purposes? In the end, a plugin could either aggregate multiple inference results or simply do a 1-to-1 mapping. Especially if we overhaul the PoolerOutput data structure, we should be able to cater to multiple use cases.
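A tiny sketch of the driver loop this flow implies; run_with_plugin, plugin, and the method names are hypothetical placeholders, with llm.encode standing in for the pooling call discussed above (a 1-to-1 plugin would simply pass the single output through):

```python
def run_with_plugin(plugin, llm, request):
    prompts = plugin.pre_process(request)        # 1 request -> 1+ prompts
    outputs = [llm.encode(p) for p in prompts]   # pooling inference per prompt
    return plugin.post_process(outputs)          # 1+ outputs -> 1 final result
```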

Member


Since the existing methods assume a 1-to-1 correspondence between input and output, I think it would be better to define a new method (both in the LLM class and in online inference) for this flexibility. Maybe call it LLM.plugin_forward.

Contributor Author


Are you suggesting adding something like plugin_encode or similar to (Async-)LLM? That can work for me. Would you still prefer the entrypoint to remain the same for the online mode or would you be in favor of having an endpoint dedicated to images as in this version?

Member


A separate endpoint should also be used for online inference. Ideally, we can pass in a parameter to determine which plugin to use for output processing (in case more than one plugin is loaded)

@mergify

mergify bot commented Aug 19, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @christian-pinto.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 19, 2025
@christian-pinto
Contributor Author

@DarkLight1337 new version here

I have added a method to LLM and AsyncLLM and added an entrypoint to the server. For now I like the idea that the entrypoint is specific to images, as people might want/need to support other types in the future. It also keeps the data exchanged with the plugin a bit more structured. A rough usage sketch is below.
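The sketch assumes the method and endpoint names from the PR description (encode_with_io_processor and /io_processor_pooling); the model id, plugin name, constructor argument, request payload, and exact signatures are illustrative placeholders, not the final API:

```python
import requests
from vllm import LLM

# Offline: the loaded IO Processor plugin pre/post-processes the request.
llm = LLM(
    model="<prithvi-geospatial-model>",          # placeholder model id
    io_processor_plugin="prithvi_io_processor",  # placeholder plugin name
)
result = llm.encode_with_io_processor({"type": "url", "data": "https://example.com/tile.tif"})

# Online: POST a similarly structured payload to the new endpoint.
resp = requests.post(
    "http://localhost:8000/io_processor_pooling",
    json={"type": "url", "data": "https://example.com/tile.tif"},
)
print(resp.json())
```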

@christian-pinto christian-pinto force-pushed the multimodal_processing_plugin_entrypoint branch from c608cfd to 7dc699f Compare August 19, 2025 15:21
@mergify mergify bot removed the needs-rebase label Aug 19, 2025
@christian-pinto christian-pinto force-pushed the multimodal_processing_plugin_entrypoint branch from 1e57eb3 to a5fcab1 Compare August 20, 2025 08:24
@christian-pinto
Contributor Author

Hey @DarkLight1337, any comments on this new version? I have rebased onto the current master. Thanks.

Member

@DarkLight1337 DarkLight1337 left a comment


Sorry for the delay, was caught up with other things. Left some comments.

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 30, 2025 15:29
@github-actions github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Aug 30, 2025
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
…oint

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
auto-merge was automatically disabled August 30, 2025 18:04

Head branch was pushed to by a user without write access

@maxdebayser
Contributor

The failing Blackwell test is not related.

…oint

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@vllm-bot vllm-bot merged commit 1cb39db into vllm-project:main Sep 1, 2025
69 of 71 checks passed
didier-durand pushed a commit to didier-durand/vllm that referenced this pull request Sep 1, 2025
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
@christian-pinto
Contributor Author

Many thanks @maxdebayser for your contributions. The whole code is indeed more streamlined now.
Thanks @DarkLight1337 for your patience in dealing with this PR.
I have spotted a few issues with the links in the docs. I will open a new PR to fix them.

On to the next one now!

eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>

Labels

ci/build · documentation (Improvements or additions to documentation) · frontend · ready (ONLY add when PR is ready to merge/full CI is needed) · v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC]: Hidden states processor

4 participants