-
Notifications
You must be signed in to change notification settings - Fork 444
docs: Add multimodal architecture doc #1096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
WalkthroughA new documentation file, Changes
Suggested labels
Suggested reviewers
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
docs/multimodal/architecture.md (2)
2-3
: Introduce an Overview section. The document jumps directly from the title to Motivation; adding a brief “## Overview” after the title to summarize key points would help readers orient themselves.
12-12
: Hyphenate compound adjective in heading. Change “## High level architecture” to “## High-level architecture” for grammatical consistency.🧰 Tools
🪛 LanguageTool
[uncategorized] ~12-~12: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...serving of large multimodal models. ## High level architecture Dynamo has already demons...(EN_COMPOUND_ADJECTIVE_INTERNAL)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/images/multimodal_epd_disagg_workflow.png
is excluded by!**/*.png
📒 Files selected for processing (1)
docs/multimodal/architecture.md
(1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/multimodal/architecture.md
[uncategorized] ~12-~12: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...serving of large multimodal models. ## High level architecture Dynamo has already demons...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🪛 markdownlint-cli2 (0.17.2)
docs/multimodal/architecture.md
17-17: Images should have alternate text (alt text)
null
(MD045, no-alt-text)
🔇 Additional comments (4)
docs/multimodal/architecture.md (4)
21-23
: Client and Processor components well explained. Responsibilities for sending requests and handling I/O are clear and concise.
25-28
: Decode Worker responsibilities are clear. The distinction between aggregation and disaggregated modes is well-defined.
29-32
: Prefill Worker flow is accurate. The interaction with the Encode Worker and subsequent prefill steps are outlined clearly.
33-35
: Encode Worker section is concise. The role in converting images to embeddings and forwarding them is well articulated.
Dynamo has already demonstrated [prefill-decode (PD) disaggregation](../disagg_serving.md) for efficient LLM serving. Building on that foundation, this proposal introduces a dedicated **encode worker** to handle the vision encoding step, fully separating the Encode, Prefill, and Decode phases. | ||
The diagram below illustrates the proposed workflow, based on the LLAVA model ( image + text → text) with vLLM. | ||
|
||
 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add alt text for accessibility. The image tag 
lacks alt text. Replace with:

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
17-17: Images should have alternate text (alt text)
null
(MD045, no-alt-text)
🤖 Prompt for AI Agents
In docs/multimodal/architecture.md at line 17, the image markdown lacks alt text
which is important for accessibility. Add descriptive alt text by changing the
image tag from  to include alt text like .
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit