[DEP] - Consolidating dynamo-serve and dynamo-run component API #10
Conversation
One thing that is missing from the doc for me is the user flow, namely: I'm a user, I come to the project with some goal in mind (what is that goal), and then what are the steps I need to do to achieve that goal, both in the current world (which we claim is not good) and in the proposed new world.
## Goals

**\[Optional \- if not applicable omit\]**

The frontend actually uses `dynamo-run` internally to run the OpenAI-compatible HTTP server and Rust-based processor, while `dynamo-serve` manages the overall service orchestration.
What does it mean that the frontend uses `dynamo-run`?
The frontend component of a graph runs `dynamo-run` as a subprocess in order to start the Rust server/processor. Here's an example: https://github.com/ai-dynamo/dynamo/blob/main/examples/sglang/components/frontend.py
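For concreteness, a minimal sketch of the kind of subprocess wiring described above: the Python frontend spawning `dynamo-run` to start the Rust server/processor. The helper names and the default flag string are illustrative assumptions, not the actual frontend.py code.

```python
# Hedged sketch: how a frontend component might spawn `dynamo-run` as a
# child process. `build_dynamo_run_cmd` and `start_rust_frontend` are
# hypothetical names; the real logic lives in the linked frontend.py.
import shlex
import subprocess


def build_dynamo_run_cmd(flags: str = "in=http out=dyn") -> list:
    """Build the argv list for launching dynamo-run."""
    return ["dynamo-run", *shlex.split(flags)]


def start_rust_frontend(flags: str = "in=http out=dyn") -> subprocess.Popen:
    """Spawn dynamo-run so the Rust HTTP server/processor runs alongside
    this Python component."""
    return subprocess.Popen(build_dynamo_run_cmd(flags))
```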
1. Each Dynamo component **MUST** be runnable using `python3 component.py <flags>`
Can we explain why this is critical? The explanation of "this is more Pythonic/hackable" isn't enough - what is a user trying to do for which they need this, what is the workflow that the lack of this is blocking?
An example is running a single model split across two nodes. Let's use SGLang disaggregated serving as an example. In order to run TP>8, you have to run the SGLang worker with `node_rank=0` on the head node and then `node_rank=1` on the child node. In the current paradigm, that means you have to do:
```shell
# node 1 - spin up the frontend, processor, and shard 1 of the prefill worker (but none of the decode workers)
dynamo serve graphs.agg:Frontend -f configs/agg_head.yaml

# node 2 - spin up shard 2 of the prefill worker
dynamo serve graphs.agg:Frontend -f configs/agg_child.yaml --service-name SGLangWorker

# node 3 - spin up only shard 1 of the decode worker
dynamo serve graphs.disagg:Frontend -f configs/disagg_head.yaml --service-name SGLangDecodeWorker

# node 4 - spin up only shard 2 of the decode worker
dynamo serve graphs.disagg:Frontend -f configs/disagg_child.yaml --service-name SGLangDecodeWorker
```
Note that I am having to run agg on the first 2 nodes, disagg on the next 2, and specify `--service-name` on nodes 2 through 4. It would be great if I could just do:
```shell
# node 1
dynamo-run in=http out=dyn &
python3 prefill.py -f configs/disagg_head.yaml

# node 2
python3 prefill.py -f configs/disagg_child.yaml

# node 3
python3 decode.py -f configs/disagg_child.yaml

# node 4
python3 decode.py -f configs/disagg_child.yaml
```
Going to use vllm_v1 to demonstrate the following. Another example is running each component on a different node. Say I have 4 H100 nodes and I am trying to run:

- a Frontend
- a Load Balancer
- 2 PrefillWorkers
- 2 DecodeWorkers

I'll try a different approach to the one above and instead just run each component separately, without picking and choosing services from a graph. Here's what I have to do to spin things up on node 1:
```shell
# node 1 - spin up the CPU processes (frontend and load balancer) and prefill worker 1
# I have to keep specifying --service-name or else `depends` will force everything
# to spin up, unless I sed the `depends` out... bad UX
dynamo serve components.frontend:Frontend -f config.yaml --service-name Frontend
dynamo serve components.simple_load_balancer:SimpleLoadBalancer -f config.yaml --service-name SimpleLoadBalancer
dynamo serve components.worker:VllmPrefillWorker -f config.yaml
```
Instead, I want to do the following on node 1:

```shell
dynamo run in=http out=dyn
python3 simple_load_balancer.py -f config.yaml
python3 prefill_worker.py -f config.yaml
```
Thanks for the details. On the first example, I am not sure I see a huge delta between the two cases. On the second, I can see some of the issues. In general, kind of odd to see config.yaml used in both cases, I'd expect the pure Python version to be just command line args.
Perhaps more importantly, the use case for this seems to be more for the "Dynamo developer", rather than a normal user? Not to say that isn't important, but just want to make sure we're talking about the same persona.
> On the first example, I am not sure I see a huge delta between the two cases

I guess something like `dynamo serve graphs.agg:Frontend -f configs/agg_child.yaml --service-name SGLangWorker` is a bit unintuitive to me. Why am I accessing the entire graph and selecting the PrefillWorker service from it? Is it not easier just to run the prefill worker's Python code?

Additionally, in order to run this full example across nodes, I have to run the agg graph and then the disagg graph in order to avoid the `depends` statement in the prefill worker. If I don't, the decode worker spins up and errors out because it has no GPUs available.
> In general, kind of odd to see config.yaml used in both cases, I'd expect the pure Python version to be just command line args.

Yeah, CLI args also work. A YAML/JSON parser would be helpful because these frameworks sometimes require many, many args to get things spun up. But this might just be the Dynamo Developer persona talking :)
> Perhaps more importantly, the use case for this seems to be more for the "Dynamo developer", rather than a normal user? Not to say that isn't important, but just want to make sure we're talking about the same persona.

This is an interesting thought, and it does bring up the question of who our ICP is for Dynamo. In my eyes, people who will be deploying our software at data center scale will want maximum flexibility to tinker with our framework (something I don't believe is present if I cannot run Python code with `python3`). Ping to @harryskim for visibility on this discussion.
I would think the form should be something like:

```shell
dynamo serve graph {optional config} --node-list node_list.yaml
```

If you are serving a graph and you need multiple nodes for that graph, that should be encapsulated in `serve`. Having to run the command on multiple nodes with different slices does seem to defeat the purpose of `dynamo serve`; `dynamo serve` serves a graph with the required resources, so by definition it should be able to handle multi-node graphs.

It doesn't today. And maybe, with things like microk8s, we don't need this intermediate state of serving a graph without k8s or Slurm.
vLLM and SGLang both let you run disaggregated serving + KV routing using just raw Python in single/multi-node settings. To be clear: if I can run `python3 sglang.launch_server ...` to start an SGLang prefill worker, why can't I do `python3 prefill_worker.py ...` in Dynamo? That is a degradation in UX to me.
\<bulleted list of pros describing the positive aspects of this solution\>

1. We **SHOULD NOT** deprecate `dynamo serve` - the orchestration UX remains valuable
I will contend that this is incorrect, or rather, that an explicit goal is that there is only one way to run a Dynamo "graph", and we cannot have both `dynamo-run` and `dynamo serve`.
Agreed with you here. We've been using `dynamo-run` to describe the launcher for the Rust frontend and the processor, and less as a way to launch the other components. That's why it's also used in the frontend. It's not really a launcher for the graph; it's a launcher for the Rust bits.
Understood - just not a good situation :)
There should only be one way to:
- define and implement a component in python and rust
- define a graph in python and rust (maybe yaml?)
- launch a graph locally
- launch a graph in k8s and slurm
Agreed there should be one way. I'm not sure I understand why `dynamo-run` is needed if the components can run via their main function and `dynamo serve` launches a graph.
```python
async for result in g:
    yield result

if __name__ == "__main__":
```
Is this code going to be 90% the same in every component? Not saying that's a problem, but want to make sure I understand.
Yea 90% is about right
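To make the "90% the same" concrete, here is a hedged sketch of the sort of per-component main block being discussed; the flag name and the trivial worker body are illustrative, not the actual Dynamo API.

```python
# Sketch of the largely identical main stanza each component would carry:
# parse a config flag, then hand the args to the component's async worker.
import argparse
import asyncio


def build_parser() -> argparse.ArgumentParser:
    # The shared part: every component accepts an optional config file
    parser = argparse.ArgumentParser(description="Run this component standalone")
    parser.add_argument("-f", "--config-file", default=None)
    return parser


async def worker(engine_args):
    # Placeholder for the component-specific body; a real component would
    # create its service here and serve its endpoints.
    return engine_args.config_file


if __name__ == "__main__":
    args = build_parser().parse_args()
    asyncio.run(worker(args))
```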
Something for us to think about then.
The current approach is having these bits in `serve_dynamo.py`, which is also why we cannot run each component without `dynamo-serve`. I personally am not a fan of abstractions, especially since we have not found a stable API yet, which is why I'm advocating for this being explicitly written out. I think it very much helps readability as well.

If we can find a middle ground, I'm 100% open to looking at things differently.
To be clear, that wasn’t a “this is bad and we should think how to fix this”, but rather for us to decide how we feel about it.
An answer might be “we are fine with this for now and will revisit when we see stability or it starts really hurting”
I would separate the "how" from the "what" - meaning we have a set of requirements; the exact look and feel of the decorators can change. They should all be part of the runtime and not part of deploy.

That is, I'm not sure the boilerplate of defining a component, a service, and attaching a method to an endpoint is all really necessary if it can be hidden behind a decorator:

```python
@dynamo_component(namespace="")
class Foo:
    @dynamo_endpoint()
    def process_image(x: type): ...
```
I think the abstractions need to be ironed out
Agreed, it's a lot of repeated boilerplate, and a simple method to wrap these common patterns would follow the DRY principle.
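As one possible middle ground, a small decorator plus a discovery helper could capture the repeated pattern without hiding much. `dynamo_endpoint` and `discover_endpoints` here are hypothetical names, sketched under the assumption that endpoints are plain methods.

```python
# Hypothetical DRY sketch: mark endpoint methods with a decorator and let a
# helper collect them, instead of wiring each endpoint by hand per component.
import inspect

_ENDPOINT_ATTR = "_dynamo_endpoint_name"


def dynamo_endpoint(name=None):
    """Mark a method as an endpoint; a launcher can discover it later."""
    def mark(fn):
        setattr(fn, _ENDPOINT_ATTR, name or fn.__name__)
        return fn
    return mark


def discover_endpoints(obj):
    """Return {endpoint_name: bound_method} for all marked methods."""
    found = {}
    for _, member in inspect.getmembers(obj, callable):
        endpoint_name = getattr(member, _ENDPOINT_ATTR, None)
        if endpoint_name:
            found[endpoint_name] = member
    return found


class EchoWorker:
    @dynamo_endpoint()
    def generate(self, request):
        return f"echo:{request}"
```

A launcher could then iterate over `discover_endpoints(worker)` instead of repeating the endpoint setup in every `component.py`.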
```python
asyncio.run(worker(engine_args))
```

### FlexibleArgumentParser for Dynamo
Can you show an example of this?
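Not speaking for the proposal author, but one plausible reading (modeled loosely on vLLM's `FlexibleArgumentParser`) is an argparse subclass that merges a config file with CLI flags, with the CLI winning. The `--config-file` flag and the JSON format are assumptions for illustration; the discussion above mentions YAML as well.

```python
# Hedged sketch of a "FlexibleArgumentParser": plain CLI flags plus an
# optional config file whose values fill in any flags left unset.
import argparse
import json


class FlexibleArgumentParser(argparse.ArgumentParser):
    def parse_args(self, args=None, namespace=None):
        ns = super().parse_args(args, namespace)
        config_path = getattr(ns, "config_file", None)
        if config_path:
            with open(config_path) as f:
                defaults = json.load(f)  # could equally be yaml.safe_load
            for key, value in defaults.items():
                # Explicit CLI values win; config only fills unset options
                if getattr(ns, key, None) is None:
                    setattr(ns, key, value)
        return ns
```

With something like this, `python3 prefill_worker.py -f config.json --model foo` would override the config's `model` while taking everything else from the file (assuming `-f` is registered as an alias for `--config-file`).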
```
│ └── utils.py
```

Each engine's `utils.py` would contain things like argument parsing/validation and any other helper functions. The `common/base_classes.py` could contain abstract classes for the `BaseDecodeWorker` or the `BaseWorker` class.
Thanks for adding this. I think we need to find a way where the "common" path (i.e. running one of these engines) is not dependent on "examples", but rather properly in the product surface area. But likely a separate discussion.
Once we establish a direction here in terms of how to define a component, or any changes we want to make there, the goal is to move production components to the top level:

```
components/engines
components/engines/vllm
components/common/
components/
```

This can then be a sub-package, `dynamo.components`, alongside `dynamo.runtime` and `dynamo.llm`.
- Remove `async_on_start`
- Remove `dynamo_endpoint`
- Remove runtime injection via `dynamo_context`
- We can start using abstract classes for our request handlers to address the issue of repeated code for things like `Router` and `Frontend`
I think the issue for abstraction is not the definition of the request handlers, but the proxy/use objects; that is, how to abstract the component and namespace from the caller in a typed way. Abstraction for the request handlers is great for reuse at the implementation of the request handler, but I think it doesn't cover the 'proxy objects' on the client side.
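A sketch of what a typed client-side proxy could look like, using `typing.Protocol`; the worker classes and registry are hypothetical, the point being that the caller sees only the `LLMWorker` type while the concrete backend stays hidden.

```python
# Hedged sketch: the client programs against an LLMWorker Protocol; which
# backend actually serves the calls is resolved at runtime.
from typing import Protocol


class LLMWorker(Protocol):
    def generate(self, prompt: str) -> str: ...


class SGLangWorker:
    """One possible backend (illustrative)."""
    def generate(self, prompt: str) -> str:
        return f"sglang:{prompt}"


class VllmWorker:
    """Another backend with the same interface (illustrative)."""
    def generate(self, prompt: str) -> str:
        return f"vllm:{prompt}"


def get_worker(backend: str) -> LLMWorker:
    """Resolve a concrete worker; callers only ever see LLMWorker."""
    registry = {"sglang": SGLangWorker, "vllm": VllmWorker}
    return registry[backend]()
```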
> That is, how to abstract the component, and namespace from the caller in a typed way.

I think this is addressed in the Deployment class, no? I also don't think I fully understand what you mean by proxy/use objects.
Something like: the specific implementation of the "LLMWorker" type is hidden from the client but still typed. To me, that was the key point of the abstraction. Does the Deployment class do that?
> The specific implementation of "LLMWorker" type is hidden from the client but still typed ....

This `depends` is not even used for that purpose. It's a remnant of BentoML; it's only there so that the LLMWorker will spin up using `dynamo serve`.
As others have said, the point of having the `LLMWorker` interface was to define types, i.e. so I can swap out a different LLM backend with the same interface. In addition, I think we can remove a lot of boilerplate in setting up the endpoints if we have some base component or decorators.

What would BaseDeployment do?
I feel as if this combines a few different issues into one:
- how dynamo serve and dynamo run relate to each other
- how to launch a graph locally in multi-node settings
- how to define a component in python and rust
- how to define a graph in python and rust - and what the semantics of a graph are
The proposal sets out to "consolidate dynamo serve and dynamo run" - but really lays out how to define a component in a way that it can be launched independently of either.
I'm still in favor of separating these out into separate proposals:

- component object model:
- graphs: https://github.com/ai-dynamo/enhancements/blob/graphs/deps/NNNN-dynamo-graphs.md
- dynamo run / dynamo serve / dynamo deploy: https://github.com/ai-dynamo/enhancements/blob/deploy/deps/NNNN-run-and-deploy.md
I think the basic reason we have the particular issues you point out is precisely that we've coupled how to launch a graph (dynamo serve) with how you define a component (the decorators that dynamo serve uses). These should be defined separately: how you define a component and how you define a graph should be loosely coupled to how you launch them.
We also need to separate `dynamo-run` as an application from `dynamo run` as a workflow. `dynamo-run` as an application is the HTTP frontend, pre-processor, and router as a component.
If BentoML were not a part of our codebase, I have a strong feeling we would not have used the class decorator. According to my chat with Mohammed, that decorator is used to make the Bento-style K8s deployment functional (also something that might not have been done had we not used Bento in the first place). Everything else the class decorator provides can be accomplished with the Deployment class.

Today I worked on adding an embedding component to SGLang. One thing I realized is that we cannot run a new graph without creating a new file. If I just want to have dynamo-run started and quickly iterate on the embedding code, it's not possible without continuously starting and restarting dynamo serve.
```python
@dynamo_worker()
async def worker(runtime: DistributedRuntime, engine_args):
    worker = SGLangDecodeWorker(engine_args)
    deploy = Deployment()

    component = runtime.namespace(deploy.namespace).component(deploy.name)
    await component.create_service()

    # serve endpoint
    endpoint = component.endpoint("generate")
    await endpoint.serve_endpoint(worker.generate)
```
If we have the correct abstractions this boilerplate can completely go away.
This gets worse if I want to have multiple endpoints in a component / service.
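For instance (a hedged sketch, assuming the runtime exposes the namespace/component/endpoint API shown in the snippet above), a single helper could absorb the per-component setup so that a second endpoint is one more dict entry rather than more wiring:

```python
# Hypothetical helper that replaces the create_service/serve_endpoint
# boilerplate: serve every (endpoint_name, handler) pair concurrently.
import asyncio


async def serve_component(runtime, namespace, name, handlers):
    """Create the service once, then attach each handler to its endpoint.

    `runtime` is assumed to expose the namespace/component/endpoint API
    from the snippet above; everything else here is identical across
    components.
    """
    component = runtime.namespace(namespace).component(name)
    await component.create_service()
    await asyncio.gather(
        *(component.endpoint(ep).serve_endpoint(fn) for ep, fn in handlers.items())
    )
```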
Summary of discussion 6/12

Members present: @ishandhanani @nnshah1 @mohammedabdulwahhab @alec-flowers @rmccorm4 @messiaen @biswapanda

Summary: @mohammedabdulwahhab shared his proposal. By the end of our conversation, we came to the following agreement.

Next steps:

Open Questions