
[Serve] Flagscale serve supports automatic composition of multiple models #732


Open · wants to merge 31 commits into base: main

Conversation

cyber-pioneer
Collaborator

Description

Flagscale serve supports automatic composition of multiple models

python run.py --config-path ./examples/qwen2_5/conf --config-name serve_multiple_models action=run

@cyber-pioneer cyber-pioneer requested a review from a team as a code owner August 13, 2025 06:39
@Hchnr
Collaborator

Hchnr commented Aug 13, 2025

I'm not sure; maybe it would be a better choice to implement this at a higher level (adding a new compose-mode with no changes in serve-mode)?

deploy:
port: 6701
use_fs_serve: true
enable_composition: true
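As a toy sketch of how such a flag could switch between composed and independent serving (hypothetical helper, not FlagScale's actual dispatch code):

```python
# Toy sketch (hypothetical, not FlagScale's actual code): dispatch on the
# proposed deploy.enable_composition flag.
deploy_config = {"port": 6701, "use_fs_serve": True, "enable_composition": True}

def build_app(config: dict) -> str:
    # With composition on, the models would be wired into a single serving
    # graph; otherwise each model is deployed as an independent service.
    if config.get("enable_composition", False):
        return "composed multi-model app"
    return "independent single-model apps"

print(build_app(deploy_config))  # composed multi-model app
```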
Contributor

What does this parameter mean?

Collaborator Author

enable_composition enables the composition of multiple models into one serving graph.


from dag_utils import check_and_get_port
from fastapi import FastAPI, HTTPException, Request
from pydantic import create_model
from ray import workflow
from ray import serve
from ray.serve.handle import DeploymentHandle
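The imports above hint at a Ray Serve composition graph. As a toy stdlib-only illustration of the underlying idea (hypothetical model stages, not the PR's actual Ray Serve code), composition pipes one model's output into the next:

```python
# Toy illustration of multi-model composition (hypothetical stages,
# not the PR's Ray Serve implementation): each "model" is a callable,
# and composition runs them in sequence.
def model_a(prompt: str) -> str:
    # stand-in for a first model, e.g. a draft generator
    return prompt + " [draft]"

def model_b(text: str) -> str:
    # stand-in for a second model, e.g. a refiner
    return text + " [refined]"

def compose(*stages):
    """Return a callable that feeds each stage's output into the next."""
    def pipeline(x):
        for stage in stages:
            x = stage(x)
        return x
    return pipeline

serve_app = compose(model_a, model_b)
print(serve_app("hello"))  # hello [draft] [refined]
```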
Contributor

Do we need to add a Ray version restriction?

Collaborator Author

Nice idea; more Ray versions will be tested later.
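One way to enforce such a restriction, sketched here with a hypothetical minimum version (the PR does not pin one), is a small guard that compares the installed Ray version:

```python
import re

def parse_version(v: str) -> tuple:
    # Extract leading numeric dot-separated components, e.g. "2.9.3rc1" -> (2, 9, 3)
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", v)
    if not m:
        return (0,)
    return tuple(int(g) for g in m.groups() if g is not None)

def check_ray_version(installed: str, minimum: str = "2.9.0") -> None:
    # "2.9.0" is an illustrative minimum, not one stated in the PR.
    if parse_version(installed) < parse_version(minimum):
        raise RuntimeError(
            f"ray>={minimum} is required for serve composition, found {installed}"
        )
```

In real code `installed` would come from `ray.__version__`; tuple comparison handles multi-digit components (e.g. 2.10.0 > 2.9.0) that naive string comparison gets wrong.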


from flagscale.logger import logger

RequestData = create_model(
Contributor

How is multimodal input handled?

Collaborator Author

Set the config as follows:

    deploy:
      request:
        args:
          - prompt
          - num
        types:
          - str
          - int

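Under such a config, the request model can be built dynamically with pydantic's `create_model`; a minimal sketch (field and type names taken from the example config above, the mapping helper is hypothetical):

```python
from pydantic import create_model

# Mirror of the example deploy.request config above.
ARG_NAMES = ["prompt", "num"]
TYPE_NAMES = ["str", "int"]
# Hypothetical mapping from config type names to Python types.
TYPE_MAP = {"str": str, "int": int, "float": float, "bool": bool}

# create_model expects name=(type, default); Ellipsis marks the field required.
fields = {
    name: (TYPE_MAP[tname], ...)
    for name, tname in zip(ARG_NAMES, TYPE_NAMES)
}
RequestData = create_model("RequestData", **fields)

req = RequestData(prompt="hello", num=2)
print(req.prompt, req.num)  # hello 2
```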
@cyber-pioneer
Collaborator Author

I'm not sure; maybe it would be a better choice to implement this at a higher level (adding a new compose-mode with no changes in serve-mode)?

As discussed, what you mentioned refers to combining different independent services, while this scenario focuses on combining multiple different models. Your idea will be taken into consideration.

Collaborator

uses: ./.github/workflows/functional-tests.yml should be changed to uses: ./.github/workflows/functional-tests-nvidia.yml

Collaborator Author

done

@cyber-pioneer cyber-pioneer changed the title [WIP] Flagscale serve supports automatic composition of multiple models [Serve] Flagscale serve supports automatic composition of multiple models Aug 15, 2025
4 participants