Skip to content

Introduce min runner version to selection #3505

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
May 9, 2025
Merged

Conversation

leszko
Copy link
Contributor

@leszko leszko commented Apr 14, 2025

Add -aiMinRunnerVersion which allows to specify the minimum runner version. E.g.

[
   {
      "model_id":"comfyui",
      "pipeline":"live-video-to-video",
      "minVersion":"0.0.2"
   }
]

This configuration will accept only Orchestrators that report Runner version 0.0.2 or higher.

This works only for warm containers.

This PR is needed when we start introducing public Orchestrators.

@github-actions github-actions bot added go Pull requests that update Go code AI Issues and PR related to the AI-video branch. labels Apr 14, 2025
Copy link

codecov bot commented Apr 14, 2025

Codecov Report

Attention: Patch coverage is 48.00000% with 104 lines in your changes missing coverage. Please review.

Project coverage is 30.60820%. Comparing base (130b742) to head (3965de3).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
ai/worker/runner.gen.go 32.50000% 51 Missing and 3 partials ⚠️
ai/worker/worker.go 0.00000% 15 Missing ⚠️
core/capabilities.go 77.19298% 9 Missing and 4 partials ⚠️
cmd/livepeer/starter/starter.go 0.00000% 7 Missing ⚠️
net/lp_rpc.pb.go 0.00000% 5 Missing ⚠️
server/ai_session.go 60.00000% 4 Missing ⚠️
ai/worker/container.go 50.00000% 2 Missing and 1 partial ⚠️
ai/worker/utils.go 83.33333% 2 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@                 Coverage Diff                 @@
##              master       #3505         +/-   ##
===================================================
+ Coverage   30.55685%   30.60820%   +0.05135%     
===================================================
  Files            151         152          +1     
  Lines          45237       45429        +192     
===================================================
+ Hits           13823       13905         +82     
- Misses         30605       30706        +101     
- Partials         809         818          +9     
Files with missing lines Coverage Δ
cmd/livepeer/livepeer.go 56.59574% <100.00000%> (+0.18548%) ⬆️
core/ai.go 58.51852% <ø> (ø)
core/ai_orchestrator.go 31.10329% <ø> (ø)
discovery/discovery.go 85.13514% <100.00000%> (ø)
net/lp_rpc_grpc.pb.go 9.93789% <ø> (ø)
ai/worker/container.go 44.44444% <50.00000%> (+0.58479%) ⬆️
ai/worker/utils.go 83.33333% <83.33333%> (ø)
server/ai_session.go 6.66667% <60.00000%> (+0.73675%) ⬆️
net/lp_rpc.pb.go 15.50388% <0.00000%> (-0.04636%) ⬇️
cmd/livepeer/starter/starter.go 7.30816% <0.00000%> (-0.06285%) ⬇️
... and 3 more

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7d62df7...3965de3. Read the comment docs.

Files with missing lines Coverage Δ
cmd/livepeer/livepeer.go 56.59574% <100.00000%> (+0.18548%) ⬆️
core/ai.go 58.51852% <ø> (ø)
core/ai_orchestrator.go 31.10329% <ø> (ø)
discovery/discovery.go 85.13514% <100.00000%> (ø)
net/lp_rpc_grpc.pb.go 9.93789% <ø> (ø)
ai/worker/container.go 44.44444% <50.00000%> (+0.58479%) ⬆️
ai/worker/utils.go 83.33333% <83.33333%> (ø)
server/ai_session.go 6.66667% <60.00000%> (+0.73675%) ⬆️
net/lp_rpc.pb.go 15.50388% <0.00000%> (-0.04636%) ⬇️
cmd/livepeer/starter/starter.go 7.30816% <0.00000%> (-0.06285%) ⬇️
... and 3 more

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@leszko leszko marked this pull request as ready for review April 15, 2025 11:52
Copy link
Member

@victorges victorges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed mostly the ai/worker part which I have more context on and LGTM. I'll defer the rest to someone else that could be available to review or I can focus again on this after the multistream GPU work.

@leszko leszko requested review from emranemran and victorges April 18, 2025 07:57
@pwilczynskiclearcode pwilczynskiclearcode self-requested a review April 22, 2025 08:58
Copy link
Member

@victorges victorges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ad-astra-video
Copy link
Collaborator

ad-astra-video commented Apr 25, 2025

Added some in line comments. Would like to look at once more tomorrow

Couple questions:

  • this is intended to function the same as the go-Livepeer min version constraint where if the lowest runner version on the Orchestrator is lower than the runner min version set by the Gateway for the pipeline/model_id the Orchestrator will be excluded in discovery process?
  • There is no confirmation of runner version at time of request? I see this as pretty low risk but could happen if runner is terminated after launch of Orchestator (or ai-worker for batch) and re-launched incorrectly. Seems relatively far down the adversarial rabbit hole to me and the Orchestrator risks being blocked by the Gateway if they play these games.

@leszko
Copy link
Contributor Author

leszko commented May 5, 2025

Thanks for review @ad-astra-video

  • this is intended to function the same as the go-Livepeer min version constraint where if the lowest runner version on the Orchestrator is lower than the runner min version set by the Gateway for the pipeline/model_id the Orchestrator will be excluded in discovery process?

Yes

  • There is no confirmation of runner version at time of request? I see this as pretty low risk but could happen if runner is terminated after launch of Orchestator (or ai-worker for batch) and re-launched incorrectly. Seems relatively far down the adversarial rabbit hole to me and the Orchestrator risks being blocked by the Gateway if they play these games.

Yes. Technically it's possible to happen, but super low chances. So, let's maybe leave it as it is. If we start seeing it, then we may introduce some periodic checks.

@leszko leszko requested a review from ad-astra-video May 5, 2025 10:51
@ad-astra-video
Copy link
Collaborator

@leszko I reviewed with some minor updates suggestions in line

  • Confirmed selection excludes Orchestrator if runner version is under min version set. First request builds the AISessionPool and the Orchestrator is left out. Restarting with min version less than the runner correctly selected the Orchestrator and started the test stream (used noop container).
  • confirmed runnerVersion reports in /getNetworkCapabilities endpoint
                 "constraints": {
                    "PerCapability": {
                        "35": {
                            "models": {
                                "noop": {
                                    "warm": true,
                                    "capacity": 1,
                                    "runnerVersion": "0.1.0-6e50320b"
                                }
                            }
                        }
                    }
                }

@leszko
Copy link
Contributor Author

leszko commented May 9, 2025

@leszko I reviewed with some minor updates suggestions in line

  • Confirmed selection excludes Orchestrator if runner version is under min version set. First request builds the AISessionPool and the Orchestrator is left out. Restarting with min version less than the runner correctly selected the Orchestrator and started the test stream (used noop container).
  • confirmed runnerVersion reports in /getNetworkCapabilities endpoint
                 "constraints": {
                    "PerCapability": {
                        "35": {
                            "models": {
                                "noop": {
                                    "warm": true,
                                    "capacity": 1,
                                    "runnerVersion": "0.1.0-6e50320b"
                                }
                            }
                        }
                    }
                }

Thanks Brad. I'll re-test it again myself and merge the PR!

@leszko leszko merged commit 78dff88 into master May 9, 2025
16 of 18 checks passed
@leszko leszko deleted the rafal/runner-version branch May 9, 2025 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AI Issues and PR related to the AI-video branch. go Pull requests that update Go code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants