Add remote exec capability for foundation models missing it#1968

Open
hansent wants to merge 8 commits into main from remote-exec-for-all-models

hansent commented Feb 4, 2026

Manual testing results

I tested the workflow blocks with the added remote exec functionality on a dev GPU, with the following results:

Working:

  • Gaze: works with both local and remote exec
  • Depth estimation: works with both local and remote exec
  • SAM2: works with both local and remote exec
  • Florence: works with both local and remote exec (some workflow block configs produce broken results, but the behavior is the same for local and remote)
  • Moondream2: works with both local and remote exec

Not working / couldn't test:

  • SmolVLM: doesn't work locally or remotely because the model download fails (currently disabled on serverless anyway)
  • Qwen: doesn't work locally or remotely because the model download fails (currently disabled on serverless anyway)
  • SAM3 3D: couldn't install the dependencies to get the model working (disabled on serverless anyway)

I think it's OK to add the missing remote exec functionality and endpoint handling for these models as well.

1. HTTP Client Methods Added (inference_sdk/http/client.py)

  • infer_lmm() - Generic LMM endpoint for Florence2, Moondream2, SmolVLM, Qwen models
  • depth_estimation() - Depth estimation endpoint
  • sam2_segment_image() - SAM2 segmentation endpoint
  • sam3_3d_infer() - SAM3 3D object generation endpoint
  • Added async variants for all methods
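
As a rough illustration of the pattern described above (one generic LMM method serving several model families, plus an async variant), here is a minimal sketch. It is not the code from this PR: `SketchClient`, the `/hypothetical/lmm` path, the payload keys, the model IDs, and the thread-based async wrapper are all assumptions made up for the example.

```python
import asyncio
from typing import Any, Callable, Dict, Optional

# Hypothetical transport type: takes (path, payload) and returns parsed JSON.
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class SketchClient:
    """Illustrative stand-in; NOT the real inference_sdk client."""

    def __init__(self, post: PostFn) -> None:
        self._post = post

    def infer_lmm(
        self, image: str, model_id: str, prompt: Optional[str] = None
    ) -> Dict[str, Any]:
        # One generic method serves several model families;
        # model_id selects which one handles the request.
        payload: Dict[str, Any] = {"image": image, "model_id": model_id}
        if prompt is not None:
            payload["prompt"] = prompt
        # Path is a placeholder, not the real server route.
        return self._post("/hypothetical/lmm", payload)

    async def infer_lmm_async(
        self, image: str, model_id: str, prompt: Optional[str] = None
    ) -> Dict[str, Any]:
        # Simplest possible async variant: run the blocking call in a thread.
        return await asyncio.to_thread(self.infer_lmm, image, model_id, prompt)


def fake_post(path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    # Echo transport so the sketch runs without a server.
    return {"path": path, "echo": payload}


client = SketchClient(fake_post)
sync_result = client.infer_lmm("<base64 image>", "florence-2-base", prompt="<CAPTION>")
async_result = asyncio.run(client.infer_lmm_async("<base64 image>", "moondream2"))
```

Injecting the transport keeps the sketch runnable offline; a real client would issue HTTP requests and would likely use a native async HTTP library rather than a worker thread.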

2. HTTP API Endpoint Added (inference/core/interfaces/http/http_api.py)

  • /sam3_3d/infer - New endpoint for SAM3 3D object generation with JSON-serializable response (base64-encoded binary data)
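
To show why base64 encoding makes binary output JSON-serializable, here is a small round-trip sketch using only the standard library. The `mesh` field name and the sample bytes are assumptions for illustration; the actual response schema of `/sam3_3d/infer` is not shown here.

```python
import base64
import json


def to_json_safe(raw: bytes) -> str:
    # JSON has no bytes type, so encode binary data as an ASCII base64 string.
    return base64.b64encode(raw).decode("ascii")


def from_json_safe(encoded: str) -> bytes:
    # Reverse step a client would perform on the received field.
    return base64.b64decode(encoded)


# Hypothetical binary payload standing in for generated 3D data.
mesh_bytes = b"\x00\x01binary-mesh-data\xff"

# Server side: build a JSON body ("mesh" is an assumed field name).
response_body = json.dumps({"mesh": to_json_safe(mesh_bytes)})

# Client side: parse the JSON and recover the original bytes.
decoded = from_json_safe(json.loads(response_body)["mesh"])
```

The trade-off is size: base64 inflates the payload by roughly a third, in exchange for a response that any JSON client can parse.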

3. Workflow Blocks Updated with Remote Execution

All blocks now support StepExecutionMode.REMOTE:

| Block | Client Method Used |
|---|---|
| Gaze v1 | detect_gazes() |
| Depth Estimation v1 | depth_estimation() |
| Segment Anything 2 v1 | sam2_segment_image() |
| Florence2 v1 | infer_lmm() |
| Moondream2 v1 | infer_lmm() |
| SmolVLM v1 | infer_lmm() |
| Qwen2.5-VL v1 | infer_lmm() |
| Qwen3-VL v1 | infer_lmm() |
| Segment Anything 3 3D v1 | sam3_3d_infer() |
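
The dispatch that the mapping above implies can be sketched roughly as follows. This is an assumption-laden illustration rather than the block code in this PR: the `StepExecutionMode` enum here is a local stand-in that only mirrors the mode names, and `SketchBlock` with its two callables is hypothetical.

```python
from enum import Enum
from typing import Any, Callable


class StepExecutionMode(Enum):
    # Local stand-in mirroring the mode names mentioned in this PR.
    LOCAL = "local"
    REMOTE = "remote"


class SketchBlock:
    """Hypothetical block that picks an execution path based on the mode."""

    def __init__(
        self,
        mode: StepExecutionMode,
        local_model: Callable[[Any], Any],
        remote_call: Callable[[Any], Any],
    ) -> None:
        self._mode = mode
        self._local_model = local_model
        self._remote_call = remote_call

    def run(self, image: Any) -> Any:
        if self._mode is StepExecutionMode.LOCAL:
            # Run the model in-process.
            return self._local_model(image)
        if self._mode is StepExecutionMode.REMOTE:
            # Delegate to an HTTP client method such as those in the table.
            return self._remote_call(image)
        raise ValueError(f"Unknown step execution mode: {self._mode}")


# Stub callables so the sketch runs without any model or server.
local_block = SketchBlock(
    StepExecutionMode.LOCAL, lambda img: ("local", img), lambda img: ("remote", img)
)
remote_block = SketchBlock(
    StepExecutionMode.REMOTE, lambda img: ("local", img), lambda img: ("remote", img)
)
local_result = local_block.run("img")
remote_result = remote_block.run("img")
```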

4. Unit Tests Added

  • test_gaze_remote.py - Gaze detection remote execution tests
  • test_depth_estimation.py - Depth estimation tests
  • test_segment_anything2.py - SAM2 tests
  • test_vlm_remote_execution.py - Tests for Florence2, Moondream2, SmolVLM, Qwen2.5-VL, Qwen3-VL
  • test_segment_anything3_3d.py - SAM3 3D tests
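
For readers unfamiliar with how remote execution can be tested without a running server, the general pattern is to replace the HTTP client with a mock and assert on the call it receives. The sketch below shows that pattern only; `run_remotely`, its arguments, and the return shape are hypothetical and are not taken from the test files listed above.

```python
from unittest.mock import MagicMock


def run_remotely(client, image, model_id):
    # Hypothetical helper: forward the request to the client's infer_lmm method.
    return client.infer_lmm(image, model_id=model_id)


def test_remote_execution_calls_client():
    # The mock stands in for the HTTP client, so no network access is needed.
    client = MagicMock()
    client.infer_lmm.return_value = {"response": "a cat"}

    result = run_remotely(client, "img", "some-model")

    # Verify both the outgoing call and the value passed back to the caller.
    client.infer_lmm.assert_called_once_with("img", model_id="some-model")
    assert result == {"response": "a cat"}


test_remote_execution_calls_client()
```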

5. Documentation Updated

Added "Execution Modes" sections to:

  • docs/foundation/florence2.md
  • docs/foundation/gaze.md
  • docs/foundation/depth_estimation.md
  • docs/foundation/sam2.md
  • docs/foundation/moondream2.md
  • docs/foundation/smolvlm.md
  • docs/foundation/sam3_3d.md


codeflash-ai bot commented Feb 4, 2026

⚡️ Codeflash found optimizations for this PR

📄 45% (0.45x) speedup for Qwen3VLBlockV1.run in inference/core/workflows/core_steps/models/foundation/qwen3vl/v1.py

⏱️ Runtime: 5.64 milliseconds → 3.90 milliseconds (best of 71 runs)
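
The reported figure is consistent with the two runtimes if speedup is defined as time saved relative to the optimized runtime. That definition is an assumption here, checked with a quick calculation:

```python
original_ms = 5.64
optimized_ms = 3.90

# Assumed definition: time saved divided by the optimized runtime.
speedup = (original_ms - optimized_ms) / optimized_ms

speedup_ratio = round(speedup, 2)       # matches the "0.45x" figure
speedup_percent = round(speedup * 100)  # matches the "45%" figure
```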

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch remote-exec-for-all-models).

hansent marked this pull request as ready for review February 5, 2026 19:08

bigbitbus left a comment

LGTM.

grzegorz-roboflow left a comment

LGTM

3 participants