
Video processing in inference server #679

Merged

Conversation

@PawelPeczek-Roboflow commented Sep 25, 2024

Description

The goal of this feature is to bring video processing capabilities into the inference server - long story short, Workflows should run against videos without any additional scripts.

State of the work:

🟢 Old enterprise stream management components copied and adjusted to process Workflows
🟢 Basic endpoints to manage stream states enabled (initialise, list, get state, consume, pause, resume, terminate) - see the illustrative client sketch after this list
🟢 Basic test coverage
🔴 Full support for old enterprise features (old stream management was running InferencePipeline without Workflows)
🔴 True integration tests
🔴 Functionality to start video processing on container start-up
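
For illustration, a minimal client sketch against the endpoints listed above - the paths, payload fields, and port are assumptions here, not the merged API:

```python
import requests

BASE = "http://localhost:9001"  # assumed local inference server address

# Initialise a pipeline that runs a workflow against a video source.
# Endpoint paths and payload fields are illustrative, not the merged API.
response = requests.post(
    f"{BASE}/inference_pipelines/initialise",
    json={
        "video_reference": "rtsp://camera.local:554/stream",
        "workspace_name": "my-workspace",
        "workflow_id": "my-workflow",
        "api_key": "<YOUR_API_KEY>",
    },
)
pipeline_id = response.json()["context"]["pipeline_id"]  # field names are assumptions

# The remaining management operations map 1:1 onto the endpoints listed above.
requests.get(f"{BASE}/inference_pipelines/list")
requests.get(f"{BASE}/inference_pipelines/{pipeline_id}/status")
requests.get(f"{BASE}/inference_pipelines/{pipeline_id}/consume")
requests.post(f"{BASE}/inference_pipelines/{pipeline_id}/pause")
requests.post(f"{BASE}/inference_pipelines/{pipeline_id}/resume")
requests.post(f"{BASE}/inference_pipelines/{pipeline_id}/terminate")
```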

Issues spotted:

Performance

The same workflow was tested in each setup, reporting only the latency of single-frame processing inside the WorkflowRunner.run_workflow(...) function:

  • MacBook, bare metal in a script using InferencePipeline directly - ~40ms
  • MacBook, inside a docker container, behind the API - ~110ms
  • Jetson Orin Nano, bare metal in a script using InferencePipeline directly - not measured precisely this time, but older tests indicated the same performance as on the MacBook with yolov8n-640, which is the model used in this test case
  • Jetson Orin Nano, inside a docker container, behind the API - ~50ms

We have docker/API overhead - not 100% sure whether it is visible on Jetson devices, but the MacBook one drops throughput from 27fps to <10fps 😢
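
For context, the numbers above come from timing a single call - roughly the pattern below; the run_workflow signature is an assumption, only the measurement approach matters:

```python
import time

def timed_run_workflow(runner, video_frame, **kwargs):
    # Illustrative wrapper - the actual run_workflow(...) signature in the
    # server may differ; only the timing pattern matters here.
    start = time.monotonic()
    result = runner.run_workflow(video_frame, **kwargs)
    elapsed_ms = (time.monotonic() - start) * 1000
    # ~40ms bare metal vs ~110ms behind the API on a MacBook in the tests above
    print(f"single-frame latency: {elapsed_ms:.1f}ms")
    return result
```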

Passing localhost camera to docker

On a MacBook it is very hard to pass the device camera to a container (it requires a lot of configuration - https://medium.com/@jijupax/connect-the-webcam-to-docker-on-mac-or-windows-51d894c44468), yet that would be required for nice demos without UI streaming into the container. @grzegorz-roboflow suggested passing frames through a Unix socket, which seems feasible - please clarify if I should allocate time to implement that.
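
For reference, a minimal sketch of what the suggested Unix-socket hand-off could look like on the host side - the socket path, JPEG encoding, and length-prefix framing are all illustrative assumptions:

```python
import os
import socket
import struct

import cv2

SOCKET_PATH = "/tmp/camera.sock"  # illustrative path, bind-mounted into the container

def stream_frames() -> None:
    """Host side: read the local webcam and push length-prefixed JPEG frames
    over a Unix domain socket that the container can read from."""
    if os.path.exists(SOCKET_PATH):
        os.unlink(SOCKET_PATH)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(SOCKET_PATH)
    server.listen(1)
    connection, _ = server.accept()
    capture = cv2.VideoCapture(0)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            _, jpeg = cv2.imencode(".jpg", frame)
            payload = jpeg.tobytes()
            # Length prefix lets the receiver split the byte stream back into frames.
            connection.sendall(struct.pack(">I", len(payload)) + payload)
    finally:
        capture.release()
        connection.close()
```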

Open questions

  • Do we want to port all of the old functionality?
  • Are we fine with completing this feature without re-streaming? (I managed to verify that polling results from the buffer via the consume-result HTTP endpoint is OK - not great / not terribly bad.)
  • How should we deal with auth on endpoints that do not touch API-key-gated resources (for instance, manipulation of pipeline state)? I would assume we should validate the API key in our backend to avoid malicious attacks, right? But then - what about offline use-cases?
  • We have the following setup now:
    • We initialise processing by starting the pipeline in a separate process
    • Each init reports success as soon as the process starts
    • We do not wait for the pipeline to connect to the source (that would block other requests)
    • We end up in a state where the client needs to check the pipeline status - an initial video-source connection error terminates the inference pipeline process; the pipeline will re-connect only if the connection to the camera breaks after the initial connection is established
    • Waiting for the initial connection to complete is problematic in a general sense - if we wanted to retry, we would block the stream manager socket for longer
    • I am not really sure how to address the problem, but I see it becoming an issue once we enable the "on-start processing" feature - people may sometimes be surprised by their pipelines failing on temporary connectivity issues with cameras (a hypothetical client-side polling loop is sketched after this list)
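
To make the last point concrete, a hypothetical client-side polling loop - the status endpoint path and the state values are assumptions about the payload, not the actual API:

```python
import time

import requests

def wait_until_running(base_url: str, pipeline_id: str, timeout_s: float = 30.0) -> bool:
    """Poll the pipeline status until it is running; returns False if the
    pipeline process died (e.g. the camera was unreachable on the initial
    connect) or the timeout elapsed."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        report = requests.get(
            f"{base_url}/inference_pipelines/{pipeline_id}/status"
        ).json()
        state = report.get("state")  # field name is an assumption
        if state == "RUNNING":
            return True
        if state in ("ERROR", "TERMINATED"):
            # an initial source-connection failure terminates the pipeline process
            return False
        time.sleep(1.0)
    return False
```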

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested, please provide a testcase or example of how you tested the change?

YOUR_ANSWER

Any specific deployment considerations

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

  • Docs updated? What were the changes:

@hansent commented Sep 25, 2024

Do we want to port all of the old functionality?
What things are part of this?

I think we can treat it as a separate feature for Workflows / we don't need to port all stream management features until we need them. E.g. I am not sure we need pause/resume explicitly (the benefit is really just not having to create/configure the stream again, which is ideally done as part of the workflow spec now anyway?)

Are we fine with completing this feature without re-streaming? (I managed to verify that polling results from the buffer via the consume-result HTTP endpoint is OK - not great / not terribly bad.)
I think yes, let's go without re-streaming for now. As long as we can get video back out to display for now, we can focus on building the API and workflow processing. It seems like the kind of thing that can be added somewhat cleanly later if we need it, but it increases overall scope/complexity if we do it all at once?

The reason for doing it now would be if it required a different architecture for how we process / start / manage streams, but I don't think that's the case.

Another thought: could re-streaming be a separate sink / stateful block that creates a stream?

How should we deal with auth on endpoints that do not touch API-key-gated resources (for instance, manipulation of pipeline state)? I would assume we should validate the API key in our backend to avoid malicious attacks, right? But then - what about offline use-cases?

Can we do it the same way we do for inference / workflow endpoints? For dedicated deployments I think we have a check that the API key matches the owner of the deployment. For local / user-managed deployments we can allow the requests, but auth on model access / other API calls that need API keys.
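
A rough sketch of that policy as a FastAPI dependency - the env var and the ownership check are hypothetical; this only shows the shape of the gate:

```python
import os

from fastapi import HTTPException, Query

def verify_api_key(api_key: str | None = Query(default=None)) -> None:
    """Hypothetical gate: enforce the key only when the server is configured
    with an owner key (dedicated deployment); otherwise let the request
    through and rely on downstream model / API auth."""
    owner_key = os.getenv("DEPLOYMENT_OWNER_API_KEY")  # hypothetical env var
    if owner_key is None:
        return  # local / user-managed / offline deployment: allow the request
    if api_key != owner_key:
        raise HTTPException(status_code=401, detail="API key does not match deployment owner")
```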

@hansent commented Sep 25, 2024

Passing localhost camera to docker
On a MacBook it is very hard to pass the device camera to a container (it requires a lot of configuration - https://medium.com/@jijupax/connect-the-webcam-to-docker-on-mac-or-windows-51d894c44468), yet that would be required for nice demos without UI streaming into the container. @grzegorz-roboflow suggested passing frames through a Unix socket, which seems feasible - please clarify if I should allocate time to implement that.

I think we need:

  • A way to use a USB / local device as a video source for device deployments (Jetson, Roboflow box). We have that for bare metal, I think. It should work in docker for Jetsons / Roboflow boxes / field deployments; it is less important for it to work in docker on a Mac (especially if we also have a way to use network input)
  • A way to get video input via the network. I think RTSP and WebRTC would be ideal - they would cover a lot of use cases and allow building adapters in front when needed (a minimal RTSP-reading sketch follows)
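
For the RTSP path, ingestion on the container side can be as simple as the OpenCV sketch below; the URL is a placeholder:

```python
import cv2

# Placeholder URL; any RTSP source reachable from inside the container works.
capture = cv2.VideoCapture("rtsp://camera.local:554/stream")

while capture.isOpened():
    ok, frame = capture.read()
    if not ok:
        break
    # Hand each frame to the pipeline / workflow here.
    print(frame.shape)

capture.release()
```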

@PawelPeczek-Roboflow PawelPeczek-Roboflow marked this pull request as ready for review September 27, 2024 07:46
@PawelPeczek-Roboflow PawelPeczek-Roboflow merged commit 6acb030 into main Sep 27, 2024
57 checks passed
@PawelPeczek-Roboflow PawelPeczek-Roboflow deleted the feature/video_processing_in_inference_server branch September 27, 2024 10:48