Conversation
|
Thank you for implementing this! This looks good, but I'll make sure there are no compatibility issues before merging it. It will probably ship in the next release as a binary to avoid any breaking change. |
|
I tested it and it seems to handle every config correctly 🎉 |
|
This is great news, thank you! |
|
This is a great piece of work, and will really help me speed up a particular pipeline. However, could I suggest the following very minor modification to handle grayscale images (cupy doesn't like shape and stride parameters of different lengths): |
… did not like this>
|
@kwadl good catch! I was able to reproduce this issue by pulling the confidence map:
confidence_map = sl.Mat(width, height, sl.MAT_TYPE.U8_C1, sl.MEM.GPU)
zed.retrieve_measure(confidence_map, sl.MEASURE.CONFIDENCE, sl.MEM.GPU)
And the proposed fix works like a charm. Thanks! |
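For readers following the grayscale discussion, here is a minimal sketch of the idea behind the fix: keep the shape and strides tuples the same length by dropping the channel axis for single-channel images. This is not the code from the PR. The helper name `shape_and_strides` and the row pitch values are hypothetical, chosen only for illustration; tuples like these are what one would pass as the `shape` and `strides` arguments of `cupy.ndarray`.

```python
def shape_and_strides(height, width, channels, itemsize, step_bytes):
    """Return (shape, strides) tuples of equal length for a pitched image buffer.

    step_bytes is the number of bytes between the starts of two consecutive
    rows; it can exceed width * channels * itemsize because of row padding.
    """
    if channels == 1:
        # Grayscale: 2-D shape, so only two strides.
        return (height, width), (step_bytes, itemsize)
    # Multi-channel: 3-D shape with a matching third stride.
    return (height, width, channels), (step_bytes, channels * itemsize, itemsize)


# Hypothetical pitches for a 1280x720 U8 image (1 channel and 4 channels).
gray_shape, gray_strides = shape_and_strides(720, 1280, 1, 1, 1536)
bgra_shape, bgra_strides = shape_and_strides(720, 1280, 4, 1, 5120)
```

With a single-channel mat such as the confidence map above, the two-element strides tuple now matches the two-element shape, which is the mismatch the comment describes.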
* Include GPU support from #241
Previously, I tried to extend the Python API with the ability to keep the data on the GPU (#230), and I ran into some weird behaviors (they seemed weird back then, but now it's obvious that it was just a lack of understanding of how the data is laid out in memory).
This PR, however, provides a fully functional extension.
NOTE: this change adds an extra dependency: cupy.
The targeted function is get_data(), and both modes of providing data (memory view / deep copy) were implemented for GPU as well.
This was tested on an Nvidia AGX Orin 32GB, with JetPack 5.1.2 and ZED_SDK_4.1.4.
Shoutout to @andreacelani for the discussion that led to figuring out how to implement this correctly (look into the closed PR #230 for details).
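To make the difference between the two modes concrete, here is a small host-side analogy using only the Python standard library. It does not touch the ZED SDK or cupy; the `frame` buffer is a hypothetical stand-in for memory owned by the SDK, and the same view-versus-copy trade-off is assumed to carry over to GPU memory.

```python
# Stand-in for a frame buffer owned by the camera SDK (hypothetical data).
frame = bytearray([10, 20, 30, 40])

view = memoryview(frame)  # memory view: shares storage with the buffer
copy = bytes(frame)       # deep copy: independent snapshot of the data

# Simulate the next grab overwriting the buffer in place.
frame[0] = 99

view_first = view[0]  # follows the buffer and sees the new value
copy_first = copy[0]  # keeps the value from the time of the copy
```

A view avoids a copy per frame but is only valid until the underlying buffer changes; a deep copy costs a transfer but stays stable across grabs.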
Benchmarking with an ML pipeline:
@andreacelani did some benchmarking with impressive results: #230 (comment)
Additionally, I tested it myself using a real feed from a ZED Mini with a simple pipeline (see picture), and here are my findings:
TL;DR:
Details:
Notes:
* … from ultralytics import YOLO), and a custom-trained PyTorch YOLOv8 model.
* … HD2K grabbing, my pipeline wasn't saturating the 15 FPS rate, thus grabbing was seemingly slower on GPU (faulty read).
* … 4-channel to 3-channel reduction, resizing (to meet the expected 640x640 input), and normalization.
* … PCL just to simulate real work. (Code details are here: Feature/get data gpu #230 (comment).)
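The preprocessing steps mentioned in the notes (4-channel to 3-channel reduction, resizing, normalization) can be sketched in dependency-free Python as follows. This is not the benchmarked code, which is linked from #230. The function name `preprocess`, the nearest-neighbor resize, and the division by 255 are assumptions made for illustration; a real pipeline would run the equivalent array operations on the GPU rather than loop over pixels.

```python
def preprocess(image, out_h=640, out_w=640):
    """Drop the 4th channel, resize by nearest neighbor, and scale to [0, 1].

    image is a nested list indexed as image[row][col][channel], values 0..255.
    """
    in_h, in_w = len(image), len(image[0])
    out = []
    for y in range(out_h):
        # Nearest-neighbor: map each output row back to a source row.
        src_row = image[y * in_h // out_h]
        row = []
        for x in range(out_w):
            pixel = src_row[x * in_w // out_w]
            # Keep the first three channels and normalize them.
            row.append([value / 255.0 for value in pixel[:3]])
        out.append(row)
    return out


# Tiny 2x2 four-channel image, upscaled to 4x4 to keep the example fast.
tiny = [[[0, 128, 255, 255], [255, 0, 0, 255]],
        [[0, 0, 0, 255], [51, 102, 153, 255]]]
result = preprocess(tiny, out_h=4, out_w=4)
```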