Some performance improvement tricks on edge devices. #212

@zz990099

Hi! Thanks for your great work! We are trying to deploy the pretrained model on some edge devices.

We followed the ONNX model export script and got the lightstereo-s-sceneflow-general.onnx model. Inspecting the model with Netron, we identified a potential optimization.

Here is the original structure of the correlation_volume processing in the ONNX graph.
[Image: Netron view of the original correlation_volume subgraph]

We modified the code at link.

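import torch
import torch.nn.functional as F

def correlation_volume(left_feature, right_feature, max_disp):
    b, c, h, w = left_feature.shape

    # Left-pad the width axis with max_disp zero columns,
    # so that every disparity shift becomes a plain slice.
    padded_right = F.pad(right_feature, (max_disp, 0, 0, 0))

    cost_volume = torch.stack([
        # Similarity at disparity i: channel-wise mean of the product
        (left_feature * padded_right[:, :, :, max_disp - i : max_disp + w - i]).mean(dim=1)
        for i in range(max_disp)
    ], dim=1)

    return cost_volume.contiguous()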
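
Before swapping the function in, it is worth checking that the pad-and-slice version produces the same cost volume as the baseline. The sketch below assumes the baseline fills a zero-initialised volume by slice assignment; `correlation_volume_reference` is a hypothetical stand-in written for this comparison, not code copied from the repository, and the tensor sizes are arbitrary.

```python
import torch
import torch.nn.functional as F


def correlation_volume_reference(left_feature, right_feature, max_disp):
    # Assumed baseline (hypothetical): zero-initialised volume, filled per disparity.
    b, c, h, w = left_feature.shape
    cost_volume = left_feature.new_zeros(b, max_disp, h, w)
    for i in range(max_disp):
        if i > 0:
            cost_volume[:, i, :, i:] = (
                left_feature[:, :, :, i:] * right_feature[:, :, :, :-i]
            ).mean(dim=1)
        else:
            cost_volume[:, i, :, :] = (left_feature * right_feature).mean(dim=1)
    return cost_volume.contiguous()


def correlation_volume_padded(left_feature, right_feature, max_disp):
    # The pad-and-slice variant proposed in this issue.
    b, c, h, w = left_feature.shape
    padded_right = F.pad(right_feature, (max_disp, 0, 0, 0))
    cost_volume = torch.stack([
        (left_feature * padded_right[:, :, :, max_disp - i : max_disp + w - i]).mean(dim=1)
        for i in range(max_disp)
    ], dim=1)
    return cost_volume.contiguous()


def max_abs_difference(b=2, c=8, h=6, w=20, max_disp=5, seed=0):
    # Compare both variants on random features and return the largest deviation.
    torch.manual_seed(seed)
    left = torch.randn(b, c, h, w)
    right = torch.randn(b, c, h, w)
    ref = correlation_volume_reference(left, right, max_disp)
    opt = correlation_volume_padded(left, right, max_disp)
    return (ref - opt).abs().max().item()


if __name__ == "__main__":
    print("max abs difference:", max_abs_difference())
```

The two variants agree because the zero padding reproduces the untouched columns of the zero-initialised volume: for disparity i, the first i columns multiply against padding and therefore stay zero.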

With this change, LightStereo inference on RKNN and ONNX Runtime is significantly faster. On the Orange Pi 5 Plus, throughput rises from 3.7 to 9 qps while CPU usage drops from 65% to 35%.

| orangepi-5-plus-16GB | qps | cpu |
| --- | --- | --- |
| lightstereo(fp16) - origin | 3.7 | 65% |
| lightstereo(fp16) - opt | 9 | 35% |
| lightstereo(fp16) - origin - async | 14 | 210% |
| lightstereo(fp16) - opt - async | 29 | 90% |

| intel-i7-11800H | qps | cpu |
| --- | --- | --- |
| lightstereo(fp16) - origin | 7 | 800% |
| lightstereo(fp16) - opt | 9 | 800% |

However, on NVIDIA devices the Myelin optimization engine already optimizes the original model well, so this change brings no benefit there; the modified model is in fact slightly slower.

| nvidia-3080-laptop | qps | cpu |
| --- | --- | --- |
| lightstereo(fp16) - origin | 388 | 150% |
| lightstereo(fp16) - opt | 370 | 150% |
| lightstereo(fp16) - origin - async | 418 | 170% |
| lightstereo(fp16) - opt - async | 390 | 170% |

| jetson-orin-nx-16GB | qps | cpu |
| --- | --- | --- |
| lightstereo(fp16) - origin | 70 | 65% |
| lightstereo(fp16) - opt | 65 | 70% |
| lightstereo(fp16) - origin - async | 76 | 80% |
| lightstereo(fp16) - opt - async | 69 | 85% |
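
To make the comparison easier to read, the qps figures reported above can be turned into speedup ratios (opt divided by origin). This is a small sketch; the dictionary layout and the `speedup` helper are only illustrative, and the numbers are copied from the measurements in this issue.

```python
# (device, mode) -> (origin qps, opt qps), copied from the measurements above
RESULTS = {
    ("orangepi-5-plus-16GB", "sync"): (3.7, 9),
    ("orangepi-5-plus-16GB", "async"): (14, 29),
    ("intel-i7-11800H", "sync"): (7, 9),
    ("nvidia-3080-laptop", "sync"): (388, 370),
    ("nvidia-3080-laptop", "async"): (418, 390),
    ("jetson-orin-nx-16GB", "sync"): (70, 65),
    ("jetson-orin-nx-16GB", "async"): (76, 69),
}


def speedup(origin_qps, opt_qps):
    # Ratio above 1 means the modified model is faster than the original.
    return opt_qps / origin_qps


if __name__ == "__main__":
    for (device, mode), (origin, opt) in RESULTS.items():
        print(f"{device:22s} {mode:5s} {speedup(origin, opt):.2f}x")
```

Ratios above 1 appear only on the Orange Pi and the Intel CPU; on both NVIDIA devices the ratio falls slightly below 1.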

Would it be helpful if I submit a PR for this? I’d be happy to.
Please let me know if this aligns with the project's direction. I completely understand if this optimization isn’t a priority at the moment.

You can find our implementation and test code at link.
