Improve yolov8 post-processing efficiency. #5658

Open
wants to merge 32 commits into
base: master

whyb
Contributor

@whyb whyb commented Aug 29, 2024

Using OpenMP to improve yolov8 post-processing efficiency.

#if NCNN_SIMPLEOMP
#include "simpleomp.h"
#else
#include <omp.h>
#endif

...

static void parse_yolov8_detections(
    float* inputs, float confidence_threshold,
    int num_channels, int num_anchors, int num_labels,
    int infer_img_width, int infer_img_height,
    std::vector<Object>& objects)
{
    std::vector<Object> detections;
    cv::Mat output = cv::Mat((int)num_channels, (int)num_anchors, CV_32F, inputs).t();

    const size_t stride = num_anchors;
    const size_t num_threads = omp_get_max_threads();
    const size_t chunk_size = stride / num_threads;
    #pragma omp parallel shared(detections)
    {
        const size_t thread_id = omp_get_thread_num();
        const size_t start_idx = thread_id * chunk_size;
        const size_t end_idx = (thread_id == num_threads - 1) ? stride : (start_idx + chunk_size);
        // A size_t index avoids the signed/unsigned comparison with end_idx.
        for (size_t i = start_idx; i < end_idx; i++)
        {
            const float* row_ptr = output.ptr<float>((int)i);
            const float* bboxes_ptr = row_ptr;
            const float* scores_ptr = row_ptr + 4;
            const float* max_s_ptr = std::max_element(scores_ptr, scores_ptr + num_labels);
            float score = *max_s_ptr;
            if (score > confidence_threshold)
            {
                float x = *bboxes_ptr++;
                float y = *bboxes_ptr++;
                float w = *bboxes_ptr++;
                float h = *bboxes_ptr;

                float x0 = clampf((x - 0.5f * w), 0.f, (float)infer_img_width);
                float y0 = clampf((y - 0.5f * h), 0.f, (float)infer_img_height);
                float x1 = clampf((x + 0.5f * w), 0.f, (float)infer_img_width);
                float y1 = clampf((y + 0.5f * h), 0.f, (float)infer_img_height);

                cv::Rect_<float> bbox;
                bbox.x = x0;
                bbox.y = y0;
                bbox.width = x1 - x0;
                bbox.height = y1 - y0;
                Object object;
                object.label = max_s_ptr - scores_ptr;
                object.prob = score;
                object.rect = bbox;
                #pragma omp critical
                {
                    detections.push_back(object);
                }
            }
        }
    }
    objects = detections;
}

whyb and others added 30 commits May 16, 2023 10:16
@tencent-adm
Member

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ nihui
❌ whyb

object.label = max_s_ptr - scores_ptr;
object.prob = score;
object.rect = bbox;
#pragma omp critical
Member

but simpleomp does not support the critical directive

const size_t stride = num_anchors;
const size_t num_threads = omp_get_max_threads();
const size_t chunk_size = stride / num_threads;
#pragma omp parallel shared(detections)
Member

but simpleomp does not support shared clause

Contributor Author

> but simpleomp does not support shared clause

What alternatives are there to achieve similar functionality?😭
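One possible alternative, sketched under assumptions: give each chunk of anchors its own result vector inside a plain `#pragma omp parallel for`, then merge the vectors serially. This needs neither a `critical` region nor an explicit `shared` clause, because variables declared outside the loop are shared by default. The `Detection` struct and the function `parse_detections_chunked` are hypothetical names for illustration, and the sketch assumes simpleomp handles a plain `parallel for` loop, which this thread does not confirm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Object struct used in the example.
struct Detection
{
    int label;
    float prob;
    float cx, cy, w, h;
};

// rows: num_anchors rows of (4 + num_labels) floats, row-major
// (the layout obtained after transposing the network output).
// num_chunks: fixed number of work chunks chosen by the caller.
static std::vector<Detection> parse_detections_chunked(
    const float* rows, int num_anchors, int num_labels,
    float confidence_threshold, int num_chunks)
{
    assert(num_chunks > 0 && num_labels > 0);
    const int row_len = 4 + num_labels;
    // Round up so that every anchor falls into some chunk.
    const int chunk_size = (num_anchors + num_chunks - 1) / num_chunks;

    // One private buffer per chunk: no two iterations write to the same vector.
    std::vector<std::vector<Detection> > per_chunk(num_chunks);

    #pragma omp parallel for
    for (int c = 0; c < num_chunks; c++)
    {
        const int begin = c * chunk_size;
        const int end = std::min(begin + chunk_size, num_anchors);
        for (int i = begin; i < end; i++)
        {
            const float* row = rows + (std::size_t)i * row_len;
            const float* scores = row + 4;
            const float* best = std::max_element(scores, scores + num_labels);
            if (*best > confidence_threshold)
            {
                Detection d;
                d.label = (int)(best - scores);
                d.prob = *best;
                d.cx = row[0];
                d.cy = row[1];
                d.w = row[2];
                d.h = row[3];
                per_chunk[c].push_back(d);
            }
        }
    }

    // Serial merge in chunk order keeps the output order deterministic.
    std::vector<Detection> merged;
    for (int c = 0; c < num_chunks; c++)
        merged.insert(merged.end(), per_chunk[c].begin(), per_chunk[c].end());
    return merged;
}
```

Because the chunk count is fixed by the caller rather than derived from the runtime thread count, the result does not depend on how many threads the runtime actually starts. The code also compiles and gives the same result when OpenMP is disabled.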

@whyb
Contributor Author

whyb commented Sep 3, 2024

Hello @nihui,

I found that some ncnn examples use OpenMP parallel sections ( #pragma omp parallel sections ) in the following files:

examples/yolov7.cpp
examples/scrfd_crowdhuman.cpp
examples/fasterrcnn.cpp
examples/yolov5.cpp
examples/yolox.cpp
examples/scrfd.cpp
examples/rfcn.cpp
examples/nanodet.cpp
examples/retinaface.cpp

However, I checked src/simpleomp.cpp and did not find the functions that sections require:

GOMP_parallel_sections_start()
GOMP_parallel_sections()
GOMP_sections_start()
GOMP_sections_next()
GOMP_sections_end()
GOMP_sections_end_cancel()
GOMP_sections_end_nowait()

Does this mean that simpleomp will not support the OpenMP sections feature for a long time?

By the same standard, since the existing examples already use a feature that simpleomp does not support, using the OpenMP shared clause should be allowed as well.🙈
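For reference, a two-section region can usually be rewritten as a two-iteration `parallel for`, which avoids the `GOMP_sections_*` entry points listed above. The following is only a sketch: `sum_range` and `sum_in_two_halves` are hypothetical helpers, not code from the ncnn examples, and it again assumes simpleomp handles a plain `parallel for` loop.

```cpp
#include <vector>

// Hypothetical task standing in for the body of one `omp section` block.
static void sum_range(const std::vector<int>& v, int begin, int end, long* out)
{
    long s = 0;
    for (int i = begin; i < end; i++)
        s += v[i];
    *out = s;
}

static long sum_in_two_halves(const std::vector<int>& v)
{
    const int n = (int)v.size();
    const int mid = n / 2;
    long partial[2] = {0, 0};

    // Plays the role of:
    //   #pragma omp parallel sections
    //   {
    //       #pragma omp section  -> first half
    //       #pragma omp section  -> second half
    //   }
    #pragma omp parallel for
    for (int s = 0; s < 2; s++)
    {
        // Each iteration writes only to its own slot of partial[].
        if (s == 0)
            sum_range(v, 0, mid, &partial[0]);
        else
            sum_range(v, mid, n, &partial[1]);
    }
    return partial[0] + partial[1];
}
```

Each loop iteration corresponds to one section, so the two halves may run on different threads while the final addition stays serial.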

3 participants