Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNN: Try to be compatible with win32 #22454

Merged
merged 1 commit into from
Sep 2, 2022

Conversation

zihaomu
Copy link
Member

@zihaomu zihaomu commented Aug 31, 2022

Fixes #22450
Relates #22401

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Linux32,Win32

@zihaomu
Copy link
Member Author

zihaomu commented Aug 31, 2022

Hi @alalek, can you test if this patch can fix the issue Win32?

@@ -21,7 +21,7 @@ enum { FAST_VEC_NLANES=4 };
#define CONV_MR 4
#define CONV_NR 24

#ifdef CV_AVX2
#if CV_AVX2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISA-specific checks should be avoided in general (as API is called "universal intrinsics"), they could be used for fine-tuning only.
CV_SIMD_WIDTH should be used instead for detection of SIMD128 and others.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder, what macros should we use to distinguish function calls? How about #if __AVX2__?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to comment // SIMD 128 in the else branch you don't want to handle AVX2 here only (because code is a priori broken for other SIMD256 ISAs).
I believe you want to check for SIMD256 / 128.
So you should do that through CV_SIMD_WIDTH check in that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got your point now. For now, we have AVX2 branch, NEON branch, and Universal intrinsics (SIMD 128).
And if we just set FAST_VEC_NLANES to 8 when SIMD256 is true. We need add the then Universal intrinsics (SIMD 256) implementation to prevent code errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will reconsider how to better support more platforms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @alalek, the current AVX implementation is compatible with AVX and AVX2. How can the implementation of a function (like convBlock_AVX2) exist in two namespaces (opt_AVX and opt_AVX2) at the same time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updata: The current workaround is that AVX2 will be computed at AVX2 branch, and AVX to the SIMD256 branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for update!
I would propose to keep original one-line "quick fix" to unlock win32/linux32 builds and then move refactoring into separate PR (as there is still open questions).


How can the implementation of a function (like convBlock_AVX2) exist in two namespaces (opt_AVX and opt_AVX2) at the same time?

This should be implemented with "runtime dispatching" through .simd.hpp files (finally we should not have SIMD code in .cpp files). See also this wiki page: https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

@zihaomu zihaomu force-pushed the bug_fix_22450 branch 7 times, most recently from 703c147 to 0138615 Compare September 2, 2022 01:57
Copy link
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 👍

@opencv-pushbot opencv-pushbot merged commit 4159842 into opencv:4.x Sep 2, 2022
@alalek alalek mentioned this pull request Jan 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DNN: test failures on Win32 configuration (32-bit)
3 participants