-
-
Notifications
You must be signed in to change notification settings - Fork 55.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DNN: Try to be compatible with win32 #22454
Conversation
Hi @alalek, can you test if this patch can fix the issue Win32? |
@@ -21,7 +21,7 @@ enum { FAST_VEC_NLANES=4 }; | |||
#define CONV_MR 4 | |||
#define CONV_NR 24 | |||
|
|||
#ifdef CV_AVX2 | |||
#if CV_AVX2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ISA-specific checks should be avoided in general (as API is called "universal intrinsics"), they could be used for fine-tuning only.
CV_SIMD_WIDTH should be used instead for detection of SIMD128 and others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder, what macros should we use to distinguish function calls? How about #if __AVX2__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to comment // SIMD 128
in the else branch you don't want to handle AVX2 here only (because code is a priori broken for other SIMD256 ISAs).
I believe you want to check for SIMD256 / 128.
So you should do that through CV_SIMD_WIDTH check in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got your point now. For now, we have AVX2 branch, NEON branch, and Universal intrinsics (SIMD 128).
And if we just set FAST_VEC_NLANES to 8
when SIMD256 is true. We need add the then Universal intrinsics (SIMD 256) implementation to prevent code errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will reconsider how to better support more platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @alalek, the current AVX implementation is compatible with AVX and AVX2. How can the implementation of a function (like convBlock_AVX2
) exist in two namespaces (opt_AVX
and opt_AVX2
) at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updata: The current workaround is that AVX2 will be computed at AVX2 branch, and AVX to the SIMD256 branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for update!
I would propose to keep original one-line "quick fix" to unlock win32/linux32 builds and then move refactoring into separate PR (as there is still open questions).
How can the implementation of a function (like convBlock_AVX2) exist in two namespaces (opt_AVX and opt_AVX2) at the same time?
This should be implemented with "runtime dispatching" through .simd.hpp
files (finally we should not have SIMD code in .cpp files). See also this wiki page: https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree.
703c147
to
0138615
Compare
0138615
to
b69b1ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you 👍
Fixes #22450
Relates #22401
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.