-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new ring source #181
new ring source #181
Conversation
# Conflicts: # src/mcx_core.cu # src/mcx_utils.c
Hi @fangq , thanks for pointing me to the |
@leoyala, thanks for explaining this new source type. I see it has new capabilities beyond the disk/ring source that already exists. so I am happy to accept your contribution, but I will need you to make some changes. I will leave my comments in the related code in a second, but the first thing I want you to do is to do a if you haven't done rebase before, check out this tutorial: https://www.educative.io/answers/what-are-git-rebase-and-cherry-pick after rebase, the PR should only contain the relevant code changes (changing the least number of lines) to ensure other part of the codes were not impacted. I am also wondering if you can specify which of the parameters are required and what happen if some additional parameters are not specified (thus have a 0 default value)? for example, what if someone set I recommend to name this once you finish the rebase, you should run |
@@ -1 +1,8 @@ | |||
Rakefile | |||
|
|||
# JetBrains configuration files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if these are needed
@@ -93,6 +93,7 @@ | |||
#define MCX_SRC_PENCILARRAY 14 /**< a rectangular array of pencil beams */ | |||
#define MCX_SRC_PATTERN3D 15 /**< a 3D pattern source, starting from srcpos, srcparam1.{x,y,z} define the x/y/z dimensions */ | |||
#define MCX_SRC_HYPERBOLOID_GAUSSIAN 16 /**< Gaussian-beam with spot focus, scrparam1.{x,y,z} define beam waist, distance from source to focus, rayleigh range */ | |||
#define MCX_SRC_RING 17 /**< a ring for laparoscopic illumination */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please consider MCX_SRC_RINGSECTOR
if the angle params are required
@@ -1385,6 +1385,60 @@ __device__ inline int launchnewphoton(MCXpos* p, MCXdir* v, Stokes* s, MCXtime* | |||
canfocus = (gcfg->srctype == MCX_SRC_SLIT); | |||
break; | |||
} | |||
case (MCX_SRC_RING): { | |||
/* ring light source, can be a complete ring or a truncated ring. the expected parameters for definition of the | |||
light source are as follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please format this using doxygen formatted comments, like the rest of the code, see example here
https://www.doxygen.nl/manual/docblocks.html
Just listening in and have a minor thought/question, if it would be simpler to maintain/debug/test to have a single angular range, and then simulate the composite ring by superposing two independent solutions. Just a thought! |
float random_angle = rand_uniform01(t); | ||
float angle_range_1[2] = {((TWO_PI / 2.f) - cut_angle_1) / 2.f, ((TWO_PI / 2.f) + cut_angle_1) / 2.f}; | ||
float angle_range_2[2] = {(TWO_PI + (TWO_PI / 2.f) - cut_angle_2) / 2.f, | ||
(TWO_PI + (TWO_PI / 2.f) + cut_angle_2) / 2.f}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please keep in mind that division is extremely expensive on the GPU - it is even more costly than sin/cos functions, so avoid using divisions at all cost.
for example, always convert a/2.f
to a*0.5f
- as the multiplication takes 1 clock cycle to finish and division takes 15 clock cycles or more to complete.
CUDA even had to use approximated division __fdividef
to speed up the computation
https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__SINGLE.html#group__CUDA__MATH__INTRINSIC__SINGLE_1gc996beec34f94f6376d0674a6860e107
float minimum_radius = gcfg->srcparam1.x; | ||
float maximum_radius = gcfg->srcparam1.y; | ||
float cut_angle_1 = (TWO_PI / 360.f) * gcfg->srcparam1.z; | ||
float cut_angle_2 = (TWO_PI / 360.f) * gcfg->srcparam1.w; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you minimize the use of registers (temporary variables) in the new code? the current biggest bottleneck to mcx's GPU kernel is the number of registers - the total register used by mcx is ranging between 64 to 100+ (for more complex modes such as SVMC).
you can see other source definitions - we tried to minimize the use of temporary variables and compute temporary values on the fly or reuse some existing ones.
this new source alone added 24 new registers, which amounts to nearly half of the registers used by the barebone mcx kernel. I am pretty sure merging this code to the main code will significantly slow down the rest of the functions as the compiler is not smart enough to optimize out these registers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that minimize the use of registers will hurt readability, but speed performance is the most important spec of mcx that we have to ensure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes me thinking if you can simply merge the angle range to the existing disk/ring source code? likely this will yield the same result but much more compact implementation. you can still call this ring source but just add a new case
statement in this section
Line 1232 in 1a2f291
case (MCX_SRC_DISK): |
|
||
float angle = gcfg->srcparam2.x; //full beam divergence angle measured at Full Width at Half Maximum (FWHM) | ||
float FWHM = 2.f * tanf(0.5f * angle * TWO_PI / 360.f); //FWHM of beam divergence | ||
float sigma = FWHM / (2.f * sqrtf(2.f * logf(2.f))); //standard deviation of gaussian with FWHM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the numerator is a constant and should precompute the reciprocal and convert it to a single multiplication. otherwise, this line is very costly.
@@ -196,7 +196,7 @@ const char boundarydetflag[] = {'0', '1', '\0'}; | |||
|
|||
const char* srctypeid[] = {"pencil", "isotropic", "cone", "gaussian", "planar", | |||
"pattern", "fourier", "arcsine", "disk", "fourierx", "fourierx2d", "zgaussian", | |||
"line", "slit", "pencilarray", "pattern3d", "hyperboloid", "" | |||
"line", "slit", "pencilarray", "pattern3d", "hyperboloid", "ring", "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are a few other places where these source type constants are defined, you will have to run cd mcx/src; grep 'zgaussian' -R *
and modify all other units. I believe pmcx.cpp
and mcxlab.cpp
both need to be updated.
glad to hear from you @dcuccia, I am curious what you meant by a single angular range - like specifying the start angle only? |
@leoyala, upon reading the patch, I now see the biggest problem of merging this patch is the extensive use of registers (24) in your code. This will likely slow down other features - if we increase the register count from 64 to say 90, what will happen is that the GPU cores can only run 50% less simultaneous blocks because the register space on the core/SM is fixed. this will translate to 50% slowing down. in the past, we found that CUDA compilers is not smart enough to reuse or remove inefficient registers at the compilation stage, so that's why we had to minimize its use from the beginning. I suggest the best path moving forward is to incorporate your angular sector range parameters with the existing let me kow if you want to take a look at the disk code and see if you can incorporate the angular parameters. |
Hey @fangq! Yes, long-time lurker :) Really impressed with the tools and community you have fostered over the years, and like to stay connected to your latest work. We've gotten some good mileage out of MCX for internal studies, and based on what Raj tells me, he really likes the new Python bindings.
In the discussion above "angle ranges" were discussed. And based on my superficial reading of the thread, was just curious if the design would be a single angle range (start/stop, default [0, 2Pi]), and then a multi-range composite system could be modeled by a superposition of two of these angle range sources by linearly combining/superposing the results. A multi-range source could be implemented by combining single-range sources as well, I suppose. |
so glad to hear! yes, we are excited about the new python binding and would advertise it more.
to be honest, I also did not fully follow the cut-angle calculations. I hope to get some clarifications from @leoyala. I totally agree with you that multiple ring sections should be achieved using a composite/superposition of this type of sources, rather than handling it inside a single source. |
compact implementation of ring source, close #181
Created a new ring light source that is defined by specifying the inner and outer radius of the ring. To allow for a truncated ring geometry, the angular range of two halves of the same ring can be specified, this truncated geometry is common in medical optical devices such as laparoscopes.
Also updated
.gitignore
to ignore common configuration files from IDEs such as CLion, and compiler outputs.