Memory access errors #22

jsenellart · 2022-05-09T12:03:58Z

While running the test cases on ARM, I found several memory access errors using clang AddressSanitizer. They do not seem to be related at all to the Python 3 port; the ARM compilation may simply be stricter and reveal them.

This PR fixes all of these issues, which were mainly due to missing boundary checks, but also to a deprecated numpy interface.

There is a small logic change in patterson_z_n, but it seems to be working well.

There is one location where I am a bit puzzled:
https://github.com/demisjohn/CAMFR/pull/22/files#diff-310bd0d7c8ef36392b1780e6ea45c0b716feece1deb267c7e0521f3770a45a69R86-R89

I fixed the actual boundary check, but I wonder about the idx calculation (it is unchanged from before).

Now all the tests run without any crash, and the added boundary checks have necessarily improved the determinism of the results :).

One single unit test is failing now:

======================================================================
FAIL: testbackward (backward.backward)
backward
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/senellart/DEV/3rdParty/CAMFR/testsuite/backward.py", line 48, in testbackward
    self.assertTrue(R_pass and T_pass)
AssertionError: False is not true

but I don't have a clue what is wrong here; I need help from an expert.

To close out the py35_compat branch and completely merge the code, I think this unit test should be fixed. On my side, I can also clean up setup.py a bit more, since it is using deprecated Python.

demisjohn · 2022-05-14T15:33:43Z

Hi @jsenellart ,
Thanks for taking the time to work on this. Cpp memory errors are something I would never even start to work on, despite them being major problems in the CAMFR code (I used to see the mode solvers crash with SegFaults fairly regularly, and never attempted a fix).
So it's possible this is the cause of crashes regardless of ARM or other processors.

Regarding the unittest failure, here's what I see in testsuite/backward.py:

A 1-D "Circular" optical waveguide (i.e. like an optical fiber but with no "length") is generated, and 20 waveguide modes are calculated.
Then the Reflection & Transmission coefficients (the complex fraction of power that is reflected and transmitted) are calculated,
and these are compared to hard-coded values for those R/T coefficients (R_OK and T_OK).

L36:

R = s.R12(0,0)
R_OK = -0.0392923220796+0.0408718742985j
print R, "expected", R_OK
R_pass = abs((R - R_OK) / R_OK) < eps.testing_eps

T = s.T12(0,0)
T_OK = 0.202336029811+0.776634435067j
print T, "expected", T_OK
T_pass = abs((T - T_OK) / T_OK) < eps.testing_eps

And you can see there's some slack in the comparison using eps.testing_eps (I assume that's something like 1e-5?)
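
The comparison above is a relative-error check on complex numbers. A minimal Python 3 sketch of that check follows. The helper name `relative_error` and the tolerance `1e-5` are assumptions for illustration only (the tolerance is just the guess made above, not the confirmed value of `eps.testing_eps`):

```python
def relative_error(value, expected):
    """Relative deviation of a complex value from its reference."""
    return abs((value - expected) / expected)

# Reference value hard-coded in testsuite/backward.py.
R_OK = -0.0392923220796 + 0.0408718742985j

# Hypothetical tolerance; the real one is eps.testing_eps.
tolerance = 1e-5

# A result that differs by one part in 1e8 is within tolerance...
close = R_OK * (1 + 1e-8)
# ...while a result that is off by 1% is not.
off = R_OK * 1.01

print(relative_error(close, R_OK) < tolerance)
print(relative_error(off, R_OK) < tolerance)
```

Because the error is divided by the reference value, the check is insensitive to the overall magnitude of R or T, but any change in phase counts fully as an error.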

To investigate further:

Print out the values of R & T.
Print out the values of R_pass and T_pass.
Find out the value of eps.testing_eps, so we know what error is considered acceptable and can see how far the mode-solving calculation is from passing the accuracy check. If it's close, maybe we just increase the error tolerance?
What is the order of the unittests? It would be useful to know which tests passed, and which tests were never run because this one failed (for example, would ALL modesolver tests actually fail, but this one just came first?).

This is a critical check, as the accuracy of these 1-D modesolves affects every other complex calculation one may do with CAMFR.

––––––
Regarding the new boundary checks:
Unfortunately I'm not very familiar with the detailed calculus implementations in this module. These low-level linear algebra funcs are surely used in the EME mode calcs, but I can't really picture exactly what values would be passed to them when they go out of bounds, nor how much they would change the final mode calc.

Some things to investigate here:

What are the values of 2*N-1 vs. G.size()?
Since N is constructed from G.size() in the first place, why does the original loop logic go beyond the array?
-- G & N came from allroots.cpp, L60
-- unsigned int N = (unsigned int)(ceil(G.size() / 2.0));
-- vector<Complex> G = integrals / 2. / pi / I;
Where does I (above) come from? Could the original out-of-index error originate there?
Is c getting fully filled with values during (L95):
-- c = solve_sym(A,B);
-- or is the new boundary condition on B causing this to now return an unset value somewhere in the array?
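
The first question can be explored with plain arithmetic. The sketch below tabulates `2*N-1` against `G.size()` for a few sizes, using the same `ceil` formula as the quoted line. The helper name `n_and_upper` is made up for illustration, and whether the original loop actually indexes `G[2*N-1]` is an assumption, since the loop itself is not quoted here.

```python
import math


def n_and_upper(g_size):
    """Return (N, 2*N-1) for a given G.size(), with N = ceil(size / 2)."""
    n = math.ceil(g_size / 2.0)  # mirrors the C++ expression for N
    return n, 2 * n - 1


for g_size in range(1, 9):
    n, upper = n_and_upper(g_size)
    # Valid indices of G are 0 .. g_size - 1.
    status = "out of range" if upper >= g_size else "ok"
    print(f"G.size()={g_size}  N={n}  2*N-1={upper}  index 2*N-1 is {status}")
```

For even sizes `2*N-1` equals `G.size() - 1`, the last valid index. For odd sizes it equals `G.size()`, one past the end, so a loop that reaches index `2*N-1` would only go out of bounds when `G` has an odd number of elements.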

I hope that provides some useful leads.

demisjohn · 2022-05-14T22:18:28Z

@jsenellart I've added you as a collaborator to this repo - please feel free to merge pull requests as you deem fit.

I'm happy to comment and help where I can, won't be contributing much code unless it's in Python - I'm an engineer (photonics and semiconductor mfg.) with only end-goal type of programming interests, so can't delve nearly as deep as you can into Cpp and such! I do have a vested interest in improving CAMFR though, and would be thrilled if we can get this Py3x compatible.

jsenellart · 2022-05-16T08:36:16Z

> And you can see there's some slack in the comparison using eps.testing_eps (I assume that's something like 1e-5?)
>
> To investigate further,
>
> print out the values of R & T
>
> print out the values of R_pass and T_pass
>
> find out value of eps.testing_eps so we know what error is considered acceptable.
> and we can see how far off the mode-solving calculation is from passing the accuracy check. If it's close, maybe we just increase the error tolerance?
>
> What is the order of unittests - would be useful to know what tests it passed, and what tests were never run because this failed (for example, would ALL modesolver tests actually fail, but this one just came first?).
>
> This is a critical check, as the accuracy of these 1-D modesolves affects every other complex calculation one may do with CAMFR.

Thanks for the explanation of the failing test. I will compile on another OS to check whether it is a portability issue, and will also try to understand where it goes wrong.

> @jsenellart I've added you as a collaborator to this repo - please feel free to merge pull requests as you deem fit.
>
> I'm happy to comment and help where I can, won't be contributing much code unless it's in Python - I'm an engineer (photonics and semiconductor mfg.) with only end-goal type of programming interests, so can't delve nearly as deep as you can into Cpp and such! I do have a vested interest in improving CAMFR though, and would be thrilled if we can get this Py3x compatible.

Thanks for your trust. I have no problem helping out with the programming issues, and as soon as this remaining issue is handled I will look at the other open issues and help with the packaging.

> Cpp memory errors being something I would never even start to work on, despite them being major problems in the CAMFR code (I did used to see the mode solvers crash in the past with SegFaults, fairly regularly, and never attempted to fix).

Regarding crashes, I handled all of the problems triggered by the tests. If you have any other crash that you can reproduce, even randomly, I will be able to have a look too!

demisjohn · 2022-05-31T15:27:00Z

Issue #23 shows a memory error and possible trigger. Any idea if that is related to these fixes?

kitchenknif · 2022-07-26T06:10:20Z

> print out the values of R & T
>
> print out the values of R_pass and T_pass
>
> find out value of eps.testing_eps so we know what error is considered acceptable.
> and we can see how far off the mode-solving calculation is from passing the accuracy check. If it's close, maybe we just increase the error tolerance?
>
> What is the order of unittests - would be useful to know what tests it passed, and what tests were never run because this failed (for example, would ALL modesolver tests actually fail, but this one just came first?).
>
> This is a critical check, as the accuracy of these 1-D modesolves affects every other complex calculation one may do with CAMFR.

R: (-0.03929232207960088+0.04087187429849853j) expected (-0.0392923220796+0.0408718742985j)
T: (-0.20233602981136606-0.7766344350673391j) expected (0.202336029811+0.776634435067j)
As per the values above, R_pass is True and T_pass is False.
testing_eps = 1e-4, but the error is a factor of -1: T has the opposite sign of T_OK.
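
To put numbers on this, here is a short Python 3 sketch that applies the test's relative-error formula to the values reported above. The variable names `R_err`, `T_err` and `T_err_negated` are introduced here for illustration; the numeric values and `testing_eps = 1e-4` are the ones quoted in this comment.

```python
# Values printed by the test, and the hard-coded references.
R = -0.03929232207960088 + 0.04087187429849853j
R_OK = -0.0392923220796 + 0.0408718742985j
T = -0.20233602981136606 - 0.7766344350673391j
T_OK = 0.202336029811 + 0.776634435067j
testing_eps = 1e-4

# Same formula as in testsuite/backward.py.
R_err = abs((R - R_OK) / R_OK)
T_err = abs((T - T_OK) / T_OK)

# Relative error of T after flipping its sign.
T_err_negated = abs((-T - T_OK) / T_OK)

print("R_pass:", R_err < testing_eps)
print("T_pass:", T_err < testing_eps, "relative error:", T_err)
print("-T passes:", T_err_negated < testing_eps)
```

The relative error of T comes out at about 2, which is what a pure sign flip gives, and `-T` matches the reference well within the tolerance. Raising the tolerance would therefore not be a meaningful fix for this failure.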

I'm also getting failures on
backward, metal_splitter, rods, substacks
but only if I rename the files to the standard pattern expected by Python unittest ("test_*.py") and run them using
"python3 -m unittest discover". On their own, the tests continue to pass, which I think is weird.

There are also two tests that are disabled: PhC_splitter, which passes, and stack2, which fails.

So, on their own, all tests except stack2 and backward pass, but when run together with the standard unittest toolkit, more of them seem to fail.

demisjohn · 2023-03-13T03:47:58Z

@jsenellart
Looking at this again, I notice that the erroneous T value is exactly 180° out of phase (the sign is flipped on both the real and imaginary parts of T).
So this really does change the physical electromagnetic answer generated. The phase of the transmission matrix may be a critical parameter, and I'm not sure how your boundary additions caused that.
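
The 180° observation can be checked directly from the ratio of the computed and expected values. A small Python 3 sketch, using the T and T_OK values reported earlier in this thread (the variable names `ratio`, `magnitude_ratio` and `phase_shift_deg` are illustrative):

```python
import cmath
import math

# Computed and expected transmission coefficients from the test output.
T = -0.20233602981136606 - 0.7766344350673391j
T_OK = 0.202336029811 + 0.776634435067j

ratio = T / T_OK

# Magnitude ratio: 1 means |T| is unchanged.
magnitude_ratio = abs(ratio)

# Phase difference in degrees; the absolute value is taken because a
# shift of +180 and -180 degrees describe the same sign flip.
phase_shift_deg = abs(math.degrees(cmath.phase(ratio)))

print("magnitude ratio:", magnitude_ratio)
print("phase shift (deg):", phase_shift_deg)
```

The magnitude ratio is 1 to within rounding and the phase shift is 180°, so |T| and hence the transmitted power are unchanged; only the phase differs.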

Maybe there are other ways to remedy the memory errors, by finding out where they occurred in the first place. Here were my suggestions on that tack:

What are the values of 2*N-1 vs. G.size()?
Since N is constructed from G.size() in the first place, why does the original loop logic go beyond the array?
-- G & N came from allroots.cpp, L60
-- unsigned int N = (unsigned int)(ceil(G.size() / 2.0));
-- vector<Complex> G = integrals / 2. / pi / I;
Where does I (above) come from? Could the original out-of-index error originate there?
is c getting fully filled with values during (L95):
-- c = solve_sym(A,B);
-- or is the new boundary condition on B causing this to now return an unset value somewhere in the array?

jsenellart · 2023-03-14T12:48:26Z

Hello @demisjohn, thanks for the pointers. I actually came back to the code base recently to fix a few issues and to add support for Python 3.11, which was not working. Before committing, I will do another review pass based on your comments and the observations from @kitchenknif. I will be back on this soon (normally before the end of March).

jsenellart added 6 commits May 9, 2022 13:37

identified 2 source of memory access errors

2649063

fix memory access problem

6b60193

add missing boundary checks

a9083af

numpy API to create array changed

22b329d

fix missing boundary check

f5f3ecb

fix compilation warnings

189c9a9

jsenellart changed the title ~~[Work in Progress] Memory access errors~~ Memory access errors May 9, 2022

jsenellart mentioned this pull request May 9, 2022

Compile for Python 3.x #8

Open

jsenellart added 2 commits May 9, 2022 22:37

updated macos installation instructions

e374942

check on clean environment and updated installation procedure

520c1c6

fix demisjohn#23: object keeps a reference to local variable

e340dcb
