
disable asynchronous output for the ECL output test #303

Merged (1 commit) on Mar 21, 2018
Conversation

@andlaus (Contributor) commented Mar 21, 2018

this leads to non-deterministic crashes deep inside libecl. My cursory hypotheses are that this test assumes the output is written synchronously (it tries to read back the results from disk immediately) and/or that libecl is not threadsafe.

@andlaus (Contributor, Author) commented Mar 21, 2018

jenkins build this please

@joakim-hove (Member) commented:

> it tries to read back the results from disk immediately

Well, that is certainly the case, yes.

> and/or that libecl is not threadsafe.

The library itself has no global state, so it is possible to make thread-safe applications based on it; but of course the filesystem is a form of global state.

@atgeirr (Member) commented Mar 21, 2018

I think async output behaviour is quite important to test, and here we have been given a crashing case for free...! I would rather see the test deal properly with async output; I guess it should not be too hard given the new Tasklet framework? Or would it require adding something there?

@andlaus (Contributor, Author) commented Mar 21, 2018

Well, the test did not fail deterministically: depending on the system you ran it on, it failed maybe once out of five times. I suppose that's part of the fun with threads.

The problem is that the code that checks the written results runs in the main thread immediately after the writing tasklet has been dispatched. I am thus pretty sure that it is not worthwhile to jump through hoops for this test (e.g., a call to barrier() could be added to make it quasi-synchronous again, or the comparison could be done in a tasklet), but I'm open to it if you want to refactor it.

@atgeirr (Member) commented Mar 21, 2018

> Well, the test did not fail deterministically: depending on the system you ran it on, it failed maybe once out of five times. I suppose that's part of the fun with threads.

If you add a one-second delay, I guess you could get it to fail every time?

> I am thus pretty sure that it is not worthwhile to jump through hoops for this test

On the contrary, this is the best test you could get to prove that the tasklets are up to the... uh... task(let).

@andlaus (Contributor, Author) commented Mar 21, 2018

> If you add a one-second delay, I guess you could get it to fail every time?

There is no guarantee of that. Also, I'm pretty sure that there would be quite a few complaints if writing output were delayed by one second for each timestep.

> On the contrary, this is the best test you could get to prove that the tasklets are up to the... uh... task(let).

A dedicated test for tasklets would be a better idea, as it could e.g. test the case with more than one worker thread. Do you want to implement one? This would probably also be a good reason to get familiar with the code.

@andlaus (Contributor, Author) commented Mar 21, 2018

Anyway, I think this PR should be merged even if you just consider it a band-aid. Mind pressing green?

@joakim-hove (Member) commented:

> If you add a one-second delay, I guess you could get it to fail every time?

I have not seen the test in question, but I would assume the correct™ approach is to have some join()-like call before reading and verifying the generated result file?

@atgeirr (Member) commented Mar 21, 2018

> I'm pretty sure that there would be quite a few complaints if writing output were delayed by one second for each timestep.

Sure, it was just to get a high failure rate while improving the code. I did not intend it as a permanent feature!

> A dedicated test for tasklets would be a better idea, as it could e.g. test the case with more than one worker thread. Do you want to implement one? This would probably also be a good reason to get familiar with the code.

Is there no such test already? Now I'm getting nervous... I do not think I have time to dedicate to this now.

> Anyway, I think this PR should be merged even if you just consider it a band-aid. Mind pressing green?

We agree it's a band-aid, and I'll merge.

@atgeirr atgeirr merged commit 8650fb6 into OPM:master Mar 21, 2018
@andlaus (Contributor, Author) commented Mar 21, 2018

> I have not seen the test in question, but I would assume the correct™ approach is to have some join()-like call before reading and verifying the generated result file?

Either that (which would imply deleting the simulator object), or triggering a barrier on the tasklet runner...

@andlaus (Contributor, Author) commented Mar 21, 2018

> I'm pretty sure that there would be quite a few complaints if writing output were delayed by one second for each timestep.

> Sure, it was just to get a high failure rate while improving the code. I did not intend it as a permanent feature!

I forgot to mention that I did exactly what you proposed before opening the tasklet PRs (#299 and #301). It was not a permanent feature, though ;)
