Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run tests with Julia 1.5 instead of Julia 1.6 #1568

Merged
merged 2 commits into from
Apr 6, 2021
Merged

Run tests with Julia 1.5 instead of Julia 1.6 #1568

merged 2 commits into from
Apr 6, 2021

Conversation

devmotion
Copy link
Member

Since Turing (or rather Libtask_jll) is not compatible with Julia 1.6 yet (see e.g. #1554 and the explanations there).

I guess it would be good to open an issue that reminds us to revert this change once a compatible version is available.

@torfjelde
Copy link
Member

Do you understand why the tests here are failing on 1.5? Seems like it's still complaining about segmentation fault (Signal 11 = SIGSEGV)?

@devmotion
Copy link
Member Author

No, I don't know why it segfaults. I reran the tests, so it seems to be reproducible.

@torfjelde
Copy link
Member

Have you tried running locally? Trying now myself.

@torfjelde torfjelde mentioned this pull request Apr 6, 2021
@devmotion
Copy link
Member Author

Yes, it segfaults. I try 1.5.3 now. Currently, I assume it is caused by the multithreading tests.

@devmotion
Copy link
Member Author

1.5.3 seems to work, at least it passed the point were Julia 1.5.4 failed.

@torfjelde
Copy link
Member

Ah, yeah I'm running 1.5.3 locally 😕

Isn't this a bit strange though? If it's something with the multithreading, shouldn't this have been caught in DPPL tests?

@devmotion
Copy link
Member Author

I don't know which of the changes in 1.5.4 causes the segfaults: JuliaLang/julia@v1.5.3...v1.5.4

@torfjelde
Copy link
Member

I think this is the offending line btw: https://github.com/TuringLang/Turing.jl/blob/master/test/inference/Inference.jl#L106

At least the one where it errors. Basing it off the _varinfo warning showing up just before the error, which should be the _varinfo from "Contexts" testset.

@devmotion
Copy link
Member Author

Just wanted to change this line 😄

I don't think we ever tested DynamicPPL on Julia 1.5.4. I assume we only checked 1.5.3 and 1.6.0 as soon as it became available.

@torfjelde
Copy link
Member

Just wanted to change this line

Haha 😄
But this is weird. Just running that line we works without issues on my end, even on 1.5.4. Trying to run the tests though, to see if the result is different.

I don't think we ever tested DynamicPPL on Julia 1.5.4. I assume we only checked 1.5.3 and 1.6.0 as soon as it became available.

Ah, that makes sense.

@devmotion
Copy link
Member Author

It's the first multi-threaded testset in Inference.jl. As soon as I removed it, I could run the tests successfully on Julia 1.5.4. I wonder if it's a specific algorithm in this testset that is problematic or if it is a general problem.

@torfjelde
Copy link
Member

You're referring to the part where it iterates through the samplers? Do you know which algorithm it fails for?

@devmotion
Copy link
Member Author

Yes, exactly, in the part where it loops through the samplers. I added print statements at the top of the for loop and it seems PG is the (first) sampler that fails. HMC worked fine it seems.

@devmotion
Copy link
Member Author

Gibbs with PG segfaults as well. So it seems only PG is problematic. I wonder if multithreading with Libtask is broken on Julia 1.5.4.

@KDr2 Is there any change in JuliaLang/julia@v1.5.3...v1.5.4 that would explain problems with Libtask on Julia 1.5.4?

@devmotion
Copy link
Member Author

Hmm maybe it's fine to disable these multithreading tests for PG? Standard sampling seems to work, and particle methods are somewhat broken anyway since they don't pass around the RNG (see e.g. TuringLang/AdvancedPS.jl#20 (comment)) - maybe that's the reason for the segfault?

@torfjelde
Copy link
Member

Hmm maybe it's fine to disable these multithreading tests for PG? Standard sampling seems to work, and particle methods are somewhat broken anyway since they don't pass around the RNG (see e.g. TuringLang/AdvancedPS.jl#20 (comment)) - maybe that's the reason for the segfault?

I mean, I'm fine with it. Feels like something @yebai should maybe comment on?

For some context: tests are failiing due to PG methods, which means that there are other bugfixes that we can't get through.

@devmotion
Copy link
Member Author

I just want to stress that in general particle methods are fine - it's just some problem related to multithreading. And I don't know what change in Julia 1.5.4 could have caused it - maybe it's something Libtask-specific? Maybe related to the RNG issues of particle methods?

@torfjelde
Copy link
Member

Let's just drop the test for now and create an issue. @yebai said he was also fine with it so don't think anyone is opposed to it.

@devmotion devmotion requested a review from torfjelde April 6, 2021 21:10
Copy link
Member

@torfjelde torfjelde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@devmotion devmotion merged commit bfd836d into master Apr 6, 2021
@devmotion devmotion deleted the dw/julia15 branch April 6, 2021 21:16
@KDr2
Copy link
Member

KDr2 commented Apr 8, 2021

Gibbs with PG segfaults as well. So it seems only PG is problematic. I wonder if multithreading with Libtask is broken on Julia 1.5.4.

@KDr2 Is there any change in JuliaLang/julia@v1.5.3...v1.5.4 that would explain problems with Libtask on Julia 1.5.4?

This is caused by a change of struct jl_task_t, a field was removed in 1.5.4:

@@ -1751,8 +1751,6 @@ typedef struct _jl_task_t {
     jl_gcframe_t *gcstack;
     // saved exception stack
     jl_excstack_t *excstack;
-    // current world age
-    size_t world_age;

     // id of owning thread
     // does not need to be defined until the task runs

So the memory layout of jl_task_t in v1.5.4 is different from v1.5.3.

We should build another jll package for [1.5.4, 1.6.0) to fix this, a PR against Yggdrasil should be made, like the one for v1.6.0 (https://github.com/JuliaPackaging/Yggdrasil/pull/2176), but no code need to be changed except the julia version.

@KDr2
Copy link
Member

KDr2 commented Apr 8, 2021

Status Quo:

libtask_jll version compat Julia Version
0.4.2 1.5
0.4.3 1.6

If we file a new PR for 1.5.4, it will be:

libtask_jll version compat Julia Version
0.4.2 <1.5.4
0.4.4 [1.5.4, 1.6.0)
0.4.3 1.6

Is it rational and feasible? How about filing two PR to make this:

libtask_jll version compat Julia Version PR
0.4.2 <1.5.4 N
0.4.4 [1.5.4, 1.6.0) Y
0.4.5 1.6 Y

@devmotion
Copy link
Member Author

IMO it is fine to not have a perfectly ordered versioning scheme in this case but maybe we could just ask in the PR for 1.5.4 what's commonly done? I am sure others had same/similar problems before.

@KDr2
Copy link
Member

KDr2 commented Apr 8, 2021

I found that Julia 1.5.4 is not available on Yggdrasil...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants