Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long Out_channel shrinking runs cause timeout #378

Closed
jmid opened this issue Jul 14, 2023 · 2 comments · Fixed by #431
Closed

Long Out_channel shrinking runs cause timeout #378

jmid opened this issue Jul 14, 2023 · 2 comments · Fixed by #431
Labels
ocaml5-issue A potential issue in the OCaml5 compiler/runtime test suite reliability Issue concerns tests that should behave more predictably

Comments

@jmid
Copy link
Collaborator

jmid commented Jul 14, 2023

Adding the missing bytes-involving bindings introduced a sister-issue to #373 albeit for Out_channel.
This was observed by @shym in #373 (comment)
I've also observed it on #376:

For Cygwin trunk/5.2 part1
https://github.com/ocaml-multicore/multicoretests/actions/runs/5553343778/jobs/10141775211

random seed: 186839348
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain
[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain (generating)
[ ]   18    0    0   18 / 1000    62.9s Lin DSL In_channel test with Domain
[✓]   20    0    1   19 / 1000   107.3s Lin DSL In_channel test with Domain

[ ]    0    0    0    0 / 5000     0.0s Lin DSL Out_channel test with Domain
[ ]    4    0    0    4 / 5000    17.7s Lin DSL Out_channel test with Domain
[ ]   17    0    0   17 / 5000    89.4s Lin DSL Out_channel test with Domain
[ ]   30    0    0   30 / 5000   152.8s Lin DSL Out_channel test with Domain
[ ]   37    0    0   37 / 5000   213.0s Lin DSL Out_channel test with Domain (shrinking:   11)
[ ]   37    0    0   37 / 5000   275.9s Lin DSL Out_channel test with Domain (shrinking:   15.0005)
[ ]   37    0    0   37 / 5000   341.2s Lin DSL Out_channel test with Domain (shrinking:   18.0004)
[ ]   37    0    0   37 / 5000   407.6s Lin DSL Out_channel test with Domain (shrinking:   20.0004)
[ ]   37    0    0   37 / 5000   469.6s Lin DSL Out_channel test with Domain (shrinking:   21.0007)
[ ]   37    0    0   37 / 5000   533.5s Lin DSL Out_channel test with Domain (shrinking:   22.0007)
[ ]   37    0    0   37 / 5000   598.2s Lin DSL Out_channel test with Domain (shrinking:   23.0007)
[ ]   37    0    0   37 / 5000   663.0s Lin DSL Out_channel test with Domain (shrinking:   24.0006)
...
[ ]   37    0    0   37 / 5000  3898.2s Lin DSL Out_channel test with Domain (shrinking:   73.0009)
[ ]   37    0    0   37 / 5000  3963.1s Lin DSL Out_channel test with Domain (shrinking:   74.0008)
[ ]   37    0    0   37 / 5000  4023.3s Lin DSL Out_channel test with Domain (shrinking:   75.0007)
[ ]   37    0    0   37 / 5000  4085.3s Lin DSL Out_channel test with Domain (shrinking:   76.0007)
[ ]   37    0    0   37 / 5000  4147.7s Lin DSL Out_channel test with Domain (shrinking:   77.0007)
[ ]   37    0    0   37 / 5000  4210.4s Lin DSL Out_channel test with Domain (shrinking:   78.0007)
[ ]   37    0    0   37 / 5000  4276.4s Lin DSL Out_channel test with Domain (shrinking:   79.0006)
[ ]   37    0    0   37 / 5000  4339.0s Lin DSL Out_channel test with Domain (shrinking:   80.0005)
[ ]   37    0    0   37 / 5000  4399.3s Lin DSL Out_channel test with Domain (shrinking:   81.0004)
[ ]   37    0    0   37 / 5000  4466.2s Lin DSL Out_channel test with Domain (shrinking:   82.0004)
[ ]   37    0    0   37 / 5000  4528.5s Lin DSL Out_channel test with Domain (shrinking:   83.0003)
Error: The operation was canceled.

For Windows MingW 5.1
https://github.com/ocaml-multicore/multicoretests/actions/runs/5553343781/jobs/10141774703

random seed: 163892320
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain
[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain (generating)
[ ]    1    0    0    1 / 1000    63.0s Lin DSL In_channel test with Domain (shrinking:   29.0006)
[✓]    2    0    1    1 / 1000   108.9s Lin DSL In_channel test with Domain

[ ]    0    0    0    0 / 5000     0.0s Lin DSL Out_channel test with Domain
[ ]   11    0    0   11 / 5000    21.4s Lin DSL Out_channel test with Domain
[ ]   11    0    0   11 / 5000    86.6s Lin DSL Out_channel test with Domain (shrinking:    3.0004)
[ ]   11    0    0   11 / 5000   148.9s Lin DSL Out_channel test with Domain (shrinking:    5.0009)
[ ]   11    0    0   11 / 5000   212.4s Lin DSL Out_channel test with Domain (shrinking:    7.0010)
[ ]   11    0    0   11 / 5000   273.5s Lin DSL Out_channel test with Domain (shrinking:    8.0005)
[ ]   11    0    0   11 / 5000   336.8s Lin DSL Out_channel test with Domain (shrinking:    9.0011)
[ ]   11    0    0   11 / 5000   401.0s Lin DSL Out_channel test with Domain (shrinking:   12.0007)
....
[ ]   11    0    0   11 / 5000 10347.9s Lin DSL Out_channel test with Domain (shrinking:   87.0033)
[ ]   11    0    0   11 / 5000 10409.4s Lin DSL Out_channel test with Domain (shrinking:   88.0009)
[ ]   11    0    0   11 / 5000 10472.8s Lin DSL Out_channel test with Domain (shrinking:   88.0014)
[ ]   11    0    0   11 / 5000 10532.9s Lin DSL Out_channel test with Domain (shrinking:   89.0007)
[ ]   11    0    0   11 / 5000 10601.2s Lin DSL Out_channel test with Domain (shrinking:   89.0014)
[ ]   11    0    0   11 / 5000 10670.7s Lin DSL Out_channel test with Domain (shrinking:   90.0009)
[ ]   11    0    0   11 / 5000 10735.0s Lin DSL Out_channel test with Domain (shrinking:   90.0016)
[ ]   11    0    0   11 / 5000 10797.9s Lin DSL Out_channel test with Domain (shrinking:   91.0013)
[ ]   11    0    0   11 / 5000 10862.2s Lin DSL Out_channel test with Domain (shrinking:   91.0020)
[ ]   11    0    0   11 / 5000 10928.2s Lin DSL Out_channel test with Domain (shrinking:   91.0028)
[ ]   11    0    0   11 / 5000 10994.4s Lin DSL Out_channel test with Domain (shrinking:   92.0012)
[ ]   11    0    0   11 / 5000 11057.5s Lin DSL Out_channel test with Domain (shrinking:   93.0009)
[ ]   11    0    0   11 / 5000 11117.7s Lin DSL Out_channel test with Domain (shrinking:   93.0015)
[ ]   11    0    0   11 / 5000 11180.9s Lin DSL Out_channel test with Domain (shrinking:   94.0003)
[ ]   11    0    0   11 / 5000 11242.1s Lin DSL Out_channel test with Domain (shrinking:  102.0003)
[ ]   11    0    0   11 / 5000 11302.1s Lin DSL Out_channel test with Domain (shrinking:  108)
[✓]   12    0    1   11 / 5000 11331.5s Lin DSL Out_channel test with Domain

....
                                                              |                          
                                     Out_channel.output_substring t "bceY\145CQx\156" 1 6
                                                              |                          
                                   .-----------------------------------------------------.
                                   |                                                     |                          
                       Out_channel.close_noerr t                                Out_channel.length t                


+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Messages for test Lin DSL Out_channel test with Domain:

  Results incompatible with sequential execution

                                                                   |                               
                                     Out_channel.output_substring t "bceY\145CQx\156" 1 6 : Ok (())
                                                                   |                               
                                   .---------------------------------------------------------------.
                                   |                                                               |                               
                     Out_channel.close_noerr t : ()                                  Out_channel.length t : Ok (6)                 

================================================================================
success (ran 2 tests)
@jmid jmid added the test suite reliability Issue concerns tests that should behave more predictably label Jul 14, 2023
@jmid
Copy link
Collaborator Author

jmid commented Jul 15, 2023

Merging #376 into main triggered this - on Windows bytecode trunk/5.2:
https://github.com/ocaml-multicore/multicoretests/actions/runs/5558724000/jobs/10154146152

random seed: 519895651
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain
[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain (generating)
[✓]    2    0    1    1 / 1000    17.6s Lin DSL In_channel test with Domain

[ ]    0    0    0    0 / 5000     0.0s Lin DSL Out_channel test with Domain
[ ]   25    0    0   25 / 5000    43.8s Lin DSL Out_channel test with Domain
[ ]   39    0    0   39 / 5000   407.2s Lin DSL Out_channel test with Domain
[ ]   64    0    0   64 / 5000   475.1s Lin DSL Out_channel test with Domain
[ ]   90    0    0   90 / 5000   541.0s Lin DSL Out_channel test with Domain
[ ]  114    0    0  114 / 5000   601.7s Lin DSL Out_channel test with Domain
[ ]  125    0    0  125 / 5000   691.6s Lin DSL Out_channel test with Domain (shrinking:    0.0003)
[ ]  125    0    0  125 / 5000   765.6s Lin DSL Out_channel test with Domain (shrinking:    0.0005)
[ ]  125    0    0  125 / 5000   830.0s Lin DSL Out_channel test with Domain (shrinking:    0.0008)
[ ]  125    0    0  125 / 5000   892.9s Lin DSL Out_channel test with Domain (shrinking:    0.0010)
[ ]  125    0    0  125 / 5000   956.3s Lin DSL Out_channel test with Domain (shrinking:    0.0014)
[ ]  125    0    0  125 / 5000  1019.9s Lin DSL Out_channel test with Domain (shrinking:    0.0016)
[ ]  125    0    0  125 / 5000  1086.4s Lin DSL Out_channel test with Domain (shrinking:    0.0019)
[ ]  125    0    0  125 / 5000  1168.6s Lin DSL Out_channel test with Domain (shrinking:    0.0022)
[ ]  125    0    0  125 / 5000  1257.4s Lin DSL Out_channel test with Domain (shrinking:    0.0025)
[ ]  125    0    0  125 / 5000  1342.9s Lin DSL Out_channel test with Domain (shrinking:    0.0027)
[ ]  125    0    0  125 / 5000  1418.2s Lin DSL Out_channel test with Domain (shrinking:    0.0029)
[ ]  125    0    0  125 / 5000  1494.0s Lin DSL Out_channel test with Domain (shrinking:    0.0031)
[ ]  125    0    0  125 / 5000  1571.8s Lin DSL Out_channel test with Domain (shrinking:    0.0033)
...
[ ]  125    0    0  125 / 5000  7034.0s Lin DSL Out_channel test with Domain (shrinking:   13.0057)
[ ]  125    0    0  125 / 5000  7098.6s Lin DSL Out_channel test with Domain (shrinking:   14.0003)
[ ]  125    0    0  125 / 5000  7166.9s Lin DSL Out_channel test with Domain (shrinking:   14.0008)
[ ]  125    0    0  125 / 5000  7238.2s Lin DSL Out_channel test with Domain (shrinking:   14.0014)
[ ]  125    0    0  125 / 5000  7301.6s Lin DSL Out_channel test with Domain (shrinking:   14.0018)
[ ]  125    0    0  125 / 5000  7383.7s Lin DSL Out_channel test with Domain (shrinking:   14.0021)
[ ]  125    0    0  125 / 5000  7464.1s Lin DSL Out_channel test with Domain (shrinking:   14.0024)
[ ]  125    0    0  125 / 5000  7539.0s Lin DSL Out_channel test with Domain (shrinking:   14.0027)
[ ]  125    0    0  125 / 5000  7627.5s Lin DSL Out_channel test with Domain (shrinking:   14.0030)
Fatal error: exception User interruption
[ ]  125    0    0  125 / 5000  7690.1s Lin DSL Out_channel test with Domain (shrinking:   16.0002)Terminate batch job (Y/N)? 
^C
Error: The operation was canceled.

@jmid jmid mentioned this issue Nov 9, 2023
@jmid jmid added the ocaml5-issue A potential issue in the OCaml5 compiler/runtime label Nov 23, 2023
@jmid
Copy link
Collaborator Author

jmid commented Nov 23, 2023

I'm adding the ocaml5-issue label since this seems be caused by a GC performance regression in 5.1.
In fact, we have never observed this on 5.0 - only on 5.1 and trunk.

Locally my previous perf top testing further shows that the caml_alloc_custom_mem PR ocaml/ocaml#12318 improves GC times reported by perf top to something like the following:

8.98%  lin_tests_dsl_domain.exe  [.] pool_sweep
1.75%  lin_tests_dsl_domain.exe  [.] do_some_marking

which is a significant improvement over pre-12318 (and pre-#387) observations:

51.46%  lin_tests_dsl_domain.exe            [.] pool_sweep
12.13%  lin_tests_dsl_domain.exe            [.] do_some_marking

With 12318 merged into OCaml trunk I don't expect us to see this issue triggered on trunk anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ocaml5-issue A potential issue in the OCaml5 compiler/runtime test suite reliability Issue concerns tests that should behave more predictably
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant