Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate flaky parallel/test-net-write-slow on FreeBSD #7516

Closed
Trott opened this issue Jul 1, 2016 · 15 comments
Closed

Investigate flaky parallel/test-net-write-slow on FreeBSD #7516

Trott opened this issue Jul 1, 2016 · 15 comments
Labels
freebsd Issues and PRs related to the FreeBSD platform. net Issues and PRs related to the net subsystem. test Issues and PRs related to the tests.

Comments

@Trott
Copy link
Member

Trott commented Jul 1, 2016

https://ci.nodejs.org/job/node-test-commit-freebsd/3080/nodes=freebsd10-64/console:

not ok 688 parallel/test-net-write-slow
# 
# assert.js:89
#   throw new assert.AssertionError({
#   ^
# AssertionError: flushed: 1, received: 1506964/2000000
#     at Socket.<anonymous> (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd10-64/test/parallel/test-net-write-slow.js:16:12)
#     at emitNone (events.js:86:13)
#     at Socket.emit (events.js:185:7)
#     at Socket._onTimeout (net.js:333:8)
#     at tryOnTimeout (timers.js:224:11)
#     at Timer.listOnTimeout (timers.js:198:5)
  ---
  duration_ms: 1.352
@Trott Trott added net Issues and PRs related to the net subsystem. test Issues and PRs related to the tests. freebsd Issues and PRs related to the FreeBSD platform. labels Jul 1, 2016
@Trott
Copy link
Member Author

Trott commented Jul 3, 2016

And again:

https://ci.nodejs.org/job/node-test-commit-freebsd/3107/nodes=freebsd10-64/console:

not ok 688 parallel/test-net-write-slow
# 
# assert.js:89
#   throw new assert.AssertionError({
#   ^
# AssertionError: flushed: 1, received: 1638036/2000000
#     at Socket.<anonymous> (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd10-64/test/parallel/test-net-write-slow.js:16:12)
#     at emitNone (events.js:86:13)
#     at Socket.emit (events.js:185:7)
#     at Socket._onTimeout (net.js:333:8)
#     at tryOnTimeout (timers.js:224:11)
#     at Timer.listOnTimeout (timers.js:198:5)
  ---
  duration_ms: 1.372

@Trott
Copy link
Member Author

Trott commented Jul 3, 2016

Stress test to confirm it's reproducible via stress test: https://ci.nodejs.org/job/node-stress-single-test/789/nodes=freebsd102-64/console

@Trott
Copy link
Member Author

Trott commented Jul 3, 2016

Failures were both on test-digitalocean-freebsd10-x64-2. Stress test (with no failures yet) is also on test-digitalocean-freebsd10-x64-2. ¯_(ツ)_/¯

@Trott
Copy link
Member Author

Trott commented Jul 3, 2016

Other recent failures besides 3080 and 3107:

3089: https://ci.nodejs.org/job/node-test-commit-freebsd/3089/nodes=freebsd10-64/console

not ok 687 parallel/test-net-write-slow
# 
# assert.js:89
#   throw new assert.AssertionError({
#   ^
# AssertionError: flushed: 1, received: 1638036/2000000
#     at Socket.<anonymous> (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd10-64/test/parallel/test-net-write-slow.js:16:12)
#     at emitNone (events.js:86:13)
#     at Socket.emit (events.js:185:7)
#     at Socket._onTimeout (net.js:333:8)
#     at tryOnTimeout (timers.js:224:11)
#     at Timer.listOnTimeout (timers.js:198:5)
  ---
  duration_ms: 1.295

3087: https://ci.nodejs.org/job/node-test-commit-freebsd/3087/nodes=freebsd10-64/console

  ...
not ok 687 parallel/test-net-write-slow
# 
# assert.js:89
#   throw new assert.AssertionError({
#   ^
# AssertionError: flushed: 1, received: 1769108/2000000
#     at Socket.<anonymous> (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd10-64/test/parallel/test-net-write-slow.js:16:12)
#     at emitNone (events.js:86:13)
#     at Socket.emit (events.js:185:7)
#     at Socket._onTimeout (net.js:333:8)
#     at tryOnTimeout (timers.js:224:11)
#     at Timer.listOnTimeout (timers.js:198:5)
  ---
  duration_ms: 1.530

3085: https://ci.nodejs.org/job/node-test-commit-freebsd/3085/nodes=freebsd10-64/console

not ok 688 parallel/test-net-write-slow
# 
# assert.js:89
#   throw new assert.AssertionError({
#   ^
# AssertionError: flushed: 1, received: 1703572/2000000
#     at Socket.<anonymous> (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd10-64/test/parallel/test-net-write-slow.js:16:12)
#     at emitNone (events.js:86:13)
#     at Socket.emit (events.js:185:7)
#     at Socket._onTimeout (net.js:333:8)
#     at tryOnTimeout (timers.js:224:11)
#     at Timer.listOnTimeout (timers.js:198:5)
  ---
  duration_ms: 1.289

3084: https://ci.nodejs.org/job/node-test-commit-freebsd/3084/nodes=freebsd10-64/console

  ...
not ok 688 parallel/test-net-write-slow
# 
# assert.js:89
#   throw new assert.AssertionError({
#   ^
# AssertionError: flushed: 1, received: 1375892/2000000
#     at Socket.<anonymous> (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd10-64/test/parallel/test-net-write-slow.js:16:12)
#     at emitNone (events.js:86:13)
#     at Socket.emit (events.js:185:7)
#     at Socket._onTimeout (net.js:333:8)
#     at tryOnTimeout (timers.js:224:11)
#     at Timer.listOnTimeout (timers.js:198:5)
  ---
  duration_ms: 1.413

3083: https://ci.nodejs.org/job/node-test-commit-freebsd/3083/nodes=freebsd10-64/console

not ok 687 parallel/test-net-write-slow
# 
# assert.js:89
#   throw new assert.AssertionError({
#   ^
# AssertionError: flushed: 1, received: 1769108/2000000
#     at Socket.<anonymous> (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd10-64/test/parallel/test-net-write-slow.js:16:12)
#     at emitNone (events.js:86:13)
#     at Socket.emit (events.js:185:7)
#     at Socket._onTimeout (net.js:333:8)
#     at tryOnTimeout (timers.js:224:11)
#     at Timer.listOnTimeout (timers.js:198:5)
  ---
  duration_ms: 1.408

@Trott
Copy link
Member Author

Trott commented Jul 3, 2016

Seems significant that flushed (which is supposed to get up to 10) never gets past 1 on the failed runs.

/cc @nodejs/platform-freebsd

@Trott
Copy link
Member Author

Trott commented Jul 4, 2016

Stress test came up clean. I repeat: ¯_(ツ)_/¯

@Trott
Copy link
Member Author

Trott commented Jul 4, 2016

Stress test label for FreeBSD has been updated to match that of node-test-pull-request. I don't think it's any different than the test I already ran, but just in case, and because I'm running out of things to try without spinning up my own FreeBSD instance: https://ci.nodejs.org/job/node-stress-single-test/791/nodes=freebsd10-64/

@Trott
Copy link
Member Author

Trott commented Jul 4, 2016

This test was introduced in 0f0af55a after 0.6.10 and before 0.6.11. It was intended to fix nodejs/node-v0.x-archive#2678.

So it should fail in 0.6.10 and pass in 0.6.11 and later.

However, for me, it does not fail in 0.6.10 in its current form. (You'll need to comment out the require() for common and change the Buffer initialization to the old way (new Buffer() and buf.fill(), so not quite its current form but as close as you can get and still run the test.)

The original form fails or succeeds in both 0.6.x versions depending on how the two timeout values are tweaked.

In fact, the same is true of the current form of the test running Node 6.2.2. On my laptop, if I change 1000 to 430, it will sometimes fail, and sometimes succeed. flushed is always 4 when it fails and I have a 4-processor machine. So that probably means that seeing 1 on FreeBSD CI just means it's a single-processor machine. (/cc @joaocgreis @jbergstroem for confirmation?)

A factor of two is not a lot of head room, in my opinion. At this point, I'm not sure if the thing to try is change 1000 to 9999 and see if the problem goes away, or if the thing to do is to remove the test entirely. Thoughts, @nodejs/testing?

/cc @koichik who wrote the test originally, in case there's additional light to be shed.

@jbergstroem
Copy link
Member

@Trott both freebsd machines have JOBS=2 set (and have 2 vcpus)

@Trott
Copy link
Member Author

Trott commented Jul 5, 2016

Trying to run on all platforms as a stress test to see if any of them fail: https://ci.nodejs.org/job/node-stress-single-test/792/

@Trott
Copy link
Member Author

Trott commented Jul 5, 2016

Still occurring on FreeBSD but still unable to replicate in stress test. They do seem to cluster. (Could the time of day be relevant? Something else running on those machines at a particular time?)

Here it is in 3118. https://ci.nodejs.org/job/node-test-commit-freebsd/3118/nodes=freebsd10-64/console:

not ok 687 parallel/test-net-write-slow
# 
# assert.js:89
#   throw new assert.AssertionError({
#   ^
# AssertionError: flushed: 1, received: 1572500/2000000
#     at Socket.<anonymous> (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd10-64/test/parallel/test-net-write-slow.js:16:12)
#     at emitNone (events.js:86:13)
#     at Socket.emit (events.js:185:7)
#     at Socket._onTimeout (net.js:333:8)
#     at tryOnTimeout (timers.js:224:11)
#     at Timer.listOnTimeout (timers.js:198:5)
  ---
  duration_ms: 1.279

Here it is in 3130. https://ci.nodejs.org/job/node-test-commit-freebsd/3130/nodes=freebsd10-64/console

not ok 687 parallel/test-net-write-slow
# 
# assert.js:89
#   throw new assert.AssertionError({
#   ^
# AssertionError: flushed: 1, received: 1638036/2000000
#     at Socket.<anonymous> (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd10-64/test/parallel/test-net-write-slow.js:16:12)
#     at emitNone (events.js:86:13)
#     at Socket.emit (events.js:185:7)
#     at Socket._onTimeout (net.js:333:8)
#     at tryOnTimeout (timers.js:224:11)
#     at Timer.listOnTimeout (timers.js:198:5)
  ---
  duration_ms: 1.506

Here it is in 3131. https://ci.nodejs.org/job/node-test-commit-freebsd/3131/nodes=freebsd10-64/console:

not ok 688 parallel/test-net-write-slow
# 
# assert.js:89
#   throw new assert.AssertionError({
#   ^
# AssertionError: flushed: 1, received: 1638036/2000000
#     at Socket.<anonymous> (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd10-64/test/parallel/test-net-write-slow.js:16:12)
#     at emitNone (events.js:86:13)
#     at Socket.emit (events.js:185:7)
#     at Socket._onTimeout (net.js:333:8)
#     at tryOnTimeout (timers.js:224:11)
#     at Timer.listOnTimeout (timers.js:198:5)
  ---
  duration_ms: 1.392

3132:

not ok 683 parallel/test-net-write-slow
# 
# assert.js:90
#   throw new assert.AssertionError({
#   ^
# AssertionError: flushed: 1, received: 1572500/2000000
#     at Socket.<anonymous> (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd10-64/test/parallel/test-net-write-slow.js:16:12)
#     at emitNone (events.js:86:13)
#     at Socket.emit (events.js:185:7)
#     at Socket._onTimeout (net.js:333:8)
#     at tryOnTimeout (timers.js:224:11)
#     at Timer.listOnTimeout (timers.js:198:5)
  ---
  duration_ms: 1.282

3135:

not ok 686 parallel/test-net-write-slow
# 
# assert.js:89
#   throw new assert.AssertionError({
#   ^
# AssertionError: flushed: 1, received: 1441428/2000000
#     at Socket.<anonymous> (/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd10-64/test/parallel/test-net-write-slow.js:16:12)
#     at emitNone (events.js:86:13)
#     at Socket.emit (events.js:185:7)
#     at Socket._onTimeout (net.js:333:8)
#     at tryOnTimeout (timers.js:224:11)
#     at Timer.listOnTimeout (timers.js:198:5)
  ---
  duration_ms: 1.329

@Trott
Copy link
Member Author

Trott commented Jul 5, 2016

Trying something else that probably won't pan out but getting desperate: Running five node-test-commit-freebsd jobs (3138 - 3142).

@Trott
Copy link
Member Author

Trott commented Jul 6, 2016

Hey! That worked! 3138 failed! 3139 succeeded. Other three are yet to run...

@Trott
Copy link
Member Author

Trott commented Jul 6, 2016

3 of the 5 runs had this test fail. That's actually potentially useful...

@Trott
Copy link
Member Author

Trott commented Jul 6, 2016

Test runs of node-test-commit-freebsd with the timeout increased tenfold to 9999:

  • 3146 ✅ PASS (1.508s)
  • 3148 ✅ PASS (1.744s)
  • 3151 ✅ PASS (1.826s)
  • 3156 ✅ PASS (1.540s)

Tests run of node-test-commit-freebsd with current master:

  • 3149 ❌ FAIL (1.391s)
  • 3152 ❌ FAIL (1.378s)
  • 3153 ❌ FAIL (1.377s)
  • 3155 ✅ PASS (1.376s)
  • 3158 ❌ FAIL (1.357s)

Good enough for me! PR'ing a change (although I still wonder if this test isn't a candidate for removal).

Trott added a commit to Trott/io.js that referenced this issue Jul 6, 2016
Increase socket timeout so that there is enough time to reliably run the
test on FreeBSD.

Fixes: nodejs#7516
@Trott Trott closed this as completed in 9d654a3 Jul 7, 2016
MylesBorins pushed a commit that referenced this issue Jul 12, 2016
Increase socket timeout so that there is enough time to reliably run the
test on FreeBSD.

Fixes: #7516
PR-URL: #7555
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins pushed a commit that referenced this issue Jul 12, 2016
Increase socket timeout so that there is enough time to reliably run the
test on FreeBSD.

Fixes: #7516
PR-URL: #7555
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins pushed a commit that referenced this issue Jul 12, 2016
Increase socket timeout so that there is enough time to reliably run the
test on FreeBSD.

Fixes: #7516
PR-URL: #7555
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins pushed a commit that referenced this issue Jul 12, 2016
Increase socket timeout so that there is enough time to reliably run the
test on FreeBSD.

Fixes: #7516
PR-URL: #7555
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
evanlucas pushed a commit that referenced this issue Jul 13, 2016
Increase socket timeout so that there is enough time to reliably run the
test on FreeBSD.

Fixes: #7516
PR-URL: #7555
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins pushed a commit that referenced this issue Jul 14, 2016
Increase socket timeout so that there is enough time to reliably run the
test on FreeBSD.

Fixes: #7516
PR-URL: #7555
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins pushed a commit that referenced this issue Jul 14, 2016
Increase socket timeout so that there is enough time to reliably run the
test on FreeBSD.

Fixes: #7516
PR-URL: #7555
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
freebsd Issues and PRs related to the FreeBSD platform. net Issues and PRs related to the net subsystem. test Issues and PRs related to the tests.
Projects
None yet
Development

No branches or pull requests

2 participants