Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit test client_multiple_requests_disconnects_handled is failed #842

Open
hyperxor opened this issue Nov 24, 2020 · 23 comments · Fixed by #845, #882, #859 or #901
Open

Unit test client_multiple_requests_disconnects_handled is failed #842

hyperxor opened this issue Nov 24, 2020 · 23 comments · Fixed by #845, #882, #859 or #901

Comments

@hyperxor
Copy link
Contributor

hyperxor commented Nov 24, 2020

Hello,

I have run unit tests on the latest souce and got the following error:

12: [ RUN      ] http_server_test.client_multiple_requests_disconnects_handled
12: Trying to run server...
12: INITING
12: Server address: localhost:38919
12: [server] Sent: 127.0.0.1
12: [server] Sent: 127.0.0.1
12: [server] Sent: 127.0.0.1
12: Response code is OK
12: Response code is OK
12: Response code is OK
12: resolves: 3, rejects: 0, timeout: 1, wait: 6
12: /home/hyperxor/code/pistache/tests/http_server_test.cc:475: Failure
12: Value of: CLIENT_REQUEST_SIZE
12:   Actual: 3
12: Expected: counter
12: Which is: 0
12: [  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled (1003 ms)

It seems it happened after this #838 pull request.

@hyperxor
Copy link
Contributor Author

Hello, @iroddis, can you describe the expected scenario of the client_multiple_requests_disconnects_handled unit test, maybe I can fix it.

@dennisjenkins75
Copy link
Collaborator

I cannot reproduce the failure on my dev box (From right after PR #840 is merged). I built with g++ and clang++, both in "Release" mode and "Debug" mode, and ran all unit tests (all pass), plus my own app runs fine.

All 4 builds are very noisy with compiler warnings (I'd love for those to be cleaned up some day), but all 4 "worked".

hyperxor@, what is your build platform?

$ uname -a
Linux ostara 5.4.72-gentoo-x86_64 #1 SMP PREEMPT Mon Nov 9 22:45:29 PST 2020 x86_64 Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz GenuineIntel GNU/Linux

$ g++ --version
g++ (Gentoo 10.2.0-r3 p4) 10.2.0

$ clang++ --version
clang version 11.0.0

@hyperxor
Copy link
Contributor Author

@dennisjenkins75 Ok, thank you, I will see.

$ uname -a
Linux hyperxor-ws 5.4.0-54-generic #60-Ubuntu SMP Fri Nov 6 10:37:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ g++ --version
g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

@kiplingw
Copy link
Member

@dennisjenkins75, try running via autopkgtest. There's notes on how to do this in the readme. It will build and run all unit and CI tests within a container so there are hopefully no side effects spilling in from your system affecting the results.

@hyperxor
Copy link
Contributor Author

hyperxor commented Nov 25, 2020

@dennisjenkins75 @kiplingw I have found tbe same problem with tests here - arm64 build of pistache 0.0.002-202011242302~ubuntu20.04.1 in ubuntu focal RELEASE (link):

[ RUN      ] http_server_test.client_multiple_requests_disconnects_handled
Trying to run server...
INITING
Server address: localhost:43605
[server] Sent: 127.0.0.1
[server] Sent: 127.0.0.1
[server] Sent: 127.0.0.1
Response code is OK
Response code is OK
Response code is OK
resolves: 3, rejects: 0, timeout: 1, wait: 6
/<<PKGBUILDDIR>>/tests/http_server_test.cc:475: Failure
Expected equality of these values:
  counter
    Which is: 0
  CLIENT_REQUEST_SIZE
    Which is: 3
[  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled (1101 ms)
[----------] 8 tests from http_server_test (33791 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (33852 ms total)
[  PASSED  ] 7 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled

 1 FAILED TEST

@kiplingw
Copy link
Member

Thanks @hyperxor. If you join the Launchpad team you'll get notified automatically when builds fail.

hyperxor added a commit to hyperxor/pistache that referenced this issue Nov 26, 2020
dennisjenkins75 pushed a commit that referenced this issue Nov 27, 2020
* Fix issue #842

* Add one more dot

* Follow review suggestions

* clang format + small change
@hyperxor
Copy link
Contributor Author

@dennisjenkins75 I will close this issue in you don't mind. Fix is working.

@kiplingw
Copy link
Member

kiplingw commented Jan 8, 2021

Sadly it's still re-appearing in the unstable PPA builder as of today.

...
Trying to run server...
[server] Ininting...
Server address: localhost:38715
[server] Sent: 127.0.0.1
Response code is OK
[server] Sent: 127.0.0.1
Response code is OK
[server] Sent: 127.0.0.1
Response code is OK
resolves: 3, rejects: 0, timeout: 1, wait: 6
[server] Disconnect from peer ID 19 connecting from 127.0.0.1
/<<PKGBUILDDIR>>/tests/http_server_test.cc:503: Failure
Expected equality of these values:
  result
    Which is: false
  true
[  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled (2145 ms)
[----------] 8 tests from http_server_test (22064 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (22114 ms total)
[  PASSED  ] 7 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled

 1 FAILED TEST

      Start 13: http_client_test
13/26 MemCheck #13: http_client_test .................   Passed   11.94 sec
      Start 14: listener_test
...

@kiplingw kiplingw reopened this Jan 8, 2021
@kiplingw
Copy link
Member

Failed again today for focal on amd64.

[ RUN      ] http_server_test.client_multiple_requests_disconnects_handled
Trying to run server...
[server] Ininting...
Server address: localhost:34997
[server] Sent: 127.0.0.1
Response code is OK
[server] Sent: 127.0.0.1
Response code is OK
[server] Sent: 127.0.0.1
Response code is OK
resolves: 3, rejects: 0, timeout: 1, wait: 6
[server] Disconnect from peer ID 19 connecting from 127.0.0.1
[server] Disconnect from peer ID 20 connecting from 127.0.0.1
/<<PKGBUILDDIR>>/tests/http_server_test.cc:503: Failure
Expected equality of these values:
  result
    Which is: false
  true
[  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled (2289 ms)
[----------] 8 tests from http_server_test (23993 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (24039 ms total)
[  PASSED  ] 7 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] http_server_test.client_multiple_requests_disconnects_handled

 1 FAILED TEST

@kiplingw kiplingw linked a pull request Mar 30, 2021 that will close this issue
@kiplingw kiplingw added this to the 1.0 milestone Apr 5, 2021
kiplingw added a commit that referenced this issue Apr 6, 2021
…bug_info

#842: Add more debug traces to http server test
@kiplingw kiplingw linked a pull request Apr 6, 2021 that will close this issue
@kiplingw
Copy link
Member

More again as of today here.

[ RUN      ] http_server_test.client_multiple_requests_disconnects_handled
Trying to run server...
Server address: localhost:34409
[server] [7f2a057d2700] Sent: `127.0.0.1` data to Peer with ID=21 (address=127.0.0.1:13479, hostname=localhost, fd=87)
[server] [7f2a057d2700] Sent: `127.0.0.1` data to Peer with ID=22 (address=127.0.0.1:13991, hostname=localhost, fd=88)
[server] [7f2a057d2700] Sent: `127.0.0.1` data to Peer with ID=23 (address=127.0.0.1:14503, hostname=localhost, fd=89)
[client] [7f29f77fe700] Response code is OK
[client] [7f29f77fe700] Response code is OK
[client] [7f29f77fe700] Response code is OK
[server] [7f2a057d2700] Disconnect from peer Peer with ID=21 (address=127.0.0.1:13479, hostname=localhost, fd=87)
[client] [7f2a067d8740] resolves: 3, rejects: 0, timeout: 1 seconds, wait: 6 seconds
[server] [7f2a057d2700] Disconnect from peer Peer with ID=22 (address=127.0.0.1:13991, hostname=localhost, fd=88)
[server] [7f2a057d2700] Disconnect from peer Peer with ID=23 (address=127.0.0.1:14503, hostname=localhost, fd=89)
[       OK ] http_server_test.client_multiple_requests_disconnects_handled (1 ms)
[----------] 10 tests from http_server_test (20021 ms total)

[----------] Global test environment tear-down
[==========] 10 tests from 1 test suite ran. (20021 ms total)
[  PASSED  ] 9 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] http_server_test.multiple_client_with_different_requests_to_multithreaded_server

 1 FAILED TEST
--- stderr ---
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
-------

@kiplingw
Copy link
Member

Again just now while testing @jacobbogdanov's #859, but this time locally:

[ RUN      ] http_server_test.client_multiple_requests_disconnects_handled
Trying to run server...
Server address: localhost:38193
[server] [7efc0bfff640] Sent: `127.0.0.1` data to Peer with ID=21 (address=127.0.0.1:11422, hostname=localhost, fd=86)
[server] [7efc0bfff640] Sent: `127.0.0.1` data to Peer with ID=22 (address=127.0.0.1:11934, hostname=localhost, fd=87)
[server] [7efc0bfff640] Sent: `127.0.0.1` data to Peer with ID=23 (address=127.0.0.1:12446, hostname=localhost, fd=88)
[client] [7efc28ed9640] Response code is OK
[client] [7efc28ed9640] Response code is OK
[client] [7efc28ed9640] Response code is OK
[client] [7efc2c6e4740] resolves: 3, rejects: 0, timeout: 1 seconds, wait: 6 seconds
[server] [7efc0bfff640] Disconnect from peer Peer with ID=21 (address=127.0.0.1:11422, hostname=localhost, fd=86)
[server] [7efc0bfff640] Disconnect from peer Peer with ID=22 (address=127.0.0.1:11934, hostname=localhost, fd=87)
[server] [7efc0bfff640] Disconnect from peer Peer with ID=23 (address=127.0.0.1:12446, hostname=localhost, fd=88)
[       OK ] http_server_test.client_multiple_requests_disconnects_handled (1 ms)
[----------] 10 tests from http_server_test (32021 ms total)

[----------] Global test environment tear-down
[==========] 10 tests from 1 test suite ran. (32021 ms total)
[  PASSED  ] 10 tests.
--- stderr ---
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
An exception occured: Timeout
-------

@dennisjenkins75
Copy link
Collaborator

Has anyone attempted to reproduce the problem under valgrind, specifically "helgrind" [1]? Or compile the code with TSAN [2] ?

[1] https://valgrind.org/docs/manual/hg-manual.html
[2] https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual

@Tachi107
Copy link
Member

I'll try to add support for the thread sanitizer (and also others) in a PR in the following hours

@Tachi107
Copy link
Member

I've ran the tests with the thread sanitizer (with both GCC and Clang) and it reports data races in 19 tests, specifically async_test, router_test, cookie_test_3, http_server_test, http_client_test, request_size_test, streaming_test, and threadname_test. The logs are almost 8000 lines long, but if somebody wants to take a look I've uploaded them here (Clang) and here (GCC)

@kiplingw
Copy link
Member

I don't know anything about either of the two tools Dennis recommended. But I would suggest that we eliminate false positives from the log to make things easier.

@hyperxor
Copy link
Contributor Author

@kiplingw Hello, can you send build fails (if any) on http_server_test.client_multiple_requests_disconnects_handled test to this isssue? I think latest changes will help to find root cause of problem.

@Tachi107
Copy link
Member

You can find a lot of failures on Travis. Some of the most recent are these builds: 767081020, 767090930, 767090938

@Tachi107 Tachi107 reopened this Apr 15, 2021
@kiplingw
Copy link
Member

Will do.

hyperxor added a commit to hyperxor/pistache that referenced this issue Apr 26, 2021
hyperxor added a commit to hyperxor/pistache that referenced this issue Apr 26, 2021
hyperxor added a commit to hyperxor/pistache that referenced this issue Apr 26, 2021
kiplingw added a commit that referenced this issue May 21, 2021
@hyperxor
Copy link
Contributor Author

@kiplingw I think this issue can be closed if there are no bug reproductions.

@Tachi107
Copy link
Member

Yep. While the Travis Clang pipeline is broken for no reason (they probably messed up the apt sources), GCC builds seem fine. Good job :)

@kiplingw
Copy link
Member

Yes indeed, well done. Hopefully it's been squashed for good and this doesn't resurrect itself.

@kiplingw
Copy link
Member

I swear this bug is cursed. Happening again on armhf.

@tyler92
Copy link
Contributor

tyler92 commented Nov 5, 2023

Has anyone attempted to reproduce the problem under valgrind, specifically "helgrind" [1]? Or compile the code with TSAN [2] ?

[1] https://valgrind.org/docs/manual/hg-manual.html
[2] https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual

I faced this issue quite often. Thread sanitizer report:

WARNING: ThreadSanitizer: data race (pid=3415914)
  Write of size 8 at 0x7ba800000468 by thread T3 (mutexes: write M0):
    #0 close <null> 
    #1 Pistache::Http::Experimental::Connection::close() src/client/client.cc:624:9 
    #2 Pistache::Http::Experimental::ConnectionPool::shutdown() src/client/client.cc:881:27 
    #3 Pistache::Http::Experimental::Client::shutdown() src/client/client.cc:993:14 
    #4 clientLogicFunc(unsigned long, std::string const&, int, int) tests/http_server_test.cc:241:12 

  Previous read of size 8 at 0x7ba800000468 by thread T7:
    #0 recv <null> 
    #1 Pistache::Http::Experimental::Transport::handleIncoming(std::shared_ptr<Pistache::Http::Experimental::Connection>) src/client/client.cc:499:35 
    #2 Pistache::Http::Experimental::Transport::handleReadableEntry(Pistache::Aio::FdSet::Entry const&) src/client/client.cc:420:17 
    #3 Pistache::Http::Experimental::Transport::onReady(Pistache::Aio::FdSet const&) src/client/client.cc:255:17 
    #4 Pistache::Aio::SyncImpl::handleFds(std::vector<Pistache::Polling::Event>) const src/common/reactor.cc:205:34 
    #5 Pistache::Aio::SyncImpl::runOnce() src/common/reactor.cc:165:21 
    #6 Pistache::Aio::SyncImpl::run() src/common/reactor.cc:177:17 
    #7 Pistache::Aio::AsyncImpl::Worker::run()::'lambda'()::operator()() const src/common/reactor.cc:515:27 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment