Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close TCP, TLS connections gracefully to avoid data loss #123

Merged
merged 4 commits into from
Nov 11, 2024

Conversation

chrisberkhout
Copy link
Contributor

@chrisberkhout chrisberkhout commented Nov 4, 2024

Proposed commit message

Close TCP, TLS connections gracefully to avoid data loss (#)

TCP connections, including TLS connections, acknowledge received data.

Although a simple `net.Conn.Close()` will put all previously written
data on the network, the receiving server may disregard data that it
can't successfully acknowledge.

Graceful acknowledgement and closure can be facilitated by the client
closing writes first, and reading any available data before fully
closing the connection.

Checklist

  • Run the release workflow in GitHub actions (after merge)

Discussion

This bug was discovered due to flaky tests that use stream (issues listed below).

For a more detailed discussion of this topic please read the blog post
The ultimate SO_LINGER page, or: why is my tcp not reliable.

I tested this manually. Failures of the old version can be demonstrated as follows. Here I use data from integrations that is know to trigger the issue.

TLS

In one terminal run this server loop:

while true; do
  echo Running the server...
  echo
  openssl s_server -accept 4433 -naccept 1 \
    -cert ~/.elastic-package/profiles/default/certs/elastic-agent/cert.pem \
    -key ~/.elastic-package/profiles/default/certs/elastic-agent/key.pem \
    2>&1 | pv -L 100k | tee output.log
  echo
  if grep -q "ERROR" output.log; then
    echo "Error detected. Stopping."
    break
  fi
done

In another, repeatedly run this client:

go run main.go log --delay=1s --addr localhost:4433 -p=tls --insecure ../integrations/packages/cyberarkpas/_dev/deploy/docker/sample_logs/audit/*.log

It should fail within a few runs. It fails more readily if the client is run soon after the server starts. If necessary, lowering the rate limit enforced by pv may help trigger a failure.

TCP

This uses GNU Netcat (BSD Netcat may differ). Having the server write some data seems to be necessary for the error to happen.

The server:

input_lines=$(cat ../integrations/packages/cyberarkpas/_dev/deploy/docker/sample_logs/audit/*.log | wc -l)
while true; do
  echo Running the server...
  echo
  echo hi | nc -l -p 8888 | pv -L 100k | tee output.log
  echo
  output_lines=$(cat output.log | wc -l)
  if [ "$output_lines" != "$input_lines" ]; then
    echo "Error detected: $output_lines lines received != $input_lines lines sent. Stopping."
    break
  fi
done

The client:

go run main.go log --delay=1s --addr localhost:8888 -p=tcp ../integrations/packages/cyberarkpas/_dev/deploy/docker/sample_logs/audit/*.log

Related issues

@chrisberkhout chrisberkhout added bug Something isn't working Team:Security-Service Integrations Team:Security-Service Integrations labels Nov 4, 2024
@chrisberkhout chrisberkhout self-assigned this Nov 4, 2024
@elasticmachine
Copy link

💚 Build Succeeded

cc @chrisberkhout

@chrisberkhout
Copy link
Contributor Author

@efd6 I think I was able to resolve those manual testing failures.

If it is run like this, the watch command will stop reading output as soon as it has enough, then stream will get a SIGPIPE and exit:

watch 'go run main.go log --delay=1s --addr localhost:8888 -p=tcp ../integrations/packages/cyberarkpas/_dev/deploy/docker/sample_logs/audit/*.log'

But if the error output is redirected, watch will allow each run to successfully complete:

watch 'go run main.go log --delay=1s --addr localhost:8888 -p=tcp ../integrations/packages/cyberarkpas/_dev/deploy/docker/sample_logs/audit/*.log 2> /dev/null'

@chrisberkhout chrisberkhout merged commit df7b22c into elastic:main Nov 11, 2024
8 checks passed
efd6 added a commit to efd6/integrations that referenced this pull request Nov 12, 2024
v0.17.1 of stream includes a fix for an issue with TCP and TLS
closure[1].

[1] elastic/stream#123
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Team:Security-Service Integrations Team:Security-Service Integrations
Projects
None yet
3 participants