@LeftHandCold
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue https://github.com/matrixorigin/MO-Cloud/issues/6448

What this PR does / why we need it:

Fixed a bug that caused the system to hang after packet loss during data loading.
The network read timeout is 24 hours, so if packets are lost, the read effectively waits indefinitely.

Contributor

@fengttt fengttt left a comment

In general I don't think tweaking a timeout is the right fix for a system hang.

What is the real "case" for the hang? And what error do we generate?

@LeftHandCold
Contributor Author

> In general I don't think tweaking a timeout is the right fix for a system hang.
>
> What is the real "case" for the hang? And what error do we generate?

The actual hang during the load phase happens here: https://github.com/matrixorigin/matrixone/blob/main/pkg/frontend/mysql_buffer.go#L569. Because the read timeout is 24 hours, if the connection isn't closed, the read operation stays blocked with no effective limit, repeatedly returning an EOF error only after 4 hours or 40 minutes.

Labels

kind/bug: Something isn't working
size/M: Denotes a PR that changes [100,499] lines

4 participants