RATIS-1524. Optional DataStreamManagement#startTransaction configuration #601
base: master
@szetszwo What do you think of this change?
@szetszwo I would like to discuss a question about the Ozone BCSID. Does the BCSID still make sense when data is transferred by streaming and a block is put through the stream path? Because the Raft transport is deprecated, replay of the Raft log is eliminated. I can open a JIRA in Ozone to discuss this.
@guohao-rosicky, without the BCSID and the Ratis log, the current Ozone design won't work for recovery. Why do we need Ratis (or Raft) in Ozone? Because Ratis provides a consistent view of the data among the servers. Without Ratis, the data on the servers may diverge. How could you tell which one to trust? And how would you do data recovery?
Thank you @szetszwo, I now understand the BCSID. In one of my Ozone test versions, I generated Raft log entries by sending Raft async RPCs through DataStreamManagement#startTransaction, so that the BCSID is taken from the Raft log index. Throughput was very low because Raft needs to order the requests. In another test version, throughput doubled when DataStreamManagement#startTransaction was skipped. Could we get a BCSID in some way other than from the Raft log?
We need a BCSID and also the ability to commit transactions. I guess there is no easy way. Otherwise, we could use it to replace the Raft consensus algorithm in general.
@szetszwo In other words, generating an ID on the primary node and passing it to the other nodes through the stream could improve throughput, with no internal Raft requests.
How would you make sure the ID is unique? Any node could be the primary node at some point in time.
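To illustrate the uniqueness concern, here is a minimal sketch, assuming each node hands out IDs from a purely local counter whenever it acts as primary. The class and method names (`LocalIdCollisionSketch`, `Node`, `nextLocalId`, `nextQualifiedId`) are hypothetical and are not Ratis or Ozone APIs.

```java
import java.util.HashSet;
import java.util.Set;

public class LocalIdCollisionSketch {
  /** Hypothetical node that generates IDs locally, with no coordination. */
  static final class Node {
    private final String name;
    private long counter = 0;

    Node(String name) {
      this.name = name;
    }

    /** A bare local counter: every node starts from the same value. */
    long nextLocalId() {
      return ++counter;
    }

    /** The same counter, prefixed with the node name. */
    String nextQualifiedId() {
      return name + "-" + (++counter);
    }
  }

  /** Two nodes that each act as primary produce the same bare ID. */
  static boolean localIdsCollide() {
    Node s1 = new Node("s1");
    Node s2 = new Node("s2");
    Set<Long> seen = new HashSet<>();
    seen.add(s1.nextLocalId());
    // add() returns false when the ID is already present.
    return !seen.add(s2.nextLocalId());
  }

  /** Prefixing with the node name removes the collision. */
  static boolean qualifiedIdsCollide() {
    Node s1 = new Node("s1");
    Node s2 = new Node("s2");
    Set<String> seen = new HashSet<>();
    seen.add(s1.nextQualifiedId());
    return !seen.add(s2.nextQualifiedId());
  }

  public static void main(String[] args) {
    System.out.println("local ids collide: " + localIdsCollide());
    System.out.println("qualified ids collide: " + qualifiedIdsCollide());
  }
}
```

The node-qualified IDs no longer collide, but nothing in this sketch gives the nodes an agreed order or a way to decide which primary's IDs were committed. In this thread, that is the role played by the Raft log index.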
I can open a JIRA in Ratis to discuss this, @szetszwo @captainzmc.
@guohao-rosicky, @captainzmc, I really hope that we could fix "TimeoutIOException: Timeout 3000ms". How about we fix it first?
We made this configuration change to fix that problem, because DataStreamManagement#startTransaction was taking too long. We have a test report showing that DataStreamManagement#startTransaction caused "TimeoutIOException: Timeout 3000ms". @szetszwo, I was hoping you could help us come up with a solution.
@szetszwo https://docs.google.com/document/d/1mS3GqovQ3D1b7V0L3--VF9xhl5jdId1mSL0cQNb7uHo/edit This document contains our test reports and describes how we located the problems.
This is not really a fix since it changes the functionality.
"TimeoutIOException: Timeout 3000ms" probably started happening after RATIS-1438. How about we increase the timeout value, say to 10 seconds?
It can be changed to 10 seconds. I will submit a new PR for this.
@szetszwo Could you take a look at our test report and consider further optimizing the performance of Ratis streaming in Ozone, based on the problems it identifies? We could then discuss how to optimize the scheme. @captainzmc and I can take part in developing the optimizations.
What changes were proposed in this pull request?
Add a configuration that makes DataStreamManagement#startTransaction optional.
What is the link to the Apache JIRA?
See:
https://issues.apache.org/jira/browse/RATIS-1524
https://issues.apache.org/jira/browse/RATIS-1513