Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDFS-13522. IPC changes to support observer reads through routers. #4441

Closed
wants to merge 1 commit into from

Conversation

ZanderXu
Copy link
Contributor

This PR is about RBF IPC changes in order to support Observer-Read.

And this a draft PR, and relevant PR already exist.

Different point:

  • Proxy always with AlignmentContext, like RouterGSIContext which contains a boolean flag, like enableObserverRead
  • Always update the lastSeenStateId from active
  • If enableObserverRead is false, updateRequestState will not set lastSeenStateId in RPCHeader
  • It's easily to dynamically reconfigure enableObserverRead.

@ZanderXu
Copy link
Contributor Author

@simbadzina @omalley @melissayou here, please help me review it.

@simbadzina
Copy link
Member

@simbadzina @omalley @melissayou here, please help me review it.

Hi @ZanderXu. Sorry for the slow responses. Thanks for your review. I'll take a look at the draft PR and also your review comments tomorrow.

@slfan1989
Copy link
Contributor

This jira is very technically challenging, learn more from you!

@ZanderXu
Copy link
Contributor Author

Thanks @simbadzina @slfan1989 . Looking forward to your review results, thanks~

@zhengchenyu
Copy link
Contributor

@ZanderXu
Seems HDFS-13522.002.patch have include this function. The difference is that in HDFS-13522.002.patch, enable or disable observer read in client side. In your PR, enable or disable observer read in router side.

@ZanderXu
Copy link
Contributor Author

Thanks @zhengchenyu for your review and comment. This a draft PR related to PR4311. I'm not following HDFS-13522.002.patch, and I will read it carefully.

Client -> RBF -> NameNode.
Whether RBF proxies the read request to the Observer should have nothing to do with the Client.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 56s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 32s Maven dependency ordering for branch
+1 💚 mvninstall 27m 49s trunk passed
+1 💚 compile 25m 6s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 21m 29s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 4m 34s trunk passed
+1 💚 mvnsite 3m 15s trunk passed
+1 💚 javadoc 2m 42s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 2m 32s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 4m 49s trunk passed
+1 💚 shadedclient 24m 53s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 1m 44s the patch passed
+1 💚 compile 24m 10s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 24m 10s the patch passed
+1 💚 compile 21m 42s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 21m 42s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 22s /results-checkstyle-root.txt root: The patch generated 2 new + 1 unchanged - 0 fixed = 3 total (was 1)
+1 💚 mvnsite 3m 7s the patch passed
+1 💚 javadoc 2m 37s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 2m 31s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
-1 ❌ spotbugs 2m 3s /new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.html hadoop-hdfs-project/hadoop-hdfs-rbf generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 shadedclient 25m 2s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 44s hadoop-common in the patch passed.
+1 💚 unit 41m 40s hadoop-hdfs-rbf in the patch passed.
+1 💚 asflicense 1m 18s The patch does not generate ASF License warnings.
289m 53s
Reason Tests
SpotBugs module:hadoop-hdfs-project/hadoop-hdfs-rbf
org.apache.hadoop.hdfs.server.federation.router.ConnectionManager.reconfEnableObserverRead(boolean) does not release lock on all exception paths At ConnectionManager.java:on all exception paths At ConnectionManager.java:[line 253]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4441/1/artifact/out/Dockerfile
GITHUB PR #4441
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 99c2ac8f930f 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 8559694
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4441/1/testReport/
Max. process+thread count 2211 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-rbf U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4441/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@zhengchenyu
Copy link
Contributor

Thanks @zhengchenyu for your review and comment. This a draft PR related to PR4311. I'm not following HDFS-13522.002.patch, and I will read it carefully.

Client -> RBF -> NameNode. Whether RBF proxies the read request to the Observer should have nothing to do with the Client.

In HDFS-13522.002.patch, isReadCall method, router will check "call.getClientStateId() == -1L". This is rpc call level. If observer read is disable in client side, call.getClientStateId() in router side will return -1, router will ignore observer namenode.
I think config in client side may be more flexible.

By the way, I add some extra comment. In HDFS-13522.002.patch, router only check whether state id is -1. They don't pass client's state id. If dfs.federation.router.observer.auto-msync-period are not set to 0, but a large number, will be wrong.
In our draft design, after apply HDFS-13522.002.patch, I wanna proxy client's state id. For busy work recently, I delay it. Maybe we can work and discuss it together.

Comment on lines +250 to +260
* Dynamically reconfigure the enableObserverRead.
*/
public void reconfEnableObserverRead(boolean enableObserverRead) {
readLock.lock();
this.enableObserverRead = enableObserverRead;
for (RouterGSIContext routerGSIContext : alignmentContexts.values()) {
routerGSIContext.setEnableObserverRead(enableObserverRead);
}
readLock.unlock();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to reconfigure observer reads dynamically?

*/
@InterfaceAudience.Private
@InterfaceStability.Evolving
public class RouterGSIContext extends ClientGSIContext {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the alignment between routers and namenode can be taken care of with just ClientGSIContext.
How does the router not updating the lastSeenStateID when communicating with the namenode help?

@ZanderXu
Copy link
Contributor Author

Thanks @zhengchenyu and @simbadzina .

I think config in client side may be more flexible.

This is a very meaningful topic. If only the client controls whether or not to enable ObserverRead will be more difficult for Admin to control, because it is very difficult to upgrade the HDFS client in full. In other words: If RBF controls whether the ObserverRead is enabled, the Admin will be very convenient to control the ObserverRead of the entire cluster, and even dynamically control whether the ObserverRead of a single NS or the entire cluster is enabled.
But there may be some special Client that do not want to enable ObserverRead, so RBF should identify those requests and proxy them to the Active Namenode.

@simbadzina This is why dynamic updates are required, so that when Admin finds that there are some abnormal Observer NameNodes, he/she can quickly disable the ObserverRead of one NS or even all NSs.

In our draft design, after apply HDFS-13522.002.patch, I wanna proxy client's state id.

Proxying client's state id to the NameNode by RBF will be very complicated.

  • A DFSClient may read or write some paths of different NameServices, and the stateID of different NS may be different.
  • The client does not know the Nameservice to which the reading or writing path belong, so it cannot pass the state id to RBF.

@ZanderXu
Copy link
Contributor Author

As in my draft PR above, RBF always updates lastSeenTxid from Active and saves. When an NS enable ObserverRead, RBF will set the stored lastSeenTxid of this NS to the RPC header and bring it to the Observer NameNode; if the NS disable ObserverRead, RBF will not set the stated id in RPC header, so even if the request is passed to the Observer, the Observer will also returns StandbyException.

@zhengchenyu
Copy link
Contributor

zhengchenyu commented Jun 16, 2022

Thanks @zhengchenyu and @simbadzina .

I think config in client side may be more flexible.

This is a very meaningful topic. If only the client controls whether or not to enable ObserverRead will be more difficult for Admin to control, because it is very difficult to upgrade the HDFS client in full. In other words: If RBF controls whether the ObserverRead is enabled, the Admin will be very convenient to control the ObserverRead of the entire cluster, and even dynamically control whether the ObserverRead of a single NS or the entire cluster is enabled. But there may be some special Client that do not want to enable ObserverRead, so RBF should identify those requests and proxy them to the Active Namenode.

@simbadzina This is why dynamic updates are required, so that when Admin finds that there are some abnormal Observer NameNodes, he/she can quickly disable the ObserverRead of one NS or even all NSs.

In our draft design, after apply HDFS-13522.002.patch, I wanna proxy client's state id.

Proxying client's state id to the NameNode by RBF will be very complicated.

  • A DFSClient may read or write some paths of different NameServices, and the stateID of different NS may be different.
  • The client does not know the Nameservice to which the reading or writing path belong, so it cannot pass the state id to RBF.

Yes, you are right in some condition. If all client are common user, for hive and mr application, it is right.
We know observer can not guarantee strong consistency, maybe some use have high demand, they wanna disable observe read, though few user have this demand.

Maybe we can reserve configuration both on router side and client side.

Yes, Proxying client's state id is complicated. I don't know whether it is necessary or not. So just delay it.

@ZanderXu
Copy link
Contributor Author

ZanderXu commented Jun 16, 2022

We know observer can not guarantee strong consistency, maybe some use have high demand, they wanna disable observe read, though few user have this demand.

Only a very small number of users have high demand, and in most cases, the client enables ObserverRead default. In other words: In most cases, there is no need for client to pass the ObserverRead enable flag to RBF. So only a very small number of requests need to carry a specific flag bit to RBF, so that the RBF can force an msync to ensure the consistency before proxying the request.

There are serval methods for the client side to carry the force consistency flag to RBF:

  1. Carry a special StateID to RBF, such as -100 (Client Process level)
  2. Carry a special filed attributes to RBF through CallerContext (single RPC level)
  3. etc..

@zhengchenyu @simbadzina About my suggestion, I'm looking forward your feedback, thanks. In order to find a more perfect solution.

@simbadzina
Copy link
Member

Thanks @zhengchenyu and @simbadzina .

I think config in client side may be more flexible.

This is a very meaningful topic. If only the client controls whether or not to enable ObserverRead will be more difficult for Admin to control, because it is very difficult to upgrade the HDFS client in full. In other words: If RBF controls whether the ObserverRead is enabled, the Admin will be very convenient to control the ObserverRead of the entire cluster, and even dynamically control whether the ObserverRead of a single NS or the entire cluster is enabled. But there may be some special Client that do not want to enable ObserverRead, so RBF should identify those requests and proxy them to the Active Namenode.

@simbadzina This is why dynamic updates are required, so that when Admin finds that there are some abnormal Observer NameNodes, he/she can quickly disable the ObserverRead of one NS or even all NSs.

In our draft design, after apply HDFS-13522.002.patch, I wanna proxy client's state id.

Proxying client's state id to the NameNode by RBF will be very complicated.

  • A DFSClient may read or write some paths of different NameServices, and the stateID of different NS may be different.
  • The client does not know the Nameservice to which the reading or writing path belong, so it cannot pass the state id to RBF.

@ZanderXu in my full PR, #4127, I do also allow routers to enable and disable observer reads. The difference being that it requires a router restart. Since routers are stateless this is a quick operation. At most one minute.

@simbadzina
Copy link
Member

simbadzina commented Jun 16, 2022

@ZanderXu in #4127 I have configurations on both the router and client side. Consistency is also guaranteed because the router always does an msync.
The reason for the client side configuration is for latency sensitive clients that just want one call between the router and the namenodes.

@ZanderXu
Copy link
Contributor Author

ZanderXu commented Jun 20, 2022

Thanks @simbadzina @zhengchenyu for your review and warm discussion. I learned a lot from it.
I will close this draft pr and discuss this feature at 4127

@ZanderXu ZanderXu closed this Jun 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants