Improve client en doc
wangweibing committed Sep 15, 2021
1 parent 91bc3ef commit f075023
22 changes: 19 additions & 3 deletions docs/en/client.md
@@ -216,6 +216,8 @@
which is round robin: always choose the next server in the list, i.e. the one after the previously selected server.

which is weighted round robin: choose the next server according to the configured weights. The chance that a server is selected is proportional to its weight, and the algorithm spreads out the selections of each server instead of clustering them.

The instance tag must be an int32 number representing the weight, e.g. tag="50".
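For illustration only, here is a minimal sketch of initializing a channel with the wrr policy. The file name and the per-line "address tag" format are assumptions; check the documentation of your naming service for the exact syntax it supports:

```c++
#include <brpc/channel.h>
#include <butil/logging.h>

int main() {
    // Assumed contents of "servers.list", one instance per line,
    // with the tag after the address interpreted as the wrr weight:
    //   192.168.0.1:8000 50
    //   192.168.0.2:8000 30
    brpc::ChannelOptions options;
    brpc::Channel channel;
    // "wrr" selects the weighted-round-robin load balancer.
    if (channel.Init("file://servers.list", "wrr", &options) != 0) {
        LOG(ERROR) << "Fail to init channel";
        return -1;
    }
    // Create a stub on the channel and issue RPCs as usual.
    return 0;
}
```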

### random

Randomly choose one server from the list; no other settings. Similar to round robin, this algorithm assumes that the servers to access are similar.
@@ -224,6 +226,8 @@

which is weighted random: choose the next server according to the configured weights. The chance that a server is selected is proportional to its weight.

The requirements on the instance tag are the same as for wrr.

### la

which is locality-aware: prefer servers with lower latencies until their latency becomes higher than that of other servers; no other settings. Check out [Locality-aware load balancing](lalb.md) for more details.
@@ -240,6 +244,8 @@
Do distinguish the "key" and the "attributes" of a request: request_code should be computed from the key only, not from the attributes, so that requests with the same key are always mapped to the same server.
Check out [Consistent Hashing](consistent_hashing.md) for more details.
Other kinds of lb do not need Controller.set_request_code(). If a request code is set anyway, it is simply ignored by the lb. For example, with lb=rr, calling Controller.set_request_code() has no effect: even if the request_code is the same for every request, the lb still distributes requests according to the rr policy.
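As a hedged illustration of computing the request code from the key only (the hash helper in brpc/policy/hasher.h is assumed to be available; any stable hash of the key would do):

```c++
#include <string>
#include <brpc/controller.h>
#include <brpc/policy/hasher.h>  // MurmurHash32; assumed helper, any stable hash works

// Derive the request code from the request's key (e.g. a user id),
// never from per-request attributes, so that requests with the same
// key are consistently hashed to the same server.
void SetRequestCode(brpc::Controller* cntl, const std::string& user_id) {
    cntl->set_request_code(brpc::policy::MurmurHash32(user_id.data(), user_id.size()));
}
```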
### Client-side throttling for recovery from cluster downtime
Cluster downtime refers to the state in which all servers in the cluster are unavailable. Due to the health-check mechanism, when the cluster recovers, servers go online one by one. When the first server comes online, all traffic is sent to it, which may overload it again; if the circuit breaker is enabled, that server may be taken offline again before the others come online, and the cluster may never recover. As a solution, brpc provides a client-side throttling mechanism for recovery after cluster downtime. When no server in the cluster is available, the cluster enters the recovery state. Assuming the minimum number of servers that can handle all requests is min_working_instances and the current number of available servers in the cluster is q, then in the recovery state the client accepts a request with probability q/min_working_instances and discards it otherwise. If q stays unchanged for a period of time (hold_seconds), traffic is sent to all available servers again and the cluster leaves the recovery state. Whether a request was rejected in the recovery state can be determined by checking whether controller.ErrorCode() equals brpc::EREJECT; rejected requests are not retried by the framework.
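As a hedged sketch based on the description above (the "min_working_instances=... hold_seconds=..." load-balancer parameters and the error-code header path are assumptions, not authoritative API), recovery throttling is configured together with the load balancer, and a rejected request can be recognized by its error code:

```c++
#include <brpc/channel.h>
#include <brpc/controller.h>
#include <brpc/errno.pb.h>  // brpc::EREJECT (assumed header for brpc error codes)

bool InitChannelWithRecovery(brpc::Channel* channel) {
    brpc::ChannelOptions options;
    // Assumed syntax: throttling parameters appended to the lb name.
    return channel->Init("list://127.0.0.1:8000,127.0.0.1:8001",
                         "rr:min_working_instances=6 hold_seconds=10",
                         &options) == 0;
}

bool RejectedByRecoveryThrottling(const brpc::Controller& cntl) {
    // Requests dropped in the recovery state fail with EREJECT
    // and are not retried by the framework.
    return cntl.Failed() && cntl.ErrorCode() == brpc::EREJECT;
}
```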
@@ -293,6 +299,12 @@
if (cntl.Failed()) {
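    // RPC failed: the fields of the response are undefined and must not be used.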
}
```
> WARNING: Do NOT issue a synchronous call while holding a pthread lock! Otherwise it is easy to cause a deadlock.
>
> Solution (choose one of the two; see the sketch below):
> 1. Replace the pthread lock with a bthread lock (bthread_mutex_t)
> 2. Release the lock before calling CallMethod
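A hedged sketch of the two options (MyService_Stub/MyRequest/MyResponse are the illustrative types used elsewhere on this page; the bthread mutex C API is used for option 1):

```c++
#include <pthread.h>
#include <bthread/mutex.h>
#include <brpc/channel.h>

bthread_mutex_t g_bmutex;   // option 1: init once with bthread_mutex_init(&g_bmutex, NULL)
pthread_mutex_t g_pmutex = PTHREAD_MUTEX_INITIALIZER;  // option 2

// Option 1: a bthread lock may be held across a synchronous call,
// because blocking in the RPC only suspends the current bthread.
void CallUnderBthreadLock(MyService_Stub& stub, const MyRequest& req, MyResponse* res) {
    brpc::Controller cntl;
    bthread_mutex_lock(&g_bmutex);
    stub.some_method(&cntl, &req, res, NULL /*done=NULL means synchronous*/);
    bthread_mutex_unlock(&g_bmutex);
}

// Option 2: with a pthread lock, release it BEFORE CallMethod; never hold it across the call.
void CallAfterReleasingPthreadLock(MyService_Stub& stub, MyRequest* req, MyResponse* res) {
    pthread_mutex_lock(&g_pmutex);
    req->set_foo("value-derived-from-shared-state");  // touch shared state only under the lock
    pthread_mutex_unlock(&g_pmutex);
    brpc::Controller cntl;
    stub.some_method(&cntl, req, res, NULL);
}
```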
## Asynchronous call
Pass a callback `done` to CallMethod. CallMethod returns after sending the request, rather than after completion of the RPC. done->Run() is called when the response is received from the server or when an error occurs (including timeout). Post-processing code of the RPC should be put inside done->Run() rather than after CallMethod.
@@ -301,7 +313,11 @@
Because the end of CallMethod does not mean completion of the RPC, response/controller may still be used by the framework and by done->Run() afterwards; they should generally be allocated on the heap.
You can new these objects individually and create done with [NewCallback](#use-newcallback), or make response/controller members of done and [new them together](#Inherit-google::protobuf::Closure). The former is recommended.
**The request can be destroyed immediately after the asynchronous CallMethod returns**, which is different from response/controller. (SelectiveChannel is an exception: with SelectiveChannel, the request object must be released only after the RPC finishes.)

After an asynchronous request has been initiated (i.e. after CallMethod returns), it is not recommended to destroy the Channel immediately: due to [a bug](https://github.com/apache/incubator-brpc/issues/658) in the current brpc implementation, there is a small probability of a crash if the Channel is destructed before the asynchronous request finishes.

While an asynchronous request is being initiated (i.e. during CallMethod), the Channel must not be destroyed; deleting a Channel that is being used by another thread is undefined behavior (and is likely to crash).
### Use NewCallback
```c++
// @@ -324,7 +340,7 @@
MyService_Stub stub(&channel);
MyRequest request; // you don't have to new request, even in an asynchronous call.
request.set_foo(...);
cntl->set_timeout_ms(...);
stub.some_method(cntl, &request, response, brpc::NewCallback(OnRPCDone, response, cntl));
```
Since protobuf 3 made NewCallback private, brpc provides NewCallback in [src/brpc/callback.h](https://github.com/brpc/brpc/blob/master/src/brpc/callback.h) since r32035 (and adds more overloads). If your program has compilation issues with NewCallback, replace google::protobuf::NewCallback with brpc::NewCallback.

@@ -523,7 +539,7 @@
NOTE2: the error code of an RPC timeout is **ERPCTIMEDOUT (1008)**; ETIMEDOUT is a connection timeout, which is retriable.
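For example (a sketch; brpc/errno.pb.h is assumed to be the header defining brpc error codes such as ERPCTIMEDOUT):

```c++
#include <cerrno>            // ETIMEDOUT
#include <brpc/controller.h>
#include <brpc/errno.pb.h>   // brpc::ERPCTIMEDOUT (assumed header for brpc error codes)

const char* ExplainTimeout(const brpc::Controller& cntl) {
    if (cntl.ErrorCode() == brpc::ERPCTIMEDOUT) {
        return "RPC timeout: the whole RPC (including retries) exceeded timeout_ms";
    }
    if (cntl.ErrorCode() == ETIMEDOUT) {
        return "connection timeout: retriable";
    }
    return "not a timeout";
}
```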

## Retry

ChannelOptions.max_retry is the maximum retry count for all RPCs via the channel. The default value is 3, and 0 means no retries. Controller.set_max_retry() overrides the value for a single RPC.

Controller.retried_count() returns the number of retries.
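For example (a sketch; the server address is a placeholder and MyService_Stub/MyRequest/MyResponse are the illustrative types used elsewhere on this page):

```c++
#include <brpc/channel.h>
#include <butil/logging.h>

// Channel-level default: at most 2 retries for every RPC via this channel.
int InitChannelWithRetry(brpc::Channel* channel) {
    brpc::ChannelOptions options;
    options.max_retry = 2;              // default is 3; 0 disables retries
    return channel->Init("127.0.0.1:8000", &options);
}

// Per-RPC override: disable retries for this particular call only.
void CallWithoutRetry(MyService_Stub& stub, const MyRequest& req, MyResponse* res) {
    brpc::Controller cntl;
    cntl.set_max_retry(0);
    stub.some_method(&cntl, &req, res, NULL);
    LOG(INFO) << "retried " << cntl.retried_count() << " times";
}
```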
