Design Doc: Central Controller
Steve Hill, December 2016
At the highest possible level, CentralController is an abstraction that supports operations which may benefit from central scheduling. It was implemented to allow server-wide decision making for these operations, even when running on a server that forks multiple worker processes. Workers call an API and remain oblivious to whether the request is serviced in the local process or routed over an RPC system for central handling.
At the time of writing, CentralController supported two operations: ScheduleExpensiveOperation and ScheduleRewrite. Requests for each of these operations are eventually routed to an instance of ExpensiveOperationController or ScheduleRewriteController, respectively. These will be described in more detail later, but first we discuss CentralController itself.
To provide transactional guarantees, CentralController employs RAII. Whenever a consumer successfully starts an operation through a call to a CentralController, some sort of Context object is returned. The Context is a "handle" to the server and encapsulates everything required to communicate with the underlying implementation. The consumer can invoke methods on it without being concerned about the implementation. When the consumer is done, they delete the Context, and it will ensure that all required information has been supplied to the implementation.
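The idea can be illustrated with a small, self-contained sketch. The class names below are hypothetical stand-ins, not the real PageSpeed types; the point is that the destructor of the Context handle is what guarantees the controller always hears back, even if the consumer bails out early.

```cpp
#include <iostream>
#include <memory>

// Stand-in for the server-side implementation behind the CentralController.
class HypotheticalController {
 public:
  void OperationFinished(bool success) {
    std::cout << (success ? "slot released (success)\n"
                          : "slot released (abandoned or failed)\n");
  }
};

// Stand-in for the Context "handle" returned to the consumer.
class OperationContext {
 public:
  explicit OperationContext(HypotheticalController* controller)
      : controller_(controller) {}

  // The consumer calls this when the work completed normally.
  void Done() { succeeded_ = true; }

  // Deleting the Context always informs the controller, so a consumer that
  // gives up (or never calls Done) still releases its slot.
  ~OperationContext() { controller_->OperationFinished(succeeded_); }

 private:
  HypotheticalController* controller_;
  bool succeeded_ = false;
};

int main() {
  HypotheticalController controller;
  {
    auto context = std::make_unique<OperationContext>(&controller);
    // ... perform the scheduled operation ...
    context->Done();
  }  // Context destroyed here; the controller is notified either way.
}
```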
One important design consideration was that the "network layer" would be responsible for detecting disconnections/timeouts, not the implementations of the per-operation controllers. If any such problem occurs, the network layer is expected to notice and forward an error condition "upwards". This frees the client side and per-operation controllers from having to deal with client disconnections/crashes, though it does result in more complexity at the network layer.
At the time of writing, the only supported "network layer" was gRPC, which uses UNIX domain or TCP sockets as its underlying transport.
In order to detect disconnection on both the client and server sides, all RPCs to the controller process use gRPC streaming requests. Unlike with unary requests, gRPC will notify the server side if a client disconnects while a streaming request is ongoing.
The pattern used here is:
- Client sends a single Request proto, which includes enough information for the server to start the request.
- Server waits for an unbounded amount of time before considering the request.
- Upon consideration, if the server decides that the client is not permitted to perform the operation, it sends a single Reply proto with ok_to_proceed = false and terminates the session.
- Otherwise, the server sends a Reply with ok_to_proceed = true and leaves the session open.
- The client does whatever processing is required.
- When it's done, the client sends the server another Request message indicating that the operation has completed (possibly with additional information).
- The server processes the final Request and terminates the session.
If the stream is disconnected at any point before the transaction is complete, gRPC generates an error. The various classes in PageSpeed then ensure that both the client and server sides correctly and transparently register a failed operation.
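The happy-path exchange can be modelled with plain structs standing in for the real protos and gRPC session; everything below is illustrative, not the actual .proto definitions or generated code.

```cpp
#include <iostream>

// Stand-ins for the streaming messages described above.
struct Request {
  bool is_final = false;   // false: "please schedule me"; true: "I'm done".
  bool succeeded = false;  // Only meaningful on the final message.
};

struct Reply {
  bool ok_to_proceed = false;
};

// Stand-in for the server side of a single streaming session.
class FakeSession {
 public:
  Reply HandleInitialRequest(const Request&) {
    // The real server may wait arbitrarily long before answering, and may
    // instead reply with ok_to_proceed = false and end the session.
    Reply reply;
    reply.ok_to_proceed = true;
    return reply;
  }
  void HandleFinalRequest(const Request& request) {
    std::cout << "server: operation "
              << (request.succeeded ? "succeeded" : "failed")
              << ", session closed\n";
  }
};

int main() {
  FakeSession session;

  // 1. Client sends a single Request to open the transaction.
  Reply reply = session.HandleInitialRequest(Request());

  // 2. If the server denies the operation, the session is already over.
  if (!reply.ok_to_proceed) return 0;

  // 3. Client performs the work, then sends a final Request and the server
  //    terminates the session.
  Request done;
  done.is_final = true;
  done.succeeded = true;
  session.HandleFinalRequest(done);
}
```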
In the code, this pattern is (perhaps poorly) named a "RequestResult" transaction, indicating that the client first sends a request, then a result message. The generic client- and server-side implementations of this pattern are RequestResultRpcClient and RequestResultRpcHandler, respectively.
The CentralController implementation that uses gRPC is CentralControllerRpcClient. It talks to a CentralControllerRpcServer, which encapsulates the per-operation controllers.
At its core, CentralControllerRpcClient manages a collection of grpc::ClientContexts. Whenever an operation is scheduled through a CentralControllerRpcClient, the created context encapsulates a RequestResultRpcClient, which in turn wraps a grpc::ClientContext. The CentralControllerRpcClient itself does little more than schedule the callbacks as they come off the work queue (gRPC calls this a CompletionQueue) and shut down all the ClientContexts at exit.
To process the RPC events, CentralControllerRpcClient encapsulates a thread. This is a single thread per worker, so it's important that it is not used for actual work (such as image rewrites), which would block the entire RPC system. Thus, all callbacks supplied to a CentralController must inherit from CentralControllerCallback, which encapsulates a Sequence that will actually be used to run the operations. The RpcClient thread is then used only for the trivial task of queuing the callback onto the Sequence.
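A small sketch of that division of labour, using stand-in classes rather than the real Sequence and CentralControllerCallback: the RPC thread only enqueues the callback, and the worker sequence performs the heavy lifting.

```cpp
#include <functional>
#include <iostream>
#include <queue>

// Stand-in for PageSpeed's Sequence: runs closures in order, off the RPC thread.
class FakeSequence {
 public:
  void Add(std::function<void()> task) { tasks_.push(std::move(task)); }
  // In the real system a worker thread drains the sequence; here we drain it
  // manually for illustration.
  void RunAll() {
    while (!tasks_.empty()) {
      tasks_.front()();
      tasks_.pop();
    }
  }

 private:
  std::queue<std::function<void()>> tasks_;
};

// Stand-in for a callback handed to the CentralController.
class FakeControllerCallback {
 public:
  explicit FakeControllerCallback(FakeSequence* sequence) : sequence_(sequence) {}

  // Invoked on the RPC thread: cheap, just hands the real work to the Sequence.
  void Run() {
    sequence_->Add([] { std::cout << "performing the expensive work on a worker\n"; });
  }

 private:
  FakeSequence* sequence_;
};

int main() {
  FakeSequence worker_sequence;
  FakeControllerCallback callback(&worker_sequence);
  callback.Run();            // RPC thread: returns immediately.
  worker_sequence.RunAll();  // Worker: actually performs the operation.
}
```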
A true central controller process is started when you configure ExperimentalCentralControllerPort. This can be a fixed TCP port or, if prefixed with unix:, a UNIX domain socket that will be created for you. For various reasons, forking a process in a portable and reliable way is non-trivial. The controller process uses jefftk's ControllerManager, which ensures that the controller process is restarted if it crashes but also dies when the main server process exits. The main gRPC server class is CentralControllerRpcServer.
There are two simpler implementations of CentralController: InProcessCentralController and CompatibleCentralController. InProcessCentralController is designed for server processes that do not fork at all, where an RPC system would be wasteful; it simply encapsulates the per-operation controllers and passes requests directly to them. At the time of writing it was not used directly.
CompatibleCentralController, a subclass of InProcessCentralController, is used when no controller process is available. It implements CentralController using per-operation controllers that do not require an RPC system. These implementations are the pre-CentralController code and are generally less sophisticated.
The CentralController method ScheduleExpensiveOperation is used for limiting CPU usage; child processes should call it before attempting an operation that requires a lot of CPU, for instance image transcoding. Calls are routed to an ExpensiveOperationController, which limits the number of things that can burn the CPU at once.
Both of the ExpensiveOperationControllers are configured via the ImageMaxRewritesAtOnce parameter.
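A hypothetical caller-side sketch of the permit/deny flow; the function name and signature below are illustrative, not the actual CentralController API.

```cpp
#include <iostream>

// Illustrative stand-in: ask the controller for permission and run exactly
// one of the two branches. In the real system the decision is made centrally
// (and may be deferred), not hard-coded like this.
template <typename AllowedFn, typename DeniedFn>
void ScheduleExpensiveOperationSketch(AllowedFn allowed, DeniedFn denied) {
  const bool slot_available = true;  // Pretend the controller granted a slot.
  if (slot_available) {
    allowed();
  } else {
    denied();
  }
}

int main() {
  ScheduleExpensiveOperationSketch(
      [] { std::cout << "transcoding image, then releasing the slot\n"; },
      [] { std::cout << "over the CPU limit, skipping the optimization\n"; });
}
```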
The first implementation of ExpensiveOperationController is the one used when no controller process is available. It uses a Statistic to track how many expensive operations are ongoing. If the incoming request would exceed the limit, it denies the request and the client is prevented from performing the expensive operation. Denials here are unfortunate, because the work required to get to this point in the code is wasted.
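A simplified model of that check-and-deny scheme, with an atomic counter standing in for the real Statistic and a hypothetical class name:

```cpp
#include <atomic>
#include <iostream>

class FakeCompatibleExpensiveOperationController {
 public:
  explicit FakeCompatibleExpensiveOperationController(int max_in_flight)
      : max_in_flight_(max_in_flight) {}

  // Returns true if the caller may proceed. On false the caller must give up,
  // and the work done to reach this point is wasted.
  bool TryStartOperation() {
    int now_in_flight = in_flight_.fetch_add(1) + 1;
    if (now_in_flight > max_in_flight_) {
      in_flight_.fetch_sub(1);  // Over the limit: undo and deny.
      return false;
    }
    return true;
  }

  void OperationDone() { in_flight_.fetch_sub(1); }

 private:
  const int max_in_flight_;
  std::atomic<int> in_flight_{0};
};

int main() {
  FakeCompatibleExpensiveOperationController controller(/*max_in_flight=*/1);
  std::cout << controller.TryStartOperation() << "\n";  // 1: allowed.
  std::cout << controller.TryStartOperation() << "\n";  // 0: denied.
  controller.OperationDone();
}
```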
The RPC-enabled implementation of ExpensiveOperationController never denies a request. Instead, all requests are placed into a queue and processed in order as slots become available. The queue here is unbounded because the number of outstanding requests is expected to be limited by PopularityContestScheduleRewriteController.
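By contrast, the queued behaviour could be modelled like this (again with invented names, and a plain std::queue in place of the real machinery):

```cpp
#include <functional>
#include <iostream>
#include <queue>

class FakeQueuedExpensiveOperationController {
 public:
  explicit FakeQueuedExpensiveOperationController(int slots) : free_slots_(slots) {}

  // Never denies: either starts the operation now or queues it for later.
  void Schedule(std::function<void()> start_callback) {
    if (free_slots_ > 0) {
      --free_slots_;
      start_callback();
    } else {
      waiting_.push(std::move(start_callback));  // Unbounded queue.
    }
  }

  // When a slot frees up, hand it straight to the oldest waiter.
  void OperationDone() {
    if (!waiting_.empty()) {
      std::function<void()> next = std::move(waiting_.front());
      waiting_.pop();
      next();
    } else {
      ++free_slots_;
    }
  }

 private:
  int free_slots_;
  std::queue<std::function<void()>> waiting_;
};

int main() {
  FakeQueuedExpensiveOperationController controller(/*slots=*/1);
  controller.Schedule([] { std::cout << "operation 1 started\n"; });
  controller.Schedule([] { std::cout << "operation 2 started\n"; });  // Queued.
  controller.OperationDone();  // Frees the slot; operation 2 starts now.
}
```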
The other CentralController method, ScheduleRewrite, is used to ensure that only one worker can perform a given rewrite at a time, preventing redundant work. Crucially, it also allows the order of rewrites to be controlled by the preference of a ScheduleRewriteController.
Many of the rewrites PageSpeed performs have a relatively minor CPU cost and complete quickly, so it does not make sense to queue or rate-limit them. Thus, only rewrites that might involve rewriting images are routed via ScheduleRewrite. It's also important that, within the dependency tree of a RewriteContext, only one rewrite is controlled via ScheduleRewrite; if multiple rewrites nested within the same RewriteContext were scheduled that way, at best extremely long delays would occur, and deadlock is possible. This is all arbitrated in RewriteContext::ObtainLockForCreation.
At the time of writing there were two implementations of ScheduleRewriteController:
The popularity contest (PopularityContestScheduleRewriteController) was the primary motivation for CentralController. It limits the number of rewrites that are currently running, scheduling them via a priority queue ordered by the number of requests received. In theory this means that the rewrites that are most in demand are performed first.
The popularity contest is configured via ExperimentalPopularityContestMaxInFlight, which limits the number of rewrites that can be ongoing at once, and ExperimentalPopularityContestMaxQueueSize, which limits the number of requests the popularity contest is prepared to track. When ScheduleRewrite is called, if too many rewrites are in flight, the popularity contest will try to queue the request. If the queue is full, the request is unconditionally dropped. Only one instance of a rewrite will ever be queued; if a request comes in for a rewrite that is already queued, the queued request is canceled and replaced with the incoming request (at incremented priority).
When a rewrite completes successfully, the popularity contest deletes it from its records entirely and schedules the next most popular request. However, a client can also declare that the rewrite failed. In this case the popularity contest puts the request back on the queue at its old priority; since it retains the popularity it had already accumulated, it is effectively queued for retry ahead of newer, less-requested rewrites.
Note that because of how CentralController is hooked up, a queued rewrite is actually an entire RewriteContext, which has a non-trivial memory footprint.
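A much-simplified model of the priority scheme follows; the class is hypothetical and omits the real controller's bookkeeping, thread-safety, and interaction with the RPC layer.

```cpp
#include <iostream>
#include <map>
#include <string>

class FakePopularityContest {
 public:
  explicit FakePopularityContest(std::size_t max_queue_size)
      : max_queue_size_(max_queue_size) {}

  // Returns false if the request had to be dropped because the queue is full.
  bool RequestRewrite(const std::string& key) {
    auto it = queued_.find(key);
    if (it != queued_.end()) {
      ++it->second;  // Already queued: replace at incremented priority.
      return true;
    }
    if (queued_.size() >= max_queue_size_) return false;  // Dropped.
    queued_[key] = 1;
    return true;
  }

  // Starts the most-requested queued rewrite, or returns "" if none is queued.
  std::string StartMostPopular() {
    std::string best;
    int best_count = 0;
    for (const auto& entry : queued_) {
      if (entry.second > best_count) {
        best = entry.first;
        best_count = entry.second;
      }
    }
    if (!best.empty()) {
      queued_.erase(best);
      running_[best] = best_count;
    }
    return best;
  }

  // Success forgets the rewrite entirely; failure re-queues it at the
  // priority it had already accumulated, so it will be retried.
  void NotifySuccess(const std::string& key) { running_.erase(key); }
  void NotifyFailure(const std::string& key) {
    queued_[key] += running_[key];
    running_.erase(key);
  }

 private:
  const std::size_t max_queue_size_;
  std::map<std::string, int> queued_;   // Rewrite key -> request count.
  std::map<std::string, int> running_;  // Rewrites currently in flight.
};

int main() {
  FakePopularityContest contest(/*max_queue_size=*/10);
  contest.RequestRewrite("a.jpg");
  contest.RequestRewrite("b.jpg");
  contest.RequestRewrite("b.jpg");                  // b.jpg is more popular.
  std::cout << contest.StartMostPopular() << "\n";  // Prints "b.jpg".
  contest.NotifyFailure("b.jpg");                   // Re-queued for retry.
}
```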
The second implementation works without a controller process. It uses a NamedLockManager to ensure that only one copy of a rewrite is outstanding, but makes no attempt to prioritize rewrites or provide a system-wide limit on the total number of rewrites in flight.
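A toy illustration of that behaviour, using a set of held keys rather than the real NamedLockManager API:

```cpp
#include <iostream>
#include <set>
#include <string>

// Stand-in for per-key locking: only one holder per rewrite key at a time.
class FakeNamedLocks {
 public:
  // Returns true if the lock for `key` was acquired.
  bool TryLock(const std::string& key) { return held_.insert(key).second; }
  void Unlock(const std::string& key) { held_.erase(key); }

 private:
  std::set<std::string> held_;
};

int main() {
  FakeNamedLocks locks;
  std::cout << locks.TryLock("a.jpg") << "\n";  // 1: this worker does the rewrite.
  std::cout << locks.TryLock("a.jpg") << "\n";  // 0: a copy is already in flight.
  locks.Unlock("a.jpg");
}
```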