diff --git a/docs/architecture/actor_system.md b/docs/architecture/actor_system.md
new file mode 100644
index 000000000..d3e739841
--- /dev/null
+++ b/docs/architecture/actor_system.md
@@ -0,0 +1,108 @@
+# Rally Actor System
+
+At its heart, Rally is a distributed system. It has been designed that way to
+allow using multiple load drivers in the same benchmark, to ensure that Rally
+is never a bottleneck. In the vast majority of cases, using a powerful load
+driver is enough, but benchmarking large Elasticsearch clusters containing tens
+or hundreds of nodes can require more load drivers.
+
+## Thespian
+
+Actors are managed by [Thespian](https://thespianpy.com/doc/), which provides us
+with the following features:
+
+ * Works with Linux and macOS (and Windows, but Rally does not need that)
+ * Handles the communication between actors regardless of their locations
+ * Scales from running Rally and Elasticsearch on one workstation to
+   benchmarking large Elasticsearch clusters with multiple load drivers,
+   without any change to the Rally codebase.
+
+While it is not without its rough edges, it is [well
+documented](https://thespianpy.com/doc/using.html) and battle tested.
+Additionally, the maintainer has always been responsive and helpful.
+
+## Sequence diagram
+
+Rally has a number of actors that all inherit from `actor.RallyActor`. This
+document focuses on the actors needed to *prepare* and *run* a benchmark, with
+the following limitations:
+
+ * The mechanic actors that can set up an Elasticsearch cluster are not covered
+ * Failure and cancellation are ignored; this is about the happy path
+ * This pretends that we are benchmarking on a single machine with a single
+   core.
+
+The sequence diagram below starts with `BenchmarkActor`, defined in
+`racecontrol.py`, which does the high-level scheduling:
+
+ * Set up the Elasticsearch cluster if the pipeline requires it (not covered
+   here)
+ * Prepare the benchmark which involves the `TrackPreparator` and
+   `TaskExecutionActor` actors.
+ * Start the benchmark, which will involve the `Worker` actor that will
+   delegate the actual work to `AsyncIoAdapter`, which runs an asyncio loop
+   in a thread; this is how `AsyncExecutor` runs many Elasticsearch async
+   clients.
+
+You'll notice that `DriverActor` and `Driver` are tightly coupled. While
+`Driver` contains most of the logic, it is not an actor, so it relies on
+`DriverActor` to send and receive messages. This was done for two reasons:
+
+ * `Driver` can be unit-tested without bringing up the actor system
+ * An earlier attempt at removing the actor system failed.
+
+```mermaid
+sequenceDiagram
+    participant BenchmarkActor
+    participant DriverActor
+    participant Driver
+    participant TrackPreparator
+    participant TaskExecutionActor
+    participant Worker
+
+    BenchmarkActor ->> DriverActor: __init__
+    BenchmarkActor -->> DriverActor: PrepareBenchmark
+    DriverActor ->> Driver: __init__
+    DriverActor ->> Driver: prepare_benchmark
+    Driver ->> DriverActor: prepare_track
+    DriverActor ->> TrackPreparator: __init__
+    DriverActor -->> TrackPreparator: Bootstrap
+    TrackPreparator -->> DriverActor: ReadyForWork
+    DriverActor -->> TrackPreparator: PrepareTrack
+    TrackPreparator ->> TaskExecutionActor: __init__
+    TrackPreparator -->> TaskExecutionActor: StartTaskLoop
+    loop
+        TaskExecutionActor -->> TrackPreparator: ReadyForWork
+        TrackPreparator -->> TaskExecutionActor: DoTask
+        loop
+            TaskExecutionActor -->> TaskExecutionActor: WakeupMessage
+        end
+    end
+    TaskExecutionActor -->> TrackPreparator: WorkerIdle
+    TrackPreparator -->> DriverActor: TrackPrepared
+    DriverActor -->> TrackPreparator: ActorExitRequest
+    TrackPreparator -->> TaskExecutionActor: ActorExitRequest
+    Driver -->> BenchmarkActor: PreparationComplete
+    BenchmarkActor -->> DriverActor: StartBenchmark
+    loop
+        DriverActor -->> DriverActor: WakeupMessage
+        DriverActor -->> Driver: post_process_samples
+        DriverActor -->> Driver: update_progress_messages
+    end
+    DriverActor ->> Driver: start_benchmark
+    Driver ->> DriverActor: create_client
+    DriverActor ->> Worker: __init__
+    Driver ->> DriverActor: start_worker
+    DriverActor -->> Worker: StartWorker
+    loop
+        loop
+            Worker ->> AsyncIoAdapter: __init__
+            AsyncIoAdapter ->> Worker: "thread finished"
+            Worker -->> Worker: WakeupMessage
+        end
+        Worker -->> DriverActor: JoinPointReached
+        DriverActor ->> Driver: joinpoint_reached
+    end
+    Driver ->> DriverActor: on_benchmark_complete
+    DriverActor -->> BenchmarkActor: BenchmarkComplete
+```
diff --git a/docs/conf.py b/docs/conf.py
index 4754c1b86..370a4669a 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -115,7 +115,7 @@ def read_min_es_version():
 
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
-exclude_patterns = ["_build"]
+exclude_patterns = ["_build", "architecture"]
 
 # If true, '()' will be appended to :func: etc. cross-reference text.
 # add_function_parentheses = True
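
Note on the `Driver` / `DriverActor` split described in `actor_system.md`: the first stated reason (unit-testing `Driver` without the actor system) can be illustrated with a minimal sketch. This is not Rally's code. The method names `start_benchmark`, `create_client`, `start_worker`, `joinpoint_reached` and `on_benchmark_complete` are taken from the sequence diagram, but their signatures, the `target` attribute, the `worker_count` parameter and the `RecordingTarget` class are assumptions made purely for illustration.

```python
class Driver:
    """Plain object holding the scheduling logic (hypothetical, simplified).

    It is not an actor: every interaction with the outside world goes
    through `target`, which in production would be the actor and in a
    test can be any object with the same methods.
    """

    def __init__(self, target):
        self.target = target      # assumed name for the callback object
        self.worker_count = 0
        self.joined = 0

    def start_benchmark(self, worker_count):
        # Ask the target to create and start each worker.
        self.worker_count = worker_count
        for worker_id in range(worker_count):
            worker = self.target.create_client(worker_id)
            self.target.start_worker(worker, worker_id)

    def joinpoint_reached(self, worker_id):
        # Once every worker has reported in, signal completion.
        self.joined += 1
        if self.joined == self.worker_count:
            self.target.on_benchmark_complete()


class RecordingTarget:
    """Test stand-in for the actor: records calls instead of sending messages."""

    def __init__(self):
        self.calls = []

    def create_client(self, worker_id):
        self.calls.append(("create_client", worker_id))
        return f"worker-{worker_id}"

    def start_worker(self, worker, worker_id):
        self.calls.append(("start_worker", worker))

    def on_benchmark_complete(self):
        self.calls.append(("on_benchmark_complete",))


if __name__ == "__main__":
    target = RecordingTarget()
    driver = Driver(target)
    driver.start_benchmark(worker_count=2)
    driver.joinpoint_reached(0)
    driver.joinpoint_reached(1)
    for call in target.calls:
        print(call)
```

Because `Driver` only ever calls methods on `target`, a test can drive it synchronously and inspect the recorded calls, with no actor system running.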