## Acks early vs acks late
Tasks are only removed from a queue when they are acknowledged ("acked") by the worker that received them. The [`acks_late`](https://docs.rs/celery/*/celery/struct.CeleryBuilder.html#method.acks_late) setting determines when a worker will ack a task. When set to `true`, tasks are acked after the worker finishes executing them. When set to `false`, they are acked right before the worker starts executing them.

The default for `acks_late` is `false`. However, if your tasks are [idempotent](https://docs.celeryproject.org/en/stable/glossary.html#term-idempotent), it's strongly recommended that you set `acks_late` to `true`. This has two major benefits.

First, it ensures that if a worker were to crash, any tasks currently executing will be retried automatically by the next available worker.

Second, it provides a better [back pressure](https://medium.com/@jayphelps/backpressure-explained-the-flow-of-data-through-software-2350b3e77ce7) mechanism when used in conjunction with a suitable [`prefetch_count`](https://docs.rs/celery/*/celery/struct.CeleryBuilder.html#method.prefetch_count) (see below).
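
As a concrete sketch, this is what acking late can look like. It assumes the `celery::app!` macro from a recent version of the crate (with builder options like `acks_late` passed as `key = value` pairs) and uses `anyhow` for simple error handling; the task and broker address are illustrative:

```rust
use celery::prelude::*;

// An idempotent task: running it twice for the same key is harmless.
#[celery::task]
fn refresh_cache(key: String) -> TaskResult<()> {
    println!("recomputing cached value for {}", key);
    Ok(())
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let app = celery::app!(
        broker = AMQPBroker { std::env::var("AMQP_ADDR")
            .unwrap_or_else(|_| "amqp://127.0.0.1:5672//".into()) },
        tasks = [refresh_cache],
        task_routes = ["*" => "celery"],
        // Ack after execution so a crashed worker's tasks are redelivered.
        acks_late = true,
    ).await?;
    app.consume().await?;
    Ok(())
}
```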
## Prefetch count
When initializing your Rust Celery app it's recommended that you [set the `prefetch_count`](https://docs.rs/celery/*/celery/struct.CeleryBuilder.html#method.prefetch_count) manually.

The `prefetch_count` determines how many un-acked tasks (ignoring those with a future ETA) a worker can hold onto at any point in time. Having `prefetch_count` set too low or too high can create a bottleneck.

If the number is set too low, workers could be under-utilized. If the number is set too high, workers could be hogging tasks that they can't execute yet, or worse: they could run out of memory from receiving too many tasks and crash.

Unfortunately, finding an optimal prefetch count is easier said than done. It depends on a lot of factors, such as the hardware your workers are running on, the task throughput, and whether your tasks are more CPU-bound or IO-bound.

The last factor is especially important. A worker running on even a single CPU can probably handle hundreds, if not thousands, of (non-blocking) IO-bound tasks at once. But a worker consuming CPU-bound tasks is essentially limited to executing one task per CPU core. Therefore a good starting point for `prefetch_count` would be either `100 * NUM_CPUS` for IO-bound tasks or `2 * NUM_CPUS` for CPU-bound tasks.
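
For example, here is a sketch of deriving a starting point at runtime. It assumes the `num_cpus` crate and that the `celery::app!` macro accepts an arbitrary expression for `prefetch_count`; tune the multiplier from there:

```rust
use celery::prelude::*;

// A CPU-bound task (stand-in for hashing, image resizing, etc.).
#[celery::task]
fn crunch(n: u64) -> TaskResult<u64> {
    Ok((0..n).fold(0u64, |acc, x| acc.wrapping_add(x.wrapping_mul(x))))
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // CPU-bound: start around 2 * NUM_CPUS; IO-bound: try 100 * NUM_CPUS.
    let prefetch_count = 2 * num_cpus::get() as u16;
    let app = celery::app!(
        broker = AMQPBroker { std::env::var("AMQP_ADDR")
            .unwrap_or_else(|_| "amqp://127.0.0.1:5672//".into()) },
        tasks = [crunch],
        task_routes = ["*" => "celery"],
        prefetch_count = prefetch_count,
        acks_late = true,
    ).await?;
    app.consume().await?;
    Ok(())
}
```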
## Consuming blocking / CPU-bound tasks
If your tasks are CPU-bound (or otherwise blocking), it's recommended that you use a multi-threaded async runtime, such as [the one](https://docs.rs/tokio/0.2.16/tokio/runtime/index.html#threaded-scheduler) provided by `tokio`. Within the task body you can then call [`tokio::task::block_in_place`](https://docs.rs/tokio/0.2.16/tokio/task/index.html#block_in_place) where appropriate.
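
For instance, a sketch of a blocking task body (the hash function here is a stand-in for any CPU-heavy work, and the multi-threaded scheduler is what `#[tokio::main]` gives you by default):

```rust
use celery::prelude::*;

#[celery::task]
async fn hash_password(password: String) -> TaskResult<String> {
    // block_in_place tells the multi-threaded runtime that this thread is
    // about to block, so it can shift other tasks onto different worker
    // threads instead of stalling them. Note that it panics on a
    // single-threaded runtime, hence the recommendation above.
    let digest = tokio::task::block_in_place(|| cpu_heavy_hash(&password));
    Ok(digest)
}

// Stand-in for a real CPU-bound hash function (e.g. bcrypt or argon2).
fn cpu_heavy_hash(input: &str) -> String {
    let mut acc: u64 = 5381;
    for b in input.bytes() {
        for _ in 0..1_000 {
            acc = acc.wrapping_mul(33).wrapping_add(b as u64);
        }
    }
    format!("{:x}", acc)
}
```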
## The worker binary

While the Python version of Celery provides a CLI that you can use to run a worker, in Rust you'll have to implement your own worker binary. However, this is a lot easier than it sounds. At a minimum you just need to initialize your [`Celery`](https://docs.rs/celery/*/celery/struct.Celery.html) application, define and register your tasks, and run the [`Celery::consume`](https://docs.rs/celery/*/celery/struct.Celery.html#method.consume) method within an async executor.

Note that `Celery::consume` is an `async` method, which means you need an async runtime to execute it. Luckily this is provided by [`tokio`](https://docs.rs/tokio/*/tokio/) and is as simple as declaring your `main` function `async` and decorating it with the `tokio::main` macro.

Here is a complete example of a worker application:
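
(A minimal sketch, assuming the `celery::app!` macro and the AMQP broker setup from the crate's README; `anyhow` is used for simple error handling and the `add` task is illustrative.)

```rust
use celery::prelude::*;

// Tasks are plain functions annotated with the task attribute macro.
#[celery::task]
fn add(x: i32, y: i32) -> TaskResult<i32> {
    Ok(x + y)
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Build the app: connect a broker and register the tasks.
    let app = celery::app!(
        broker = AMQPBroker { std::env::var("AMQP_ADDR")
            .unwrap_or_else(|_| "amqp://127.0.0.1:5672//".into()) },
        tasks = [add],
        task_routes = ["*" => "celery"],
    ).await?;

    // Blocks until SIGINT / SIGTERM, draining in-flight tasks first.
    app.consume().await?;
    Ok(())
}
```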
The `consume` method will listen for `SIGINT` and `SIGTERM` signals - just like a Python worker - and will try to finish all pending tasks before shutting down unless it receives another signal.
## Time limits vs timeout
In Python you configure tasks to have a [soft or hard time limit](https://docs.celeryproject.org/en/latest/userguide/workers.html#time-limits). A soft time limit allows a task to clean up after itself if it runs over the limit, while a hard limit will force terminate the task.
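
In Rust, the rough counterpart is a per-task limit. A sketch, assuming the `time_limit` task option found in recent versions of the crate (older releases called it `timeout`):

```rust
use celery::prelude::*;

// Assumed option name: `time_limit`. When the limit is reached the worker
// stops the task and records a failure; since cancelling a future drops it,
// `Drop` implementations still run, which lands somewhere between Python's
// soft and hard limits.
#[celery::task(time_limit = 10)]
async fn fetch_report(url: String) -> TaskResult<String> {
    // Stand-in for a slow network request.
    Ok(format!("report fetched from {}", url))
}
```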