
Add best practices page to Dask cuDF docs #16821

Merged · 31 commits · Sep 20, 2024

Commits
f01fd71
start best practices page for dask-cudf
rjzamora Sep 16, 2024
7aa8041
revisions
rjzamora Sep 17, 2024
b2ce634
Merge remote-tracking branch 'upstream/branch-24.10' into dask-cudf-b…
rjzamora Sep 17, 2024
1e028ea
address code review
rjzamora Sep 18, 2024
3332717
more revisions
rjzamora Sep 18, 2024
eee37f3
more revisions
rjzamora Sep 18, 2024
7c63c7e
Merge remote-tracking branch 'upstream/branch-24.10' into dask-cudf-b…
rjzamora Sep 18, 2024
a425405
add from_map note on meta
rjzamora Sep 18, 2024
9233524
add note on diagnostics
rjzamora Sep 18, 2024
bd144c2
fix typos
rjzamora Sep 18, 2024
5f854e7
tweak wording
rjzamora Sep 19, 2024
397efa7
Merge remote-tracking branch 'upstream/branch-24.10' into dask-cudf-b…
rjzamora Sep 19, 2024
6c8771b
fix map_partitions typo
rjzamora Sep 19, 2024
f7731b8
revisions
rjzamora Sep 19, 2024
581a69f
Merge remote-tracking branch 'upstream/branch-24.10' into dask-cudf-b…
rjzamora Sep 19, 2024
8515cb9
fix spelling error and add link to quick-start example
rjzamora Sep 19, 2024
a23deff
replace link to readme
rjzamora Sep 19, 2024
4c1b55d
Merge remote-tracking branch 'upstream/branch-24.10' into dask-cudf-b…
rjzamora Sep 19, 2024
8ecd536
add a bit more info about wait and CPU-GPU data movement
rjzamora Sep 20, 2024
251bf23
Merge branch 'branch-24.10' into dask-cudf-best-practices
rjzamora Sep 20, 2024
40a638e
update
rjzamora Sep 20, 2024
d082cac
Apply suggestions from code review
rjzamora Sep 20, 2024
8152fca
Apply suggestions from code review
rjzamora Sep 20, 2024
a653a5a
Merge remote-tracking branch 'upstream/branch-24.10' into dask-cudf-b…
rjzamora Sep 20, 2024
91d4fd5
fix lists
rjzamora Sep 20, 2024
d58a5ce
fix func list
rjzamora Sep 20, 2024
59e597a
roll back func change
rjzamora Sep 20, 2024
adbd22d
fix more double-colon mistakes
rjzamora Sep 20, 2024
216d5de
Merge branch 'branch-24.10' into dask-cudf-best-practices
rjzamora Sep 20, 2024
d76dbd6
Apply suggestions from code review
rjzamora Sep 20, 2024
da7308a
Merge branch 'branch-24.10' into dask-cudf-best-practices
rjzamora Sep 20, 2024
Apply suggestions from code review
Co-authored-by: Lawrence Mitchell <wence@gmx.li>
rjzamora and wence- authored Sep 20, 2024
commit d76dbd6aba21873590e85e95fcb9ace37e70554a
16 changes: 8 additions & 8 deletions docs/dask_cudf/source/best_practices.rst
@@ -21,7 +21,7 @@ Deployment and Configuration
 Use Dask-CUDA
 ~~~~~~~~~~~~~
 
-In order to execute a Dask workflow on multiple GPUs, a Dask cluster must
+To execute a Dask workflow on multiple GPUs, a Dask cluster must
 be deployed with `Dask-CUDA <https://docs.rapids.ai/api/dask-cuda/stable/>`__
 and `Dask.distributed <https://distributed.dask.org/en/stable/>`__.
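The deployment this hunk describes can be sketched as follows — a minimal illustration, assuming the `dask_cuda` and `distributed` packages are installed and at least one NVIDIA GPU is visible:

```python
# Minimal multi-GPU deployment sketch (assumes dask_cuda and
# distributed are installed and an NVIDIA GPU is visible).
from dask_cuda import LocalCUDACluster
from distributed import Client

if __name__ == "__main__":
    # By default, LocalCUDACluster launches one worker process per
    # visible GPU on the local machine.
    cluster = LocalCUDACluster()
    client = Client(cluster)
    print(client)
```

Any Dask collection work submitted through this client then runs on the GPU workers.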

@@ -47,7 +47,7 @@ is also illustrated within the multi-GPU section of `Dask cuDF's
 <https://docs.dask.org/en/latest/deploying-kubernetes.html>`__ and `Dask-Jobqueue
 <https://jobqueue.dask.org/en/latest/>`__.
 
-Please see `RAPIDS-deployment documentation <https://docs.rapids.ai/deployment/stable/>`__
+Please see `the RAPIDS deployment documentation <https://docs.rapids.ai/deployment/stable/>`__
 for further details and examples.


@@ -63,7 +63,7 @@ These tools include an intuitive `browser dashboard
 No matter the workflow, using the dashboard is strongly recommended.
 It provides a visual representation of the worker resources and compute
 progress. It also shows basic GPU memory and utilization metrics (under
-the ``GPU`` tab). In order to visualize further GPU metrics in JupyterLab,
+the ``GPU`` tab). To visualize more detailed GPU metrics in JupyterLab,
 use `NVDashboard <https://github.com/rapidsai/jupyterlab-nvdashboard>`__.
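The dashboard access recommended here looks roughly like the following — a sketch only, with a hypothetical scheduler address:

```python
from distributed import Client

# Connecting a client to a running scheduler (hypothetical address)
# exposes the dashboard URL, which can be opened in a browser.
client = Client("tcp://scheduler-host:8786")
print(client.dashboard_link)
```

The ``GPU`` tab of that dashboard shows the memory and utilization metrics mentioned above.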


@@ -89,7 +89,7 @@ Use RMM
 
 Memory allocations in cuDF are significantly faster and more efficient when
 the `RAPIDS Memory Manager (RMM) <https://docs.rapids.ai/api/rmm/stable/>`__
-library is used on worker processes. In most cases, the best way to manage
+library is configured appropriately on worker processes. In most cases, the best way to manage
 memory is by initializing an RMM pool on each worker before executing a
 workflow. When using :func:`LocalCUDACluster`, this is easily accomplished
 by setting ``rmm_pool_size`` to a large fraction (e.g. ``0.9``).
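The ``rmm_pool_size`` setting described in this hunk can be sketched as — again assuming `dask_cuda` is installed and a GPU is visible:

```python
from dask_cuda import LocalCUDACluster
from distributed import Client

if __name__ == "__main__":
    # Pre-allocate ~90% of each GPU's memory as an RMM pool on every
    # worker, so cuDF allocations are served from the pool rather than
    # through repeated raw device allocations.
    cluster = LocalCUDACluster(rmm_pool_size=0.9)
    client = Client(cluster)
```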
@@ -116,7 +116,7 @@ between the different DataFrame backends. For example::
 .. note::
     Although :func:`to_backend` makes it easy to move data between pandas
     and cuDF, repetitive CPU-GPU data movement can degrade performance
-    significantly. For optimal results, keep your data on the GPU as often
+    significantly. For optimal results, keep your data on the GPU as much
     as possible.

Avoid eager execution
@@ -275,10 +275,10 @@ for more details.
 may lead to an OOM error.
 
 
-Sorting, Joining and Grouping
------------------------------
+Sorting, Joining, and Grouping
+------------------------------
 
-Sorting, joining and grouping operations all have the potential to
+Sorting, joining, and grouping operations all have the potential to
 require the global shuffling of data between distinct partitions.
 When the initial data fits comfortably in global GPU memory, these
 "all-to-all" operations are typically bound by worker-to-worker