doc/dask.rst: 8 additions & 8 deletions
@@ -87,10 +87,8 @@ for the full disclaimer). By default, :py:meth:`~xarray.open_mfdataset` will chu
 netCDF file into a single Dask array; again, supply the ``chunks`` argument to
 control the size of the resulting Dask arrays. In more complex cases, you can
 open each file individually using :py:meth:`~xarray.open_dataset` and merge the result, as
-described in :ref:`combining data`. If you have a distributed cluster running,
-passing the keyword argument ``parallel=True`` to :py:meth:`~xarray.open_mfdataset`
-will speed up the reading of large multi-file datasets by executing those read tasks
-in parallel using ``dask.delayed``.
+described in :ref:`combining data`. Passing the keyword argument ``parallel=True`` to :py:meth:`~xarray.open_mfdataset` will speed up the reading of large multi-file datasets by
+executing those read tasks in parallel using ``dask.delayed``.

 You'll notice that printing a dataset still shows a preview of array values,
 even if they are actually Dask arrays. We can do this quickly with Dask because
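
The reworded paragraph combines ``chunks`` with ``parallel=True``; a minimal sketch of that call, where the file glob, ``combine`` mode, and chunk sizes are illustrative assumptions rather than anything from this diff:

```python
import xarray as xr

# Open many netCDF files as one dataset. `chunks` controls the size of the
# resulting Dask arrays, and `parallel=True` wraps each file open/read in
# dask.delayed so those tasks execute in parallel.
ds = xr.open_mfdataset(
    "data/temperature_*.nc",  # hypothetical file pattern
    combine="by_coords",      # merge the per-file datasets by coordinates
    chunks={"time": 100},     # illustrative: one Dask chunk per 100 time steps
    parallel=True,            # read files in parallel via dask.delayed
)
print(ds)  # still shows a preview without loading the full arrays
```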
@@ -157,6 +155,12 @@ explicit conversion step. One notable exception is indexing operations: to
 enable label based indexing, xarray will automatically load coordinate labels
 into memory.

+.. tip::
+
+   By default, dask uses its multi-threaded scheduler, which distributes work across
+   multiple cores and allows for processing some datasets that do not fit into memory.
+   For running across a cluster, `set up the distributed scheduler <https://docs.dask.org/en/latest/setup.html>`_.
+
 The easiest way to convert an xarray data structure from lazy Dask arrays into
 *eager*, in-memory NumPy arrays is to use the :py:meth:`~xarray.Dataset.load` method:
@@ -417,7 +421,3 @@ With analysis pipelines involving both spatial subsetting and temporal resamplin

 6. The dask `diagnostics <https://docs.dask.org/en/latest/understanding-performance.html>`_ can be
    useful in identifying performance bottlenecks.
-
-7. Installing the optional `bottleneck <https://github.com/kwgoodman/bottleneck>`_ library
-   will result in greatly reduced memory usage when using :py:meth:`~xarray.Dataset.rolling`
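
For item 6, a small sketch of the dask diagnostics with the local schedulers; the reduction being profiled is a placeholder, and ``visualize`` additionally requires bokeh:

```python
from dask.diagnostics import ProgressBar, Profiler, visualize

# ProgressBar prints live progress; Profiler records which tasks ran when,
# which helps identify performance bottlenecks in a pipeline.
with ProgressBar(), Profiler() as prof:
    result = ds.mean(dim="time").compute()  # placeholder computation

visualize(prof)  # plot the task timeline (needs bokeh installed)
```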
0 commit comments