
Commit 4e9b9f9

amogkam and angelinalg authored
[Docs] [Data] Fix broken references (ray-project#36232)
Fixes a bunch of broken Ray Data link references.

Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
1 parent c7651c4 commit 4e9b9f9

18 files changed: +66 −39 lines changed

doc/source/data/batch_inference.rst

+2 −2

@@ -29,7 +29,7 @@ Using Ray Data for offline inference involves four basic steps:

  - **Step 1:** Load your data into a Ray Dataset. Ray Data supports many different data sources and formats. For more details, see :ref:`Loading Data <loading_data>`.
  - **Step 2:** Define a Python class to load the pre-trained model.
  - **Step 3:** Transform your dataset using the pre-trained model by calling :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`. For more details, see :ref:`Transforming Data <transforming-data>`.
- - **Step 4:** Get the final predictions by either iterating through the output or saving the results. For more details, see :ref:`Consuming data <consuming_data>`.
+ - **Step 4:** Get the final predictions by either iterating through the output or saving the results. For more details, see the :ref:`Iterating over data <iterating-over-data>` and :ref:`Saving data <saving-data>` user guides.

  For more in-depth examples for your use case, see :ref:`batch_inference_examples`_. For how to configure batch inference, see :ref:`batch_inference_configuration`_.

@@ -365,7 +365,7 @@ Increasing batch size results in faster execution because inference is a vectori

  Handling GPU out-of-memory failures
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- If you run into CUDA out-of-memory issues, your batch size is likely too large. Decrease the batch size by following :ref:`these steps <_batch_inference_batch_size>`.
+ If you run into CUDA out-of-memory issues, your batch size is likely too large. Decrease the batch size by following :ref:`these steps <batch_inference_batch_size>`.

  If your batch size is already set to 1, then use either a smaller model or GPU devices with more memory.

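As background for the corrected step 4 links, a minimal sketch of the four-step batch inference pattern described above might look like the following. The Predictor class and its stand-in model are hypothetical, and the actor-pool argument assumes a Ray version whose map_batches accepts compute=ray.data.ActorPoolStrategy.

    import ray

    # Step 1: Load data into a Dataset (a toy in-memory dataset stands in for real files).
    ds = ray.data.range(32)

    # Step 2: Define a class that loads the (pretend) pre-trained model once per worker.
    class Predictor:
        def __init__(self):
            # A real pipeline would load model weights here.
            self.model = lambda arr: arr * 2

        # Step 3: Transform each batch (a dict of NumPy arrays by default) with the model.
        def __call__(self, batch):
            batch["prediction"] = self.model(batch["id"])
            return batch

    predictions = ds.map_batches(
        Predictor,
        batch_size=8,
        compute=ray.data.ActorPoolStrategy(size=2),  # long-lived actors hold the model
    )

    # Step 4: Consume the predictions by iterating over them, or save them instead.
    print(predictions.take(4))
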
doc/source/data/examples/batch_training.ipynb

+3 −1

@@ -453,6 +453,7 @@

  ]
  },
  {
+ "attachments": {},
  "cell_type": "markdown",
  "metadata": {},
  "source": [

@@ -726,13 +727,14 @@

  ]
  },
  {
+ "attachments": {},
  "cell_type": "markdown",
  "metadata": {},
  "source": [
  "Recall how we wrote a data transform `transform_batch` UDF? It was called with pattern:\n",
  "- `Dataset.map_batches(transform_batch, batch_format=\"pandas\")`\n",
  "\n",
- "Similarly, we can write a custom groupy-aggregate function `agg_func` which will run for each [Dataset *group-by*](data-groupbys) group in parallel. The usage pattern is:\n",
+ "Similarly, we can write a custom groupy-aggregate function `agg_func` which will run for each [Dataset *group-by*](transform_groupby) group in parallel. The usage pattern is:\n",
  "- `Dataset.groupby(column).map_groups(agg_func, batch_format=\"pandas\")`.\n",
  "\n",
  "In the cell below, we define our custom `agg_func`."

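The corrected cell points at the groupby transform docs; as a quick illustration of the Dataset.groupby(column).map_groups(agg_func, batch_format="pandas") pattern it mentions, a self-contained sketch follows (the store/sales columns are invented for the example).

    import pandas as pd
    import ray

    ds = ray.data.from_items(
        [{"store": "a", "sales": 1}, {"store": "a", "sales": 3}, {"store": "b", "sales": 5}]
    )

    # Custom groupby-aggregate UDF: each call receives one group as a pandas DataFrame.
    def agg_func(group: pd.DataFrame) -> pd.DataFrame:
        return pd.DataFrame(
            {"store": [group["store"].iloc[0]], "total_sales": [group["sales"].sum()]}
        )

    result = ds.groupby("store").map_groups(agg_func, batch_format="pandas")
    print(result.take_all())  # one aggregated row per store
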
doc/source/data/examples/custom-datasource.rst

+6 −6

@@ -7,7 +7,7 @@ Implementing a Custom Datasource

  .. note::

    This MongoDatasource guide below is for education only. For production use of MongoDB
-   in Ray Data, see :ref:`Creating Dataset from MongoDB <dataset_mongo_db>`.
+   in Ray Data, see :ref:`Creating Dataset from MongoDB <reading_mongodb>`.

  Ray Data supports multiple ways to :ref:`create a dataset <loading_data>`,
  allowing you to easily ingest data of common formats from popular sources. However, if the

@@ -101,7 +101,7 @@ First, let's handle a single MongoDB pipeline, which is the unit of execution in

  and then convert results into Arrow format. We use ``PyMongo`` and ``PyMongoArrow``
  to achieve this.

- .. literalinclude:: ./doc_code/custom_datasource.py
+ .. literalinclude:: ../doc_code/custom_datasource.py
    :language: python
    :start-after: __read_single_partition_start__
    :end-before: __read_single_partition_end__

@@ -121,7 +121,7 @@ a wrapper of ``_read_single_partition``.

  A list of :class:`~ray.data.ReadTask` objects are returned by ``get_read_tasks``, and these
  tasks are executed on remote workers. You can find more details about `Dataset read execution here <https://docs.ray.io/en/master/data/key-concepts.html#reading-data>`__.

- .. literalinclude:: ./doc_code/custom_datasource.py
+ .. literalinclude:: ../doc_code/custom_datasource.py
    :language: python
    :start-after: __mongo_datasource_reader_start__
    :end-before: __mongo_datasource_reader_end__

@@ -136,7 +136,7 @@ Write support

  Similar to read support, we start with handling a single block. Again
  the ``PyMongo`` and ``PyMongoArrow`` are used for MongoDB interactions.

- .. literalinclude:: ./doc_code/custom_datasource.py
+ .. literalinclude:: ../doc_code/custom_datasource.py
    :language: python
    :start-after: __write_single_block_start__
    :end-before: __write_single_block_end__

@@ -150,7 +150,7 @@ will later be used in the implementation of :meth:`~ray.data.Datasource.do_write

  In short, the below function spawns multiple :ref:`Ray remote tasks <ray-remote-functions>`
  and returns :ref:`their futures (object refs) <objects-in-ray>`.

- .. literalinclude:: ./doc_code/custom_datasource.py
+ .. literalinclude:: ../doc_code/custom_datasource.py
    :language: python
    :start-after: __write_multiple_blocks_start__
    :end-before: __write_multiple_blocks_end__

@@ -164,7 +164,7 @@ ready to implement :meth:`create_reader() <ray.data.Datasource.create_reader>`

  and :meth:`do_write() <ray.data.Datasource.do_write>`, and put together
  a ``MongoDatasource``.

- .. literalinclude:: ./doc_code/custom_datasource.py
+ .. literalinclude:: ../doc_code/custom_datasource.py
    :language: python
    :start-after: __mongo_datasource_start__
    :end-before: __mongo_datasource_end__

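For readers following the MongoDatasource walkthrough, here is a rough sketch of the single-partition read that the corrected literalinclude paths point to. It is not the doc_code file itself, only an illustration, and it assumes PyMongoArrow's aggregate_arrow_all helper is available.

    import pymongo
    from pymongoarrow.api import aggregate_arrow_all

    def _read_single_partition(uri, database, collection, pipeline):
        # One MongoDB aggregation pipeline is the unit of execution for one ReadTask.
        client = pymongo.MongoClient(uri)
        coll = client[database][collection]
        # Return the pipeline results as a single Arrow table (one Ray Data block).
        return [aggregate_arrow_all(coll, pipeline)]
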
doc/source/data/examples/nyc_taxi_basic_processing.ipynb

+2 −1

@@ -595,6 +595,7 @@

  ]
  },
  {
+ "attachments": {},
  "cell_type": "markdown",
  "id": "0d1e2106",
  "metadata": {},

@@ -604,7 +605,7 @@

  "Note that Ray Data' Parquet reader supports projection (column selection) and row filter pushdown, where we can push the above column selection and the row-based filter to the Parquet read. If we specify column selection at Parquet read time, the unselected columns won't even be read from disk!\n",
  "\n",
  "The row-based filter is specified via\n",
- "[Arrow's dataset field expressions](https://arrow.apache.org/docs/6.0/python/generated/pyarrow.dataset.Expression.html#pyarrow.dataset.Expression). See the {ref}`feature guide for reading Parquet data <dataset_supported_file_formats>` for more information."
+ "[Arrow's dataset field expressions](https://arrow.apache.org/docs/6.0/python/generated/pyarrow.dataset.Expression.html#pyarrow.dataset.Expression). See the {ref}`Parquet row pruning tips <parquet_row_pruning>` for more information."
  ]
  },
  {

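The reference now points at the Parquet row pruning tips; a minimal example of the pushdown behavior that cell describes could look like this, reusing the iris sample file and column names from the performance-tips.rst snippet further down this diff.

    import pyarrow.dataset as pds
    import ray

    # Projection pushdown (columns=) and row filter pushdown (filter=) are applied at
    # read time, so unselected columns and pruned rows are never materialized.
    ds = ray.data.read_parquet(
        "example://iris.parquet",
        columns=["sepal.length", "variety"],
        filter=pds.field("sepal.length") > 5.0,
    )
    print(ds.count())
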
doc/source/data/examples/ocr_example.ipynb

+8 −1

@@ -21,6 +21,7 @@

  ]
  },
  {
+ "attachments": {},
  "cell_type": "markdown",
  "id": "2a344178",
  "metadata": {},

@@ -78,7 +79,7 @@

  "\n",
  "### Running the OCR software on the data\n",
  "\n",
- "We can now use the {meth}`ray.data.read_binary_files <ray.data.read_binary_files>` function to read all the images from S3. We set the `include_paths=True` option to create a dataset of the S3 paths and image contents. We then run the {meth}`ds.map <ray.data.Dataset.map>` function on this dataset to execute the actual OCR process on each file and convert the screen shots into text. This will create a tabular dataset with columns `path` and `text`, see also [](transforming_data).\n",
+ "We can now use the {meth}`ray.data.read_binary_files <ray.data.read_binary_files>` function to read all the images from S3. We set the `include_paths=True` option to create a dataset of the S3 paths and image contents. We then run the {meth}`ds.map <ray.data.Dataset.map>` function on this dataset to execute the actual OCR process on each file and convert the screen shots into text. This creates a tabular dataset with columns `path` and `text`.\n",
  "\n",
  "````{note}\n",
  "If you want to load the data from a private bucket, you have to run\n",

@@ -317,6 +318,12 @@

  "\n",
  "Contributions that extend the example in this direction with a PR are welcome!"
  ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "582546c8",
+ "metadata": {},
+ "source": []
  }
  ],
  "metadata": {

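For context, a compressed sketch of the read-then-map flow described in the edited cell is shown below; the bucket path is a placeholder and the OCR step is stubbed out rather than calling a real OCR library.

    import ray

    # include_paths=True adds a "path" column alongside the raw "bytes" of each file.
    ds = ray.data.read_binary_files(
        "s3://your-bucket/ocr-images/",  # placeholder, not the example's actual bucket
        include_paths=True,
    )

    def run_ocr(row):
        # A real pipeline would run an OCR library (e.g. pytesseract) on row["bytes"].
        return {"path": row["path"], "text": f"<{len(row['bytes'])} bytes scanned>"}

    # Produces a tabular dataset with columns "path" and "text", as the cell says.
    results = ds.map(run_ocr)
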
doc/source/data/faq.rst

+2 −2

@@ -80,10 +80,10 @@ What should I not use Ray Data for?

  Ray Data is not meant to be used for generic ETL pipelines (like Spark) or
  scalable data science (like Dask, Modin, or Mars). However, each of these frameworks
- are :ref:`runnable on Ray <data_integrations>`, and Datasets integrates tightly with
+ are runnable on Ray, and Datasets integrates tightly with
  these frameworks, allowing for efficient exchange of distributed data partitions often
  with zero-copy. Check out the
- :ref:`dataset creation feature guide <dataset_from_in_memory_data_distributed>` to learn
+ :ref:`dataset creation feature guide <loading_datasets_from_distributed_df>` to learn
  more about these integrations.

  Datasets is specifically targeting

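The retargeted link covers loading datasets from distributed DataFrame libraries; one such integration (Dask) might be exercised roughly as follows, assuming Dask is installed alongside Ray.

    import dask.dataframe as dd
    import pandas as pd
    import ray

    df = pd.DataFrame({"x": list(range(8)), "y": list(range(8))})
    ddf = dd.from_pandas(df, npartitions=2)

    # Exchange distributed partitions between Dask and Ray Data, often with zero copy.
    ds = ray.data.from_dask(ddf)
    print(ds.schema())
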
doc/source/data/getting-started.rst

+2 −2

@@ -45,7 +45,7 @@ To learn more about creating datasets, read

  Transform the dataset
  ------------------------

- Apply :ref:`user-defined functions <transform_datasets_writing_udfs>` (UDFs) to
+ Apply user-defined functions (UDFs) to
  transform datasets. Ray executes transformations in parallel for performance.

  .. testcode::

@@ -135,7 +135,7 @@ Pass datasets to Ray tasks or actors, and access records with methods like

  To learn more about consuming datasets, read
- :ref:`Consuming data <consuming_data>`.
+ :ref:`Iterating over Data <iterating-over-data>` and :ref:`Saving Data <saving-data>`.

  Save the dataset
  -------------------

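Because the corrected links split "consuming data" into iterating and saving, a small sketch of both consumption paths follows; the output directory is a placeholder.

    import ray

    ds = ray.data.range(8)

    # Iterate over the dataset in batches (dicts of NumPy arrays by default)...
    for batch in ds.iter_batches(batch_size=4):
        print(batch)

    # ...or save it to files; both are ways to consume a Dataset.
    ds.write_parquet("/tmp/ray_data_getting_started")  # placeholder output path
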
doc/source/data/inspecting-data.rst

+2 −2

@@ -86,7 +86,7 @@ a dictionary.

  For more information on working with rows, see
- :ref:`Transforming rows <transforming-rows>` and
+ :ref:`Transforming rows <transforming_rows>` and
  :ref:`Iterating over rows <iterating-over-rows>`.

  .. _inspecting-batches:

@@ -141,5 +141,5 @@ of the returned batch, set ``batch_format``.

  [2 rows x 5 columns]

  For more information on working with batches, see
- :ref:`Transforming batches <transforming-batches>` and
+ :ref:`Transforming batches <transforming_batches>` and
  :ref:`Iterating over batches <iterating-over-batches>`.

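As a reminder of the row/batch distinction these links draw, a tiny inspection snippet might look like this.

    import ray

    ds = ray.data.range(4)

    # Inspect individual rows...
    print(ds.take(2))  # -> [{'id': 0}, {'id': 1}]

    # ...or a whole batch, materialized here as a pandas DataFrame.
    print(ds.take_batch(2, batch_format="pandas"))
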
doc/source/data/iterating-over-data.rst

+2 −2

@@ -40,7 +40,7 @@ as a dictionary.

  For more information on working with rows, see
- :ref:`Transforming rows <transforming-rows>` and
+ :ref:`Transforming rows <transforming_rows>` and
  :ref:`Inspecting rows <inspecting-rows>`.

  .. _iterating-over-batches:

@@ -142,7 +142,7 @@ formats by calling one of the following methods:

  tf.Tensor([6.2 5.9], shape=(2,), dtype=float64) tf.Tensor([2 2], shape=(2,), dtype=int64)

  For more information on working with batches, see
- :ref:`Transforming batches <transforming-batches>` and
+ :ref:`Transforming batches <transforming_batches>` and
  :ref:`Inspecting batches <inspecting-batches>`.

  .. _iterating-over-batches-with-shuffling:

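Since this guide is about iteration, here is a short sketch of framework-native batch iteration with a local shuffle buffer, assuming PyTorch is installed.

    import ray

    ds = ray.data.range(8)

    # Yield batches as Torch tensors, lightly shuffled within a small local buffer.
    for batch in ds.iter_torch_batches(batch_size=4, local_shuffle_buffer_size=8):
        print(batch["id"])
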
doc/source/data/loading-data.rst

+6 −0

@@ -444,6 +444,8 @@ Ray Data interoperates with libraries like pandas, NumPy, and Arrow.

  schema={food: string, price: double}
  )

+ .. _loading_datasets_from_distributed_df:
+
  Loading data from distributed DataFrame libraries
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -633,6 +635,8 @@ Reading databases

  Ray Data reads from databases like MySQL, Postgres, and MongoDB.

+ .. _reading_sql:
+
  Reading SQL databases
  ~~~~~~~~~~~~~~~~~~~~~

@@ -828,6 +832,8 @@ Call :func:`~ray.data.read_sql` to read data from a database that provides a

  "SELECT year, COUNT(*) FROM movie GROUP BY year", create_connection
  )

+ .. _reading_mongodb:
+
  Reading MongoDB
  ~~~~~~~~~~~~~~~

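The new reading_sql label anchors the SQL section; the aggregated query visible in the hunk above could be exercised with a sketch like this, where the SQLite database and its movie table are assumed to exist.

    import sqlite3
    import ray

    def create_connection():
        # Any DB-API 2.0 connection factory works; sqlite3 keeps the sketch self-contained.
        return sqlite3.connect("example.db")  # assumed local database with a "movie" table

    ds = ray.data.read_sql(
        "SELECT year, COUNT(*) FROM movie GROUP BY year", create_connection
    )
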
doc/source/data/performance-tips.rst

+2 −0

@@ -108,6 +108,8 @@ avoid loading unnecessary data (projection pushdown).

  For example, use ``ray.data.read_parquet("example://iris.parquet", columns=["sepal.length", "variety"])`` to read
  just two of the five columns of Iris dataset.

+ .. _parquet_row_pruning:
+
  Parquet Row Pruning
  ~~~~~~~~~~~~~~~~~~~

doc/source/data/transforming-data.rst

+13 −3

@@ -1,4 +1,4 @@

- .. _transforming-data:
+ .. _transforming_data:

  =================
  Transforming Data

@@ -15,6 +15,8 @@ This guide shows you how to:

  * `Shuffle rows <#shuffling-rows>`_
  * `Repartition data <#repartitioning-data>`_

+ .. _transforming_rows:
+
  Transforming rows
  =================

@@ -71,6 +73,8 @@ If your transformation returns multiple rows for each input row, call

  [{'id': 0}, {'id': 0}, {'id': 1}, {'id': 1}, {'id': 2}, {'id': 2}]

+ .. _transforming_batches:
+
  Transforming batches
  ====================

@@ -108,6 +112,8 @@ uses tasks by default.

  .map_batches(increase_brightness)
  )

+ .. _transforming_data_actors:
+
  Transforming batches with actors
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -191,8 +197,10 @@ To transform batches with actors, complete these steps:

  ds.materialize()

- Configuring batch type
- ~~~~~~~~~~~~~~~~~~~~~~
+ .. _configure_batch_format:
+
+ Configuring batch format
+ ~~~~~~~~~~~~~~~~~~~~~~~~

  Ray Data represents batches as dicts of NumPy ndarrays or pandas DataFrames. By
  default, Ray Data represents batches as dicts of NumPy ndarrays.

@@ -248,6 +256,8 @@ program might run out of memory. If you encounter an out-of-memory error, decrea

  the default batch size is 4096. If you're using GPUs, you must specify an explicit
  batch size.

+ .. _transforming_groupby:
+
  Groupby and transforming groups
  ===============================

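One of the renamed sections is "Configuring batch format"; a brief sketch of what that option changes in a map_batches UDF is given below (the added column is arbitrary).

    import pandas as pd
    import ray

    ds = ray.data.range(4)

    # batch_format="pandas" hands the UDF pandas DataFrames instead of the default
    # dict-of-NumPy-ndarrays batches.
    def add_one(batch: pd.DataFrame) -> pd.DataFrame:
        batch["plus_one"] = batch["id"] + 1
        return batch

    print(ds.map_batches(add_one, batch_format="pandas").take_all())
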
doc/source/ray-air/computer-vision.rst

+2 −2

@@ -38,7 +38,7 @@ Reading image data

  :end-before: __read_images1_stop__
  :dedent:

- Then, apply a :ref:`user-defined function <transform_datasets_writing_udfs>` to
+ Then, apply a :ref:`user-defined function <transforming_data>` to
  encode the class names as integer targets.

  .. literalinclude:: ./doc_code/computer_vision.py

@@ -98,7 +98,7 @@ Reading image data

  :end-before: __read_tfrecords1_stop__
  :dedent:

- Then, apply a :ref:`user-defined function <transform_datasets_writing_udfs>` to
+ Then, apply a :ref:`user-defined function <transforming_data>` to
  decode the raw image bytes.

  .. literalinclude:: ./doc_code/computer_vision.py

doc/source/ray-core/patterns/pipelining.rst

+1 −1

@@ -7,7 +7,7 @@ you can use the `pipelining <https://en.wikipedia.org/wiki/Pipeline_(computing)>

  .. note::

  Pipelining is an important technique to improve the performance and is heavily used by Ray libraries.
- See :ref:`DatasetPipelines <pipelining_datasets>` as an example.
+ See :ref:`Ray Data <data>` as an example.

  .. figure:: ../images/pipelining.svg

0 commit comments
