Fixes a bunch of broken Ray Data link references.
---------
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
doc/source/data/batch_inference.rst (+2 -2)
@@ -29,7 +29,7 @@ Using Ray Data for offline inference involves four basic steps:
 - **Step 1:** Load your data into a Ray Dataset. Ray Data supports many different data sources and formats. For more details, see :ref:`Loading Data <loading_data>`.
 - **Step 2:** Define a Python class to load the pre-trained model.
 - **Step 3:** Transform your dataset using the pre-trained model by calling :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`. For more details, see :ref:`Transforming Data <transforming-data>`.
-- **Step 4:** Get the final predictions by either iterating through the output or saving the results. For more details, see :ref:`Consuming data <consuming_data>`.
+- **Step 4:** Get the final predictions by either iterating through the output or saving the results. For more details, see the :ref:`Iterating over data <iterating-over-data>` and :ref:`Saving data <saving-data>` user guides.

 For more in-depth examples for your use case, see :ref:`batch_inference_examples`_. For how to configure batch inference, see :ref:`batch_inference_configuration`_.
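For context, the four steps quoted in this hunk map onto a short workflow. Here is a minimal sketch assuming an in-memory NumPy dataset and a placeholder model (the real guide loads an actual pre-trained model); the `concurrency` argument follows recent Ray releases, while older ones use `compute=ray.data.ActorPoolStrategy(...)` instead:

```python
import numpy as np
import ray

# Step 1: Load data into a Ray Dataset (here, a small in-memory NumPy array).
ds = ray.data.from_numpy(np.ones((32, 4), dtype=np.float32))

# Step 2: A class that loads the (placeholder) pre-trained model once per worker.
class Predictor:
    def __init__(self):
        # A real model, e.g. torch.load(...), would be loaded here.
        self.model = lambda arr: arr.sum(axis=1)

    def __call__(self, batch: dict) -> dict:
        # Batches arrive as dicts of NumPy arrays; from_numpy stores data under "data".
        batch["prediction"] = self.model(batch["data"])
        return batch

# Step 3: Transform the dataset with the model via map_batches.
predictions = ds.map_batches(Predictor, batch_size=8, concurrency=2)

# Step 4: Consume the output by iterating (or write it out, e.g. with write_parquet).
for batch in predictions.iter_batches(batch_size=8):
    print(batch["prediction"])
```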
@@ -365,7 +365,7 @@ Increasing batch size results in faster execution because inference is a vectorized operation.
 Handling GPU out-of-memory failures
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-If you run into CUDA out-of-memory issues, your batch size is likely too large. Decrease the batch size by following :ref:`these steps <_batch_inference_batch_size>`.
+If you run into CUDA out-of-memory issues, your batch size is likely too large. Decrease the batch size by following :ref:`these steps <batch_inference_batch_size>`.

 If your batch size is already set to 1, then use either a smaller model or GPU devices with more memory.
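As a hedged illustration of the advice in this hunk (reusing the hypothetical `Predictor` class and `ds` dataset from the sketch above), shrinking `batch_size` is the first knob to turn when a GPU actor hits CUDA OOM:

```python
# Request one GPU per actor and keep batches small; halve batch_size again if OOM persists.
predictions = ds.map_batches(
    Predictor,
    batch_size=16,   # reduce this first when CUDA runs out of memory
    num_gpus=1,      # one GPU per model replica
    concurrency=2,
)
```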
"Similarly, we can write a custom groupy-aggregate function `agg_func` which will run for each [Dataset *group-by*](data-groupbys) group in parallel. The usage pattern is:\n",
737
+
"Similarly, we can write a custom groupy-aggregate function `agg_func` which will run for each [Dataset *group-by*](transform_groupby) group in parallel. The usage pattern is:\n",
@@ -121,7 +121,7 @@ a wrapper of ``_read_single_partition``.
 A list of :class:`~ray.data.ReadTask` objects are returned by ``get_read_tasks``, and these
 tasks are executed on remote workers. You can find more details about `Dataset read execution here <https://docs.ray.io/en/master/data/key-concepts.html#reading-data>`__.
doc/source/data/examples/nyc_taxi_basic_processing.ipynb (+2 -1)
@@ -595,6 +595,7 @@
     ]
    },
    {
+    "attachments": {},
     "cell_type": "markdown",
     "id": "0d1e2106",
     "metadata": {},
@@ -604,7 +605,7 @@
     "Note that Ray Data's Parquet reader supports projection (column selection) and row filter pushdown, where we can push the above column selection and the row-based filter to the Parquet read. If we specify column selection at Parquet read time, the unselected columns won't even be read from disk!\n",
     "\n",
     "The row-based filter is specified via\n",
-    "[Arrow's dataset field expressions](https://arrow.apache.org/docs/6.0/python/generated/pyarrow.dataset.Expression.html#pyarrow.dataset.Expression). See the {ref}`feature guide for reading Parquet data <dataset_supported_file_formats>` for more information."
+    "[Arrow's dataset field expressions](https://arrow.apache.org/docs/6.0/python/generated/pyarrow.dataset.Expression.html#pyarrow.dataset.Expression). See the {ref}`Parquet row pruning tips <parquet_row_pruning>` for more information."
doc/source/data/examples/ocr_example.ipynb (+8 -1)
@@ -21,6 +21,7 @@
     ]
    },
    {
+    "attachments": {},
     "cell_type": "markdown",
     "id": "2a344178",
     "metadata": {},
@@ -78,7 +79,7 @@
     "\n",
     "### Running the OCR software on the data\n",
     "\n",
-    "We can now use the {meth}`ray.data.read_binary_files <ray.data.read_binary_files>` function to read all the images from S3. We set the `include_paths=True` option to create a dataset of the S3 paths and image contents. We then run the {meth}`ds.map <ray.data.Dataset.map>` function on this dataset to execute the actual OCR process on each file and convert the screen shots into text. This will create a tabular dataset with columns `path` and `text`, see also [](transforming_data).\n",
+    "We can now use the {meth}`ray.data.read_binary_files <ray.data.read_binary_files>` function to read all the images from S3. We set the `include_paths=True` option to create a dataset of the S3 paths and image contents. We then run the {meth}`ds.map <ray.data.Dataset.map>` function on this dataset to execute the actual OCR process on each file and convert the screen shots into text. This creates a tabular dataset with columns `path` and `text`.\n",
     "\n",
     "````{note}\n",
     "If you want to load the data from a private bucket, you have to run\n",
@@ -317,6 +318,12 @@
     "\n",
     "Contributions that extend the example in this direction with a PR are welcome!"