@jeffreywang-anyscale
Contributor

Description

  1. Add a resiliency section that explains row-level and actor-level fault tolerance and the checkpointing feature
  2. Restore the VLM and omni-model batch inference examples that were removed by "[docs][data][llm] Batch inference docs reorg + update to reflect per-stage config refactor" (#59214)
  3. Adjust the doc code examples to match master's behavior (e.g., prefer chat_template_stage=True over apply_chat_template=True)

@jeffreywang-anyscale requested review from a team as code owners on January 29, 2026 23:48
@gemini-code-assist
Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@jeffreywang-anyscale added the "go" label (add ONLY when ready to merge, run all tests) on Jan 29, 2026
@jeffreywang-anyscale
Contributor Author

New resiliency section:
Screenshot 2026-01-29 at 4 03 43 PM

…behavior

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

ds = ray.data.read_parquet(input_path)
ds = processor(ds)
ds.write_parquet(output_path)
# __checkpoint_usage_example_end__

Checkpoint demo runs during module import

High Severity

The new checkpoint example executes at import time: it deletes and recreates /tmp/llm_checkpoint_demo/*, sets global ray.data.DataContext checkpoint config, then calls ray.data.read_parquet(input_path) and write_parquet(output_path) without creating any input data. This can fail CI/docs builds and introduces unexpected filesystem and global state side effects.
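
One way to avoid the import-time side effects described above is to wrap the demo in a function, create its input data explicitly, and call it only under a main guard with a temporary directory. The following is a minimal sketch, not the PR's actual fix: `run_checkpoint_demo` is a hypothetical name, and plain JSON file I/O stands in for the Ray Data read, processor, and write steps so that the snippet runs without Ray installed.

```python
import json
import tempfile
from pathlib import Path


def run_checkpoint_demo(base_dir: Path) -> Path:
    """Run the demo inside base_dir and return the output directory.

    Hypothetical stand-in: the real doc example would call
    ray.data.read_parquet, the processor, and write_parquet here.
    """
    input_path = base_dir / "input"
    output_path = base_dir / "output"
    input_path.mkdir(parents=True, exist_ok=True)
    output_path.mkdir(parents=True, exist_ok=True)

    # Create the input data first, so the read step cannot fail on a missing path.
    rows = [{"id": i, "prompt": f"question {i}"} for i in range(3)]
    (input_path / "rows.json").write_text(json.dumps(rows))

    # Placeholder for the read -> process -> write pipeline (assumption, not Ray code).
    loaded = json.loads((input_path / "rows.json").read_text())
    (output_path / "rows.json").write_text(json.dumps(loaded))
    return output_path


if __name__ == "__main__":
    # Nothing above touches the filesystem at import time; the demo runs
    # only when the file is executed directly, inside a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        out = run_checkpoint_demo(Path(tmp))
        print(sorted(p.name for p in out.iterdir()))
```

Importing this module from a docs build or test runner then has no filesystem or global-state effects; any global configuration the real example needs could be set inside the function in the same way.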

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@ray-gardener bot added the "docs" (An issue or change related to documentation), "data" (Ray Data-related issues), and "llm" labels on Jan 30, 2026
@jeffreywang-anyscale changed the title from "[data][llm][doc] Add in resiliency section and adjust doc code to align with master's behavior" to "[data][llm][doc] Add in resiliency section and refine doc code" on Jan 30, 2026

# __checkpoint_usage_example_start__
processor_config = vLLMEngineProcessorConfig(
model_source="Qwen/Qwen3-0.6B",
Contributor Author

Suggested change
- model_source="Qwen/Qwen3-0.6B",
+ model_source="unsloth/Llama-3.1-8B-Instruct",

Labels

data: Ray Data-related issues
docs: An issue or change related to documentation
go: add ONLY when ready to merge, run all tests
llm

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

2 participants