
Conversation


@jthomy commented Jan 6, 2026

What does this PR do?

Do not pass the instantiated tokenizer in the config via Ray. This commit fixes an issue with Deepseek V3.1's tokenizer in the SFT Ray trainer.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

Deepseek V3.1 successfully instantiates and trains with the SFT Ray trainer.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

No API changes.

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Avoid instantiating the tokenizer before passing the resulting object to the Ray remote worker.
TrainingWorkerConfig now accepts a union type for model_config, which may be debatable; a sketch of the resulting pattern follows.
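
For illustration, a minimal sketch of the pattern (worker and field names are hypothetical, not verl's actual API): only the plain DictConfig travels through Ray, and the tokenizer is constructed inside the worker process, which sidesteps pickling the Deepseek V3.1 tokenizer on the driver.

import ray
from omegaconf import DictConfig, OmegaConf


@ray.remote
class SFTWorker:
    def __init__(self, model_cfg: DictConfig):
        # The DictConfig is plain data and pickles cleanly; the tokenizer
        # (which may not pickle, e.g. Deepseek V3.1's) is built here,
        # inside the worker process.
        from transformers import AutoTokenizer

        self.tokenizer = AutoTokenizer.from_pretrained(
            model_cfg.path, trust_remote_code=True
        )


# Driver side: ship only the raw config, never the instantiated tokenizer.
model_cfg = OmegaConf.create({"path": "deepseek-ai/DeepSeek-V3.1"})
worker = SFTWorker.remote(model_cfg)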

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request effectively resolves a pickling issue with the Deepseek V3.1 tokenizer in Ray by deferring the instantiation of the model configuration until it's inside the worker process. This is achieved by passing a DictConfig instead of the instantiated dataclass. The implementation is sound and correctly addresses the problem. I have one suggestion to ensure type hint consistency and maintain compatibility with older Python versions.

class TrainingWorkerConfig(BaseConfig):
    model_type: str = None  # model type (language_model/value_model)
-   model_config: HFModelConfig = None
+   model_config: HFModelConfig | DictConfig = None

Severity: high

The | operator for type hints was introduced in Python 3.10. To maintain consistency with the rest of the codebase, which uses types like Optional from the typing module, and to ensure compatibility with Python versions prior to 3.10, it is recommended to use Union from typing instead.

Please update the type hint as follows:

from typing import Any, Literal, Optional, Union

...

    model_config: Union[HFModelConfig, DictConfig] = None

You will need to add Union to the import from typing at the top of the file.
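
As an aside, a hedged sketch of how worker-side code might normalize the union-typed field back into a concrete HFModelConfig after it crosses the Ray boundary (the helper below is an assumption for illustration, not code from this PR):

from omegaconf import DictConfig, OmegaConf
# HFModelConfig itself lives in the verl codebase; its import is omitted here.


def resolve_model_config(model_config):
    # If a raw DictConfig arrived through Ray, materialize the real config
    # (and with it the tokenizer) inside the worker process.
    if isinstance(model_config, DictConfig):
        # Assumed constructor call; the actual HFModelConfig API may differ.
        model_config = HFModelConfig(**OmegaConf.to_container(model_config, resolve=True))
    return model_config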

@vermouth1992 (Collaborator)

Good idea! Shall we introduce a defer load tokenizer option instead of modifying the config semantics directly?

@vermouth1992 (Collaborator)

Let me draft a PR that defers tokenizer loading.
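
A rough sketch of what such a defer-load option could look like (class, field, and flag names here are hypothetical; the follow-up PR may be shaped differently): the config stays a dataclass, but the tokenizer is only built on first access inside the worker when the flag is set.

from dataclasses import dataclass, field


@dataclass
class HFModelConfigSketch:
    path: str = ""
    # When True, skip tokenizer construction until first access, so nothing
    # unpicklable travels through Ray to the worker.
    defer_load_tokenizer: bool = False
    _tokenizer: object = field(default=None, repr=False)

    def __post_init__(self):
        if not self.defer_load_tokenizer:
            _ = self.tokenizer  # eager load, mirroring the current behaviour

    @property
    def tokenizer(self):
        # Lazily construct the tokenizer on first access inside the worker.
        if self._tokenizer is None:
            from transformers import AutoTokenizer

            self._tokenizer = AutoTokenizer.from_pretrained(
                self.path, trust_remote_code=True
            )
        return self._tokenizer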

