[Fix][Core] Fix SeaTunnel CLI reasoning replay for tool calls by yzeng1618 · Pull Request #10902 · apache/seatunnel

yzeng1618 · 2026-05-19T07:29:28Z

Purpose of this pull request

This pull request fixes SeaTunnel CLI behavior when using OpenAI-compatible reasoning models and planner-generated structured plans.

The main fix is for OpenAI-compatible thinking models that require reasoning_content to be passed back on assistant tool-call replay messages. Previously, SeaTunnel CLI only sent reasoning_content when non-empty reasoning text was collected. For some compatible APIs, the next tool-call request could fail with:

The reasoning_content in the thinking mode must be passed back to the API.

This patch always includes reasoning_content for OpenAI-compatible assistant tool-call replay messages when OPENAI_ECHO_REASONING_CONTENT is enabled.

This pull request also handles planner responses where tables is null. The CLI now normalizes tables: null to an empty list, which avoids runtime failures for sources that do not require table names, such as FakeSource.

Does this PR introduce any user-facing change?

Yes.

For SeaTunnel CLI users using OpenAI-compatible reasoning models, multi-turn tool-call conversations no longer fail with a 400 error requiring reasoning_content replay.

For planner-generated structured plans with tables: null, the CLI now treats the value as an empty table list instead of failing during pipeline expansion.

This change affects SeaTunnel CLI behavior on the dev branch.

How was this patch tested?

This patch was tested with SeaTunnel CLI unit tests

Check list

If any new Jar binary package adding in your PR, please add License Notice according
New License Guide
If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
If necessary, please update incompatible-changes.md to describe the incompatibility caused by this PR.
If you are contributing the connector code, please check that the following files are updated:
1. Update plugin-mapping.properties and add new connector information in it
2. Update the pom file of seatunnel-dist
3. Add ci label in label-scope-conf
4. Add e2e testcase in seatunnel-e2e
5. Update connector plugin_config

DanielLeens

Thanks for working on this. I reviewed the full diff on the latest head (67ad9dba6c81bdd2324c17551ec27257426509f2), traced the planner/config/fix tool-call loop locally, and the overall direction makes sense. Preserving provider-specific reasoning/thinking state across tool calls is the right fix. That said, I still see one blocker in the current OpenAI replay path.

What this PR fixes

User pain: after a tool call, models that require reasoning-state replay can reject the next request because the CLI used to drop that provider-specific state.
Fix approach: llm_provider.py now preserves OpenAI reasoning_content, Anthropic thinking/signature blocks, and Bedrock reasoning blocks in the internal message format and replays them on the next round.
One-line summary: the abstraction is moving in the right direction, but the latest implementation still replays reasoning_content too aggressively on the normal OpenAI tool-call path.

Runtime / execution chain I checked

Orchestrator tool loop
  -> client.chat_stream(...)
  -> LLMProvider.collect_stream(events) reconstructs assistant_content
  -> assistant_content is appended to conversation history
  -> next round calls OpenAIProvider._to_openai_messages(...)
  -> assistant tool-call history is serialized back to OpenAI-compatible chat messages

Issue 1: empty reasoning_content is replayed unconditionally for assistant tool-call history

Location: seatunnel-cli/seatunnel_cli/llm_provider.py:795-819
Why this is a problem:
- In the has_tool_use branch, the latest code sets msg_dict["reasoning_content"] whenever OPENAI_ECHO_REASONING_CONTENT is enabled, even if reasoning_parts is empty.
- That is inconsistent with the regular assistant-text branch in the same file (seatunnel-cli/seatunnel_cli/llm_provider.py:831-839), which only replays the field when reasoning content actually exists.
- So on a normal tool-call session where the previous assistant message did not return reasoning_content, the next request still sends reasoning_content: "".
Risk:
- This is on the main tool-call path, not a corner case.
- Strict OpenAI-compatible endpoints can reject the next request because the history now contains a non-standard / empty reasoning field that was never present in the original assistant output.
Better fix:
- Only add reasoning_content in the has_tool_use branch when reasoning_parts is non-empty, exactly like the regular assistant branch already does.

Issue 2: the new replay logic still has no regression tests

Location: seatunnel-cli/seatunnel_cli/llm_provider.py:168-309, 766-891
Why this matters:
- This PR changes the core reconstruction / replay logic for three providers, but there is still no automated coverage for:
  - tool-call history with no reasoning blocks,
  - tool-call history with real reasoning blocks,
  - Anthropic / Bedrock reasoning round-trips.
I would strongly recommend adding a few focused unit tests around collect_stream(), _to_openai_messages(), and _from_openai_response() so this does not regress again.

Merge conclusion

Conclusion: fix before merge.
Blocking item: Issue 1.
Non-blocking follow-up: add the regression tests from Issue 2.

I like the general shape of the fix, and the error-message improvements in cli.py are helpful. Once the OpenAI tool-call branch only replays reasoning_content when it actually exists, I am happy to re-review quickly.

zengyi added 2 commits May 18, 2026 18:27

[Fix][Core] Preserve reasoning context for SeaTunnel CLI LLM providers

b6921e0

[Fix][Core] Fix SeaTunnel CLI reasoning replay for tool calls

67ad9db

yzeng1618 requested a review from SEZ9 May 19, 2026 07:30

DanielLeens suggested changes May 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Fix][Core] Fix SeaTunnel CLI reasoning replay for tool calls#10902

[Fix][Core] Fix SeaTunnel CLI reasoning replay for tool calls#10902
yzeng1618 wants to merge 2 commits into
apache:devfrom
yzeng1618:dev-seatunnel-cli

yzeng1618 commented May 19, 2026

Uh oh!

DanielLeens left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yzeng1618 commented May 19, 2026

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

Uh oh!

DanielLeens left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants