[Bug]: Premature db.session.commit() in DatasetRetrieval._on_query breaks transaction isolation and invalidates rollbacks

### Self Checks

- [x] I have read the [Contributing Guide](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md) and [Language Policy](https://github.com/langgenius/dify/issues/1542).
- [x] This is only for bug report, if you would like to ask a question, please head to [Discussions](https://github.com/langgenius/dify/discussions/categories/general).
- [x] I have searched for existing issues [search for existing issues](https://github.com/langgenius/dify/issues), including closed ones.
- [x] I confirm that I am using English to submit this report, otherwise it will be closed.
- [x] [Chinese & Non-English Users] Please submit in English, otherwise the issue will be closed :)
- [x] Please do not modify this template :) and fill in all the required fields.

### Dify version

Latest main branch

### Cloud or Self Hosted

Self Hosted (Source)

### Steps to reproduce

This is a transaction management issue identified via code audit of api/core/rag/retrieval/dataset_retrieval.py. The execution path that triggers the problem is as follows:

1. An agent triggers a RAG retrieval task, routing through either single_retrieve or multiple_retrieve.

2. During the retrieval process, the system calls the _on_query method to persist a DatasetQuery audit log for the operation.

3. Inside _on_query, the code executes db.session.add_all(...) immediately followed by an explicit, global db.session.commit().

4. If a subsequent step in the overarching workflow fails (e.g., vector database timeout, subsequent LLM generation failure, or sensitive word trigger), the top-level orchestrator catches the exception and attempts to execute db.session.rollback() to maintain data consistency.
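The sequence above can be sketched in isolation. This is a minimal illustration, not Dify code: it uses the standard-library sqlite3 module in place of SQLAlchemy's db.session so that it runs without dependencies, and the table layout, the `accounts` row, and the function names are all hypothetical. The point it shows is generic: a commit issued by a helper on a shared connection also commits the caller's pending work.

```python
import sqlite3


def on_query(conn: sqlite3.Connection, text: str) -> None:
    """Hypothetical stand-in for _on_query: write an audit row, then commit."""
    conn.execute("INSERT INTO dataset_queries (content) VALUES (?)", (text,))
    # This commit is not limited to the audit row. It ends the caller's
    # open transaction and makes every pending change permanent.
    conn.commit()


def run_workflow_with_inline_commit() -> int:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, tokens INTEGER)")
    conn.execute("CREATE TABLE dataset_queries (id INTEGER PRIMARY KEY, content TEXT)")
    conn.execute("INSERT INTO accounts VALUES (1, 100)")
    conn.commit()

    # Main business flow: a pending, uncommitted modification.
    conn.execute("UPDATE accounts SET tokens = tokens - 10 WHERE id = 1")

    # Retrieval step logs the query on the same connection.
    on_query(conn, "what is RAG?")

    try:
        raise RuntimeError("simulated downstream failure")
    except RuntimeError:
        # Nothing is left to undo: the deduction was already committed.
        conn.rollback()

    tokens = conn.execute("SELECT tokens FROM accounts WHERE id = 1").fetchone()[0]
    conn.close()
    return tokens
```

Calling `run_workflow_with_inline_commit()` returns 90 rather than 100: the deduction survives the rollback because the helper's commit had already persisted it.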

### ✔️ Expected Behavior

The _on_query method should append audit logs without interfering with the main request's transaction lifecycle. To adhere to transaction isolation, it should write the DatasetQuery using a completely separate database connection/session or push the logging task to an async queue (like Celery). The global db.session should only be committed by the top-level controller once the entire workflow succeeds.
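One possible shape of the expected behavior is sketched below, again with the standard-library sqlite3 module rather than SQLAlchemy, and with a plain in-process `queue.Queue` standing in for an async task queue such as Celery. Every name here (`on_query`, `drain_audit_queue`, the tables) is hypothetical and chosen for illustration. The logging helper only enqueues the record; a separate connection writes it in its own transaction once the main transaction has ended.

```python
import os
import queue
import sqlite3
import tempfile


def on_query(audit_queue: queue.Queue, text: str) -> None:
    """Hypothetical logging helper: enqueue only, never touch the caller's transaction."""
    audit_queue.put(text)


def drain_audit_queue(audit_queue: queue.Queue, db_path: str) -> None:
    """Stand-in for a background worker with its own connection and transaction."""
    conn = sqlite3.connect(db_path)
    try:
        while not audit_queue.empty():
            text = audit_queue.get_nowait()
            conn.execute("INSERT INTO dataset_queries (content) VALUES (?)", (text,))
        conn.commit()
    finally:
        conn.close()


def run_workflow_with_deferred_audit() -> tuple:
    audit_queue = queue.Queue()
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        main = sqlite3.connect(db_path)
        main.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, tokens INTEGER)")
        main.execute("CREATE TABLE dataset_queries (id INTEGER PRIMARY KEY, content TEXT)")
        main.execute("INSERT INTO accounts VALUES (1, 100)")
        main.commit()

        try:
            main.execute("UPDATE accounts SET tokens = tokens - 10 WHERE id = 1")
            on_query(audit_queue, "what is RAG?")
            raise RuntimeError("simulated downstream failure")
        except RuntimeError:
            # The deduction is still pending, so the rollback reverts it.
            main.rollback()
        finally:
            # The audit row is written outside the main transaction.
            drain_audit_queue(audit_queue, db_path)

        tokens = main.execute("SELECT tokens FROM accounts WHERE id = 1").fetchone()[0]
        logs = main.execute("SELECT COUNT(*) FROM dataset_queries").fetchone()[0]
        main.close()
        return tokens, logs
```

Here `run_workflow_with_deferred_audit()` returns `(100, 1)`: the token deduction is rolled back while the audit row is kept. Whether audit rows should survive a failed workflow is a design choice for the maintainers; this sketch assumes they should.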

### ❌ Actual Behavior

The explicit db.session.commit() in the side-channel _on_query method does not commit only the DatasetQuery rows. Because it runs on the shared, request-scoped db.session, it flushes and commits every pending change that session currently holds.

This splits the main business flow's transaction in two and breaks its atomicity. If the workflow fails downstream, the subsequent db.session.rollback() can only undo changes made after that commit. Intermediate modifications (e.g., token deductions, partial node executions) have already been committed to the database and cannot be reverted, which leaves inconsistent data behind and undermines the agent's ability to backtrack.
