Self Checks
Dify version
Latest main branch
Cloud or Self Hosted
Self Hosted (Source)
Steps to reproduce
This is an architectural transaction management issue identified via code audit in api/core/rag/retrieval/dataset_retrieval.py. The execution path that triggers the vulnerability is as follows:
- An agent triggers a RAG retrieval task, routing through either single_retrieve or multiple_retrieve.
- During the retrieval process, the system calls the _on_query method to persist a DatasetQuery audit log for the operation.
- Inside _on_query, the code executes db.session.add_all(...) immediately followed by an explicit, global db.session.commit().
- If a subsequent step in the overarching workflow fails (e.g., vector database timeout, subsequent LLM generation failure, or sensitive word trigger), the top-level orchestrator catches the exception and attempts to execute db.session.rollback() to maintain data consistency.
✔️ Expected Behavior
The _on_query method should append audit logs without interfering with the main request's transaction lifecycle. To adhere to transaction isolation, it should write the DatasetQuery using a completely separate database connection/session or push the logging task to an async queue (like Celery). The global db.session should only be committed by the top-level controller once the entire workflow succeeds.
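One way to picture the queue-based option is the minimal sketch below. It uses only Python's standard library: sqlite3 stands in for the SQLAlchemy db.session and queue.Queue stands in for a real task queue such as Celery. The table names and the on_query, flush_audit_log, and run_workflow helpers are hypothetical and are not taken from the Dify codebase.

```python
import queue
import sqlite3

# In-process stand-in for a real task queue (assumption for illustration).
audit_queue = queue.Queue()

def on_query(query_text):
    """Hypothetical _on_query variant: enqueue the entry, never commit the request transaction."""
    audit_queue.put(query_text)

def flush_audit_log(conn):
    """Write queued entries in their own transaction, after the main one has ended."""
    rows = []
    while not audit_queue.empty():
        rows.append((audit_queue.get(),))
    conn.executemany("INSERT INTO dataset_queries (content) VALUES (?)", rows)
    conn.commit()

def run_workflow(conn, fail):
    try:
        # Main business write, pending until the top-level owner commits.
        conn.execute("UPDATE accounts SET tokens = tokens - 10 WHERE id = 1")
        on_query("what is RAG?")
        if fail:
            raise RuntimeError("simulated downstream failure")
        conn.commit()  # only the top-level owner commits the business transaction
    except RuntimeError:
        conn.rollback()  # the deduction is still pending, so it is reverted
    finally:
        flush_audit_log(conn)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, tokens INTEGER)")
conn.execute("CREATE TABLE dataset_queries (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

run_workflow(conn, fail=True)
tokens_after = conn.execute("SELECT tokens FROM accounts WHERE id = 1").fetchone()[0]
logged = conn.execute("SELECT COUNT(*) FROM dataset_queries").fetchone()[0]
print(tokens_after, logged)  # 100 1
```

In this sketch the failed workflow leaves the token balance untouched while the audit entry is still recorded. A separate SQLAlchemy session dedicated to audit writes would be the other option named above; it is not shown here.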
❌ Actual Behavior
The explicit db.session.commit() in the side-channel _on_query method runs on the request-scoped db.session, so it flushes and commits every pending change that session holds at that moment, not just the DatasetQuery rows. This includes new and modified objects that belong to the main business flow.
This splits the main flow's atomic transaction in two. If the workflow encounters an error downstream, the subsequent db.session.rollback() still executes, but it can only undo changes made after that commit. Intermediate modifications made before it (e.g., token deductions, partial node executions) have already been committed to the database and cannot be reverted, which undermines the agent's ability to backtrack and leaves inconsistent data behind.
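The effect can be reproduced in miniature with Python's standard-library sqlite3 module, used here only as a stand-in for the SQLAlchemy db.session. The table names, the accounts row, and the on_query and run_workflow helpers are hypothetical and are not taken from the Dify codebase. The sketch only assumes that the audit write and the business write share one connection.

```python
import sqlite3

def on_query(conn, query_text):
    """Hypothetical stand-in for _on_query: logs the query, then commits the shared connection."""
    conn.execute("INSERT INTO dataset_queries (content) VALUES (?)", (query_text,))
    conn.commit()  # commits everything pending on this connection, not just the log row

def run_workflow(conn):
    try:
        # Main business write, still uncommitted at this point.
        conn.execute("UPDATE accounts SET tokens = tokens - 10 WHERE id = 1")
        on_query(conn, "what is RAG?")
        raise RuntimeError("simulated downstream LLM failure")
    except RuntimeError:
        conn.rollback()  # runs without error, but the deduction was already committed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, tokens INTEGER)")
conn.execute("CREATE TABLE dataset_queries (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

run_workflow(conn)
tokens_after = conn.execute("SELECT tokens FROM accounts WHERE id = 1").fetchone()[0]
print(tokens_after)  # 90: the rollback had nothing left to undo
```

The balance ends at 90 even though the workflow failed and rolled back, because the commit inside on_query made the earlier UPDATE permanent along with the log row.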