fix(api): "File validation failed" on Chatflow follow-up with custom file type + memory #35891

Merged: wylswz merged 6 commits into langgenius:main from lin-snow:fix/chatflow-history-file-revalidation on May 11, 2026

lin-snow commented May 7, 2026

Summary

A Chatflow whose LLM node has memory enabled and whose File Upload is set to "Other file types" (CUSTOM) fails on the second turn with "File validation failed for file: <name>", even when no new file is uploaded.

Root cause. A file uploaded into the CUSTOM slot is coerced to its detected type by _resolve_file_type (PNG → IMAGE), and MessageFile.type persists the resolved type. On history replay, build_from_message_file rebuilds mapping["type"] from MessageFile.type, so a file that passed round 1 (mapping["type"]=="custom") is rejected on round 2 (mapping["type"]=="image"). The validator only bypassed the type gate for literal "custom", not for "config has CUSTOM as a fallback bucket".
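
The round-1 versus round-2 mismatch can be reproduced with a minimal sketch. The `FileType` enum and `old_type_gate` below are hypothetical stand-ins written for illustration; they model the pre-fix gate as described above and are not the actual validator code.

```python
from enum import Enum


class FileType(str, Enum):
    # Hypothetical subset of file types, for illustration only.
    IMAGE = "image"
    DOCUMENT = "document"
    CUSTOM = "custom"


def old_type_gate(input_type: FileType, allowed_types: list[FileType]) -> bool:
    """Model of the pre-fix gate: only the literal CUSTOM type bypasses it."""
    if allowed_types and input_type not in allowed_types and input_type != FileType.CUSTOM:
        return False
    return True


allowed = [FileType.CUSTOM]
# Round 1: the upload arrives tagged "custom" and passes.
round_1 = old_type_gate(FileType.CUSTOM, allowed)
# Round 2: replay arrives with the persisted, resolved type and is rejected.
round_2 = old_type_gate(FileType.IMAGE, allowed)
print(round_1, round_2)
```

The config is identical in both calls; only the type attached to the file has changed between upload and replay.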

A parallel mismatch was reachable on round 1: the extension whitelist used a raw in check, so a user-typed list like [".PNG", "png", "JPG", ...] failed to match the upload-side ".png" (always lowercased with leading dot).
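
A small sketch of why the raw `in` check misses, and how normalizing both sides resolves it. `normalize_extension` here is an illustrative helper, assumed to behave roughly like the normalization this PR describes; it is not copied from the codebase.

```python
def normalize_extension(ext: str) -> str:
    """Trim whitespace, lowercase, and drop any leading dot (illustrative helper)."""
    return ext.strip().lower().lstrip(".")


whitelist = [".PNG", "png", "JPG"]  # as a user might type it
upload_ext = ".png"                 # upload side: lowercase with leading dot

# Raw membership compares exact strings, so none of the entries match.
raw_match = upload_ext in whitelist

# Normalizing both sides makes the comparison case- and dot-insensitive.
normalized_match = normalize_extension(upload_ext) in {
    normalize_extension(entry) for entry in whitelist
}
print(raw_match, normalized_match)
```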

Fix.

  • Refactor is_file_valid_with_config to bucket semantics. CUSTOM is a fallback bucket gated by allowed_file_extensions, compared case- and dot-insensitively. Empty whitelist while in the CUSTOM bucket continues to deny (defensive against DSL/API paths that bypass the UI).
  • Skip re-validation when rehydrating files from conversation history in TokenBufferMemory and BaseAgentRunner, mirroring the build_file_from_stored_mapping pattern. Validation belongs at upload time, not on replay.
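
The bucket semantics from the first bullet can be sketched roughly as follows. This is a simplified model under stated assumptions, not the refactored `is_file_valid_with_config` itself: the function name, the plain-string types, the `"tool_file"` and `"local_file"` values, and the treatment of an unset (`None`) whitelist as "no extension limit" are all assumptions made for illustration.

```python
from __future__ import annotations

from collections.abc import Sequence


def _norm(ext: str) -> str:
    # Compare extensions case- and dot-insensitively.
    return ext.strip().lower().lstrip(".")


def is_valid_sketch(
    *,
    input_type: str,
    extension: str,
    transfer_method: str,
    allowed_types: Sequence[str],
    allowed_extensions: Sequence[str] | None,
) -> bool:
    # Tool-generated files bypass upload restrictions.
    if transfer_method == "tool_file":
        return True
    # Named bucket: the file's own type is explicitly allowed.
    if input_type != "custom" and input_type in allowed_types:
        return True
    # No CUSTOM fallback configured: nothing else can accept the file.
    if "custom" not in allowed_types:
        return False
    # CUSTOM bucket. Assumption: an unset whitelist imposes no extension limit.
    if allowed_extensions is None:
        return True
    # An explicitly set whitelist, including [], is authoritative.
    wanted = _norm(extension)
    return bool(wanted) and wanted in {_norm(e) for e in allowed_extensions}


# Replay case: the persisted type is "image", the config only lists CUSTOM.
replay_ok = is_valid_sketch(
    input_type="image", extension=".png", transfer_method="local_file",
    allowed_types=["custom"], allowed_extensions=[".png"],
)
# Empty whitelist in the CUSTOM bucket still denies.
empty_denied = is_valid_sketch(
    input_type="custom", extension=".png", transfer_method="local_file",
    allowed_types=["custom"], allowed_extensions=[],
)
print(replay_ok, empty_denied)
```

In this model the type attached to the file no longer decides the outcome on its own: any type that misses the named buckets falls through to CUSTOM, where only the extension whitelist matters.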

Validator behavior matrix

| allowed_file_types | input_file_type | allowed_file_extensions | file_extension | before | after |
| --- | --- | --- | --- | --- | --- |
| [CUSTOM] | custom (round 1) | [".png"] | .png | | |
| [CUSTOM] | image (replay) | [".png"] | .png | | |
| [CUSTOM] | custom | [".PNG", "png", "JPG"] | .png | | |
| [IMAGE, CUSTOM] | document | [".pdf"] | .pdf | | |
| [CUSTOM] | custom | [] | .png | | ✗ (defensive) |
| [IMAGE] | video | | .mp4 | | |
| any | any | any | any (TOOL_FILE) | | |

Bold rows are the user-visible fixes. The empty-whitelist row preserves the original deny-on-empty posture for paths that bypass the UI.

Screenshots

N/A — backend-only change.

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran make lint && make type-check (backend) and cd web && pnpm exec vp staged (frontend) to appease the lint gods

dosubot Bot added the size:M label May 7, 2026
lin-snow force-pushed the fix/chatflow-history-file-revalidation branch from 48bb3af to 424ed80 May 7, 2026 10:51
lin-snow changed the title from `fix(api): keep Chatflow custom-type files valid on history replay` to `fix(api): Chatflow follow-up rejects historical files when allowed_file_types=[custom]` May 7, 2026
lin-snow changed the title from `fix(api): Chatflow follow-up rejects historical files when allowed_file_types=[custom]` to `fix(api): "File validation failed" on Chatflow follow-up with custom file type + memory` May 7, 2026
lin-snow self-assigned this May 7, 2026
lin-snow requested a review from wylswz May 7, 2026 11:03
Comment thread api/core/memory/token_buffer_memory.py Outdated

Copilot AI left a comment

Pull request overview

Fixes a chatflow regression where file inputs that were accepted on the first turn could fail validation on subsequent turns when conversation history is replayed (notably with CUSTOM/“Other file types” + memory enabled). The change updates backend file validation to treat CUSTOM as an extension-gated fallback bucket and prevents re-validating persisted/history files during prompt reconstruction.

Changes:

  • Refactor is_file_valid_with_config to implement “bucket semantics” and normalize extension allowlists case-/dot-insensitively.
  • Skip file re-validation on history replay paths in TokenBufferMemory and BaseAgentRunner by passing config=None into message-file rehydration.
  • Add unit tests covering the new validation semantics and the “no re-validation on replay” contract.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| api/factories/file_factory/validation.py | Updates file validation logic (CUSTOM fallback bucket + normalized extension matching). |
| api/core/memory/token_buffer_memory.py | Prevents history replay from re-validating persisted files by rehydrating with config=None. |
| api/core/agent/base_agent_runner.py | Aligns agent history/user prompt reconstruction with "no re-validation on replay". |
| api/tests/unit_tests/factories/test_file_validation.py | Adds unit tests for new validation/bucketing + extension normalization behavior. |
| api/tests/unit_tests/core/memory/test_token_buffer_memory.py | Adds a unit test asserting replay calls build_from_message_file(..., config=None). |

Comment thread api/factories/file_factory/validation.py Outdated

github-actions Bot commented May 8, 2026

Pyrefly Type Coverage

| Metric | Base | PR | Delta |
| --- | --- | --- | --- |
| Type coverage | 0.00% | 43.75% | +43.75% |
| Strict coverage | 0.00% | 43.27% | +43.27% |
| Typed symbols | 0 | 21,996 | +21,996 |
| Untyped symbols | 0 | 28,597 | +28,597 |
| Modules | 0 | 2,548 | +2,548 |

lin-snow force-pushed the fix/chatflow-history-file-revalidation branch 4 times, most recently from c8b14a2 to 84d02de May 9, 2026 03:57
wylswz added this to the 1.14.1 milestone May 11, 2026
lin-snow and others added 6 commits May 11, 2026 09:49
A Chatflow file uploaded into the CUSTOM type slot is coerced to its
detected type by _resolve_file_type (PNG -> IMAGE), and MessageFile.type
persists that resolved type. On history replay, build_from_message_file
rebuilds mapping["type"] from MessageFile.type, so a file that passed
round 1 (mapping["type"]=="custom") was rejected on round 2
(mapping["type"]=="image") even though the workflow config was unchanged.

- Refactor is_file_valid_with_config with bucket semantics: CUSTOM acts
  as a fallback bucket gated by allowed_file_extensions, compared case-
  and dot-insensitively. This also fixes a parallel mismatch where a
  user whitelist of [".PNG", "png", "JPG", ...] failed to match the
  upload-side ".png" (always lowercase with leading dot).
- Skip re-validation when rehydrating files from conversation history in
  TokenBufferMemory and BaseAgentRunner; history files were validated at
  upload time, mirroring build_file_from_stored_mapping.

Follow-up to the prior fix. The bucket-semantics rewrite changed the
extension-whitelist guard from `is not None` to truthiness, which
silently widened behavior for the empty-list case (UI never submits it,
but DSL / API paths could). Restore the original deny-on-empty
posture: when a file falls into the CUSTOM bucket, an explicitly set
whitelist (including []) is authoritative.
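
The difference between the two guards comes down to how Python treats an empty list. A minimal sketch, with hypothetical function names and a plain list standing in for the config field:

```python
from __future__ import annotations


def guard_truthy(ext: str, whitelist: list[str] | None) -> bool:
    # Truthiness: [] is falsy, so the whitelist check is skipped entirely.
    if whitelist and ext not in whitelist:
        return False
    return True


def guard_explicit(ext: str, whitelist: list[str] | None) -> bool:
    # `is not None`: an explicitly set list, even an empty one, is enforced.
    if whitelist is not None and ext not in whitelist:
        return False
    return True


widened = guard_truthy(".png", [])     # empty whitelist silently allows
restored = guard_explicit(".png", [])  # empty whitelist denies
print(widened, restored)
```

Both guards agree when the whitelist is `None` or non-empty; they diverge only on `[]`, which is the case the UI never submits.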

Also tightens _normalize_extension so whitespace-only input returns ""
consistent with empty input, and locks two contracts with tests:

- empty whitelist + CUSTOM bucket rejects (regression guard for the
  silent widening)
- TokenBufferMemory passes config=None to build_from_message_file
  (regression guard for the replay-skips-validation contract)

A whitelist with an empty / whitespace entry (e.g. a stray comma in DSL)
combined with an extensionless file would spuriously match — both sides
normalize to "" and pass. Filter empty normalized whitelist entries and
short-circuit when the input extension itself normalizes to empty, so
invalid whitelist entries can't widen the allowlist.
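
A rough sketch of the spurious match and the short-circuit that closes it. The helper names are hypothetical and the normalization is assumed to trim, lowercase, and drop the leading dot, as described earlier in this PR.

```python
def normalize_extension(ext: str) -> str:
    # Assumed normalization: trim, lowercase, drop leading dot.
    return ext.strip().lower().lstrip(".")


def naive_match(ext: str, whitelist: list[str]) -> bool:
    # Both an extensionless file and a blank whitelist entry normalize to "".
    return normalize_extension(ext) in {normalize_extension(e) for e in whitelist}


def guarded_match(ext: str, whitelist: list[str]) -> bool:
    wanted = normalize_extension(ext)
    # Short-circuit: an empty input extension can never be whitelisted.
    if not wanted:
        return False
    return wanted in {normalize_extension(e) for e in whitelist}


whitelist = [".png", " "]  # the blank entry models a stray comma in DSL
spurious = naive_match("", whitelist)
blocked = guarded_match("", whitelist)
print(spurious, blocked)
```

With the early return in place, a blank whitelist entry can only ever equal an empty input, and empty input is already rejected.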

Reported by Copilot on PR review.

The walrus filter was redundant given the early return on empty input:
empty whitelist entries normalize to "" and can never match a non-empty
input extension, and empty input is already rejected upfront.

Both helpers in factories/file_factory/message_files.py are only invoked
from replay paths that intentionally skip re-validation, so the config
argument was always None. Remove it from the signatures and update the
two call sites; module docstring records the design intent.
lin-snow force-pushed the fix/chatflow-history-file-revalidation branch from 84d02de to 185121b May 11, 2026 01:49
dosubot Bot added the lgtm label May 11, 2026
wylswz enabled auto-merge May 11, 2026 01:52
wylswz added this pull request to the merge queue May 11, 2026
Merged via the queue into langgenius:main with commit e8dc706 May 11, 2026
27 checks passed