fix(file_factory): drop doubled dot when standardizing datasource file extension #35808

Merged
fatelei merged 1 commit into langgenius:main from Beandon13:fix/dify-datasource-double-dot on May 6, 2026
Conversation

@Beandon13

Summary

  • `_build_from_datasource_file` builds `extension = "." + key.split(".")[-1]` (or `".bin"`), but previously called `standardize_file_type(extension="." + extension, ...)`, which prepends a second dot and produces `..csv`/`..bin`.
  • The mitigating `lstrip(".")` inside `standardize_file_type` masked the symptom, but the call shape was wrong and inconsistent with every other builder in the same file (`_build_from_local_file`, `_build_from_remote_url`, `_build_from_tool_file` all pass `extension` once-prefixed).
  • Pass the already-prefixed `extension` directly so the helper sees `".csv"` instead of `"..csv"`.
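
To make the call-shape problem concrete, here is a minimal sketch. `extension_from_key` mirrors the expression quoted above; `normalize_extension` is a hypothetical stand-in for only the `lstrip(".")` step and is not the real `standardize_file_type`, whose full signature is not shown in this PR.

```python
def extension_from_key(key: str) -> str:
    """Last dot-separated segment of the key, dot-prefixed, or ".bin" if the key has no dot."""
    return "." + key.split(".")[-1] if "." in key else ".bin"


def normalize_extension(extension: str) -> str:
    """Hypothetical stand-in for the lstrip(".") step that masked the doubled dot."""
    return extension.lstrip(".")


extension = extension_from_key("reports/data.csv")
before = "." + extension  # argument shape before the fix: two leading dots
after = extension         # argument shape after the fix: one leading dot

print(repr(before), repr(after))
# Both shapes normalize to the same value, which is why the bug had no visible symptom.
print(normalize_extension(before) == normalize_extension(after))
```

Because both shapes normalize identically in this sketch, the change is about keeping the call consistent with the other builders rather than changing the result.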

Testing

```
uv run python -m pytest tests/unit_tests/factories/test_file_factory.py::TestBuildFromDatasourceFile -v
```

```
tests/unit_tests/factories/test_file_factory.py::TestBuildFromDatasourceFile::test_extension_passed_without_doubled_dot PASSED [ 50%]
tests/unit_tests/factories/test_file_factory.py::TestBuildFromDatasourceFile::test_extension_falls_back_to_bin_when_key_has_no_dot PASSED [100%]
============================== 2 passed in 0.13s ===============================
```

The first test fails on `main` (asserts the captured argument is `".csv"` but receives `"..csv"`), confirming the regression guard.
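
The capture-and-assert idea behind these tests can be sketched as follows. This is not the test code from the PR: `build_from_key` is a hypothetical toy builder, and the helper is a plain `Mock` called with only the `extension` keyword, whereas the real call passes further arguments.

```python
from unittest.mock import Mock


def build_from_key(key: str, standardize) -> None:
    """Toy builder (hypothetical): derive the extension and pass it on once-prefixed."""
    extension = "." + key.split(".")[-1] if "." in key else ".bin"
    standardize(extension=extension)


def captured_extension(key: str) -> str:
    """Run the toy builder against a mock helper and return the extension it received."""
    helper = Mock()
    build_from_key(key, helper)
    return helper.call_args.kwargs["extension"]


# Regression guard: the helper must see a single leading dot.
assert captured_extension("data.csv") == ".csv"
# Fallback: a key without a dot yields ".bin".
assert captured_extension("blob") == ".bin"
```

Asserting on the captured argument, rather than on the builder's final output, is what lets a test detect the doubled dot even though `lstrip(".")` hides it downstream.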

@dosubot added the size:XS label (this PR changes 0-9 lines, ignoring generated files) on May 5, 2026
@dosubot added the lgtm label (this PR has been approved by a maintainer) on May 6, 2026
@github-actions

github-actions Bot commented May 6, 2026

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-05-06 00:57:38.703000442 +0000
+++ /tmp/pyrefly_pr.txt	2026-05-06 00:57:29.342899909 +0000
@@ -5533,9 +5533,9 @@
 ERROR Object of class `NoneType` has no attribute `storage_key` [missing-attribute]
    --> tests/unit_tests/factories/test_build_from_mapping.py:164:12
 ERROR `in` is not supported between `Literal['file']` and `None` [not-iterable]
-   --> tests/unit_tests/factories/test_file_factory.py:282:16
+   --> tests/unit_tests/factories/test_file_factory.py:285:16
 ERROR `in` is not supported between `Literal['.txt']` and `None` [not-iterable]
-   --> tests/unit_tests/factories/test_file_factory.py:283:16
+   --> tests/unit_tests/factories/test_file_factory.py:286:16
 ERROR Argument `datetime` is not assignable to parameter `created_at` with type `Decimal | bool | bytes | float | int | str | None` in function `fields.file_fields.FileWithSignedUrl.__init__` [bad-argument-type]
   --> tests/unit_tests/fields/test_file_fields.py:55:20
 ERROR `dict[str, str | None]` is not assignable to TypedDict key `site` with type `list[dict[str, Any]] | list[dict[str, str]] | str` [bad-typed-dict-key]

@fatelei

fatelei commented May 6, 2026

Rebase onto main; the test has been fixed.

…_type for datasource files

`_build_from_datasource_file` first builds `extension = "." + datasource_file.key.split(".")[-1]`
(or `".bin"`), but previously called `standardize_file_type(extension="." + extension, ...)`,
producing `..csv`/`..bin`. The mitigating `lstrip(".")` inside `standardize_file_type` masked
the symptom, but the call shape was wrong and inconsistent with every other builder in this
file (`_build_from_local_file`, `_build_from_remote_url`, `_build_from_tool_file`), all of
which pass `extension` with exactly one leading dot.

Pass the already-prefixed `extension` directly so the helper sees ".csv" instead of "..csv".

Adds two regression tests:
- standardize_file_type receives ".csv" (single-dot) for a keyed datasource file.
- Falls back to ".bin" when the upload key has no dot.
@Beandon13 force-pushed the fix/dify-datasource-double-dot branch from 4bff5ec to 1772552 on May 6, 2026 02:31
@github-actions

github-actions Bot commented May 6, 2026

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-05-06 02:55:42.377025645 +0000
+++ /tmp/pyrefly_pr.txt	2026-05-06 02:55:30.632000940 +0000
@@ -5535,9 +5535,9 @@
 ERROR Object of class `NoneType` has no attribute `storage_key` [missing-attribute]
    --> tests/unit_tests/factories/test_build_from_mapping.py:164:12
 ERROR `in` is not supported between `Literal['file']` and `None` [not-iterable]
-   --> tests/unit_tests/factories/test_file_factory.py:282:16
+   --> tests/unit_tests/factories/test_file_factory.py:285:16
 ERROR `in` is not supported between `Literal['.txt']` and `None` [not-iterable]
-   --> tests/unit_tests/factories/test_file_factory.py:283:16
+   --> tests/unit_tests/factories/test_file_factory.py:286:16
 ERROR Argument `datetime` is not assignable to parameter `created_at` with type `Decimal | bool | bytes | float | int | str | None` in function `fields.file_fields.FileWithSignedUrl.__init__` [bad-argument-type]
   --> tests/unit_tests/fields/test_file_fields.py:55:20
 ERROR `dict[str, str | None]` is not assignable to TypedDict key `site` with type `list[dict[str, Any]] | list[dict[str, str]] | str` [bad-typed-dict-key]

@fatelei fatelei added this pull request to the merge queue May 6, 2026
Merged via the queue into langgenius:main with commit 70eb98d May 6, 2026
27 checks passed
