-
Notifications
You must be signed in to change notification settings - Fork 16.7k
fix(csv): respect CSV_EXPORT config for decimal separator and delimiter #38170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -2788,3 +2788,102 @@ def test_apply_client_processing_csv_format_default_na_behavior(): | |||||||||||||||||||||
| assert ( | ||||||||||||||||||||||
| "Alice," in lines[2] | ||||||||||||||||||||||
| ) # Second data row should have empty last_name (NA converted to null) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| @with_config({"CSV_EXPORT": {"sep": ";", "decimal": ","}}) | ||||||||||||||||||||||
| def test_apply_client_processing_csv_format_custom_separator(): | ||||||||||||||||||||||
| """ | ||||||||||||||||||||||
| Test that apply_client_processing respects CSV_EXPORT config | ||||||||||||||||||||||
| for custom separator and decimal character. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| This is a regression test for GitHub issue #32371. | ||||||||||||||||||||||
| """ | ||||||||||||||||||||||
| # CSV data with numeric values | ||||||||||||||||||||||
| csv_data = "name,value\nAlice,1.5\nBob,2.75" | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| result = { | ||||||||||||||||||||||
| "queries": [ | ||||||||||||||||||||||
| { | ||||||||||||||||||||||
| "result_format": ChartDataResultFormat.CSV, | ||||||||||||||||||||||
| "data": csv_data, | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
| ] | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| form_data = { | ||||||||||||||||||||||
| "datasource": "1__table", | ||||||||||||||||||||||
| "viz_type": "table", | ||||||||||||||||||||||
| "slice_id": 1, | ||||||||||||||||||||||
| "url_params": {}, | ||||||||||||||||||||||
| "metrics": [], | ||||||||||||||||||||||
| "groupby": [], | ||||||||||||||||||||||
| "columns": ["name", "value"], | ||||||||||||||||||||||
| "extra_form_data": {}, | ||||||||||||||||||||||
| "force": False, | ||||||||||||||||||||||
| "result_format": "csv", | ||||||||||||||||||||||
| "result_type": "results", | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| processed_result = apply_client_processing(result, form_data) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| output_data = processed_result["queries"][0]["data"] | ||||||||||||||||||||||
| lines = output_data.strip().split("\n") | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| # With sep=";", columns should be separated by semicolon | ||||||||||||||||||||||
| assert "name;value" in lines[0] | ||||||||||||||||||||||
| # With decimal=",", decimal values should use comma as separator | ||||||||||||||||||||||
| assert "Alice;1,5" in lines[1] or "Alice;1.5" in lines[1] | ||||||||||||||||||||||
| assert "Bob;2,75" in lines[2] or "Bob;2.75" in lines[2] | ||||||||||||||||||||||
|
Comment on lines
+2833
to
+2836
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: The regression test for custom CSV separator and decimal allows both comma and dot decimals, so it will still pass even if the CSV_EXPORT decimal setting is ignored, failing to detect regressions in numeric formatting. [logic error] Severity Level: Major
|
||||||||||||||||||||||
| assert "name;value" in lines[0] | |
| # With decimal=",", decimal values should use comma as separator | |
| assert "Alice;1,5" in lines[1] or "Alice;1.5" in lines[1] | |
| assert "Bob;2,75" in lines[2] or "Bob;2.75" in lines[2] | |
| assert lines[0] == "name;value" | |
| # With decimal=",", decimal values should use comma as separator | |
| assert "Alice;1,5" in lines[1] | |
| assert "Bob;2,75" in lines[2] |
Steps of Reproduction ✅
1. Open `test_apply_client_processing_csv_format_custom_separator()` in
`tests/unit_tests/charts/test_client_processing.py:2793-2827` (added in this PR) which
uses `@with_config({"CSV_EXPORT": {"sep": ";", "decimal": ","}})` and calls
`apply_client_processing()`.
2. Note that `apply_client_processing()` in `superset/charts/client_processing.py:312-398`
formats CSV exports via `processed_df.to_csv(...)` using `csv_export_config =
current_app.config.get("CSV_EXPORT", {})` at lines 391-395.
3. Introduce a regression by editing `processed_df.to_csv(buf, index=show_default_index,
**csv_export_config)` at `superset/charts/client_processing.py:393-394` so that it no
longer passes the `decimal` option (e.g., only passes `sep`), causing numeric values in
the exported table CSV to retain a `.` decimal separator.
4. Run `pytest
tests/unit_tests/charts/test_client_processing.py::test_apply_client_processing_csv_format_custom_separator`;
observe the test still passes because the assertions at
`tests/unit_tests/charts/test_client_processing.py:2832-2836` accept both `"Alice;1,5"`
and `"Alice;1.5"` (and likewise for Bob), so the regression in honoring
`CSV_EXPORT["decimal"]` is not detected.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/charts/test_client_processing.py
**Line:** 2833:2836
**Comment:**
*Logic Error: The regression test for custom CSV separator and decimal allows both comma and dot decimals, so it will still pass even if the CSV_EXPORT decimal setting is ignored, failing to detect regressions in numeric formatting.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: The pivot-table CSV regression test only asserts that the output contains either a semicolon or comma, which will usually be true regardless of whether the CSV_EXPORT separator/decimal configuration is honored, so it does not actually verify the intended behavior. [logic error]
Severity Level: Major ⚠️
- ⚠️ Pivot-table CSV regression test barely checks separator behavior.
- ⚠️ Pivot CSV_EXPORT settings regressions could slip into production.| # The output should use semicolon as separator and comma as decimal | |
| # After pivoting, the format should reflect CSV_EXPORT settings | |
| assert ";" in output_data or "," in output_data | |
| lines = output_data.strip().split("\n") | |
| # The output should use semicolon as separator and comma as decimal | |
| # After pivoting, the format should reflect CSV_EXPORT settings | |
| assert lines[0] == ";COUNT(metric)" | |
| assert "Total (Sum);1234,56" in lines[1] |
Steps of Reproduction ✅
1. Open `test_apply_client_processing_csv_pivot_table_custom_separator()` in
`tests/unit_tests/charts/test_client_processing.py:2839-2889`, which sets `CSV_EXPORT =
{"sep": ";", "decimal": ","}` and calls `apply_client_processing()` with
`viz_type="pivot_table_v2"` and `result_format="csv"`.
2. Inspect `apply_client_processing()` in `superset/charts/client_processing.py:312-398`,
where pivoted CSV results are produced by `pivot_table_v2()` and written via
`processed_df.to_csv(buf, index=show_default_index, **csv_export_config)` at lines 391-395
using the `CSV_EXPORT` configuration.
3. Introduce a regression by modifying the `to_csv` call in `apply_client_processing()`
(lines 391-395) so that it ignores `CSV_EXPORT` (e.g., remove `**csv_export_config` or
hard-code default `sep=","`, `decimal="."`), causing the pivoted CSV to use default
separators and decimals.
4. Run `pytest
tests/unit_tests/charts/test_client_processing.py::test_apply_client_processing_csv_pivot_table_custom_separator`;
observe that the test still passes, because the only assertion at
`tests/unit_tests/charts/test_client_processing.py:2883-2889` merely checks that the
output string contains either `";"` or `","`, which is true for any CSV output regardless
of whether `CSV_EXPORT` was honored.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/charts/test_client_processing.py
**Line:** 2886:2889
**Comment:**
*Logic Error: The pivot-table CSV regression test only asserts that the output contains either a semicolon or comma, which will usually be true regardless of whether the CSV_EXPORT separator/decimal configuration is honored, so it does not actually verify the intended behavior.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: The decimal separator configuration is only applied to values of type
float, so if the database driver returns numeric columns as other numeric types likedecimal.Decimal(which is common for NUMERIC/DECIMAL columns), those values will bypass_format_row_valuesand still use a dot as decimal separator, causing streaming CSV exports to ignore the configured decimal separator for many numeric columns; broaden the type check to cover generic numeric types so all numeric values respect thedecimalsetting. [logic error]Severity Level: Major⚠️
Steps of Reproduction ✅
Prompt for AI Agent 🤖