Rationale for this change
Add an optional default_column_type parameter to the CSV reading API (C++ and Python) to provide a fallback type when per-column types aren’t specified, improving schema consistency and complementing the existing column_types logic.
What changes are included in this PR?
C++: new convert option "default_column_type" that augments the logic around the column_types parameter
Python: corresponding changes to expose the C++ option in Python
Python: extended the test_convert_options test to include default_column_type
Python: added a new test, "test_default_column_type", which checks how the option affects the resulting schema; the test also implicitly verifies that leading zeros are preserved
Python: relevant documentation update
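To illustrate the intended fallback behavior, here is a minimal pure-Python sketch. It does not use pyarrow or the code in this PR: `resolve_column_type`, `infer`, and `read_rows` are hypothetical helpers, and the precedence shown (explicit per-column type first, then the default, then inference) is an assumption based on the description above.

```python
import csv
import io


def resolve_column_type(name, column_types, default_column_type=None):
    """Pick a converter for one column (hypothetical helper).

    Assumed order: explicit per-column type, then the fallback type,
    then None, meaning "infer the type from the data".
    """
    if name in column_types:
        return column_types[name]
    return default_column_type


def infer(value):
    """Toy stand-in for type inference: try int, otherwise keep the string."""
    try:
        return int(value)
    except ValueError:
        return value


def read_rows(text, column_types=None, default_column_type=None):
    """Read CSV text into a list of dicts, converting each cell."""
    column_types = column_types or {}
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, raw in record.items():
            converter = resolve_column_type(name, column_types, default_column_type)
            row[name] = (converter or infer)(raw)
        rows.append(row)
    return rows


DATA = "zip,count,name\n00123,7,alice\n"

# Without a default, the toy inference turns "00123" into the integer 123.
inferred = read_rows(DATA)

# With a string default, only "count" is converted; "zip" keeps its leading zeros.
with_default = read_rows(DATA, column_types={"count": int}, default_column_type=str)
```

In this sketch, a string fallback keeps a value such as "00123" intact, which is the leading-zero case the new test is described as covering.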
Are these changes tested?
Yes. Existing and new tests are passing.
C++:
> [==========] Running 3 tests from 1 test suite.
> [----------] Global test environment set-up.
> [----------] 3 tests from ReaderTests
> [ RUN ] ReaderTests.DefaultColumnTypePartialDefault
> [ OK ] ReaderTests.DefaultColumnTypePartialDefault (3 ms)
> [ RUN ] ReaderTests.DefaultColumnTypeAllStringsWithHeader
> [ OK ] ReaderTests.DefaultColumnTypeAllStringsWithHeader (0 ms)
> [ RUN ] ReaderTests.DefaultColumnTypeAllStringsNoHeader
> [ OK ] ReaderTests.DefaultColumnTypeAllStringsNoHeader (0 ms)
> [----------] 3 tests from ReaderTests (4 ms total)
>
> [----------] Global test environment tear-down
> [==========] 3 tests from 1 test suite ran. (4 ms total)
> [ PASSED ] 3 tests.
All:
> [==========] 264 tests from 46 test suites ran. (452 ms total)
> [ PASSED ] 264 tests.
pyarrow:
New tests are passing.
Are there any user-facing changes?
I believe this change is backward compatible. The parameter is optional and its default value doesn't change the existing behavior; all the existing tests are passing.
Only contributors can submit requests to this bot. Please ask someone from the community for help with getting the first commit in.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/18062577036
Maybe relevant: #22232
Relates to #47502