feat(python)!: Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred #16976

Merged: @ritchie46 merged 2 commits into main from row-orient on Jun 17, 2024

Conversation

stinodego
Member

@stinodego stinodego commented Jun 16, 2024

Closes #16818

Changes

  • No longer inspect data types to infer data orientation. Data orientation is inferred based on the data and schema dimensions. This is a breaking change. See example below.
  • If row orientation is inferred, issue a DataOrientationWarning. This alerts the user that they are relying on 'implicit' behavior and that explicitly setting the orientation may be beneficial.

This trades some ergonomics for explicitness.

We can infer the data orientation in some cases but not others (e.g. when the number of rows and columns is equal). This led to surprising behavior in certain cases (see linked issue).

Our solution is to issue a warning whenever row-orientation is inferred. The user must explicitly pass orient="row" to silence the warning.
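To make the dimension-based rule concrete, here is a rough pure-Python sketch. It is not the actual Polars implementation: the function name `infer_orientation`, the local `DataOrientationWarning` stand-in class, and the fallback to column orientation in the ambiguous case are all assumptions made for illustration.

```python
import warnings
from typing import Any, Optional, Sequence


class DataOrientationWarning(UserWarning):
    """Local stand-in for the warning named in this PR (hypothetical class)."""


def infer_orientation(
    data: Sequence[Sequence[Any]],
    schema: Optional[Sequence[str]] = None,
) -> str:
    """Infer "row" or "col" from dimensions only; data types are never inspected."""
    # Assumption: without a schema there is nothing to compare against,
    # so the data is treated as column-oriented.
    if not data or schema is None:
        return "col"

    n_outer = len(data)      # number of inner sequences
    n_inner = len(data[0])   # length of the first inner sequence
    n_schema = len(schema)   # number of columns declared in the schema

    # Row orientation is only inferred when the inner length matches the
    # schema and the outer length does not; equal dimensions are ambiguous.
    if n_inner == n_schema and n_outer != n_schema:
        warnings.warn(
            "Row orientation was inferred; pass orient='row' explicitly.",
            DataOrientationWarning,
            stacklevel=2,
        )
        return "row"
    return "col"
```

In this sketch, the `[[1, "a"], [2, "b"]]` input with no schema is treated as column-oriented regardless of its mixed types, which lines up with the TypeError in the "After" example further down.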

Example

Before:

>>> data = [[1, "a"], [2, "b"]]
>>> pl.DataFrame(data)
shape: (2, 2)
┌──────────┬──────────┐
│ column_0 ┆ column_1 │
│ ---      ┆ ---      │
│ i64      ┆ str      │
╞══════════╪══════════╡
│ 1        ┆ a        │
│ 2        ┆ b        │
└──────────┴──────────┘

After:

>>> pl.DataFrame(data)
Traceback (most recent call last):
...
TypeError: unexpected value while building Series of type Int64; found value of type String: "a"

Hint: Try setting `strict=False` to allow passing data with mixed types.

Use instead:

>>> pl.DataFrame(data, orient="row")
shape: (2, 2)
┌──────────┬──────────┐
│ column_0 ┆ column_1 │
│ ---      ┆ ---      │
│ i64      ┆ str      │
╞══════════╪══════════╡
│ 1        ┆ a        │
│ 2        ┆ b        │
└──────────┴──────────┘
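If changing the call site is not convenient, another option (a suggestion, not something this PR prescribes) is to transpose the row data into columns before constructing the DataFrame, so that the default column interpretation applies. A minimal standard-library sketch:

```python
# Row-oriented input: each inner list is one record.
data = [[1, "a"], [2, "b"]]

# zip(*data) pairs up the i-th element of every row, yielding one tuple
# per column; convert each tuple to a list for a column-oriented layout.
columns = [list(col) for col in zip(*data)]

print(columns)  # [[1, 2], ['a', 'b']]
```

Each inner list of `columns` now holds values of a single type, so it can be passed as column-oriented data.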

@github-actions github-actions bot added deprecation Add a deprecation warning to outdated functionality python Related to Python Polars labels Jun 16, 2024

codecov bot commented Jun 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.03%. Comparing base (308df5d) to head (49e1b6c).

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16976      +/-   ##
==========================================
+ Coverage   81.02%   81.03%   +0.01%     
==========================================
  Files        1446     1446              
  Lines      190424   190421       -3     
  Branches     2709     2708       -1     
==========================================
+ Hits       154295   154314      +19     
+ Misses      35632    35610      -22     
  Partials      497      497              


@stinodego stinodego changed the title depr(python): Deprecate automatic inference of data orientation during DataFrame construction feat(python)!: Warn when inferring row orientation during DataFrame construction Jun 16, 2024
@github-actions github-actions bot added breaking Change that breaks backwards compatibility enhancement New feature or an improvement of an existing feature labels Jun 16, 2024
@stinodego stinodego marked this pull request as ready for review June 16, 2024 23:02
@stinodego stinodego removed the deprecation Add a deprecation warning to outdated functionality label Jun 16, 2024
@ritchie46 ritchie46 merged commit 5d071d6 into main Jun 17, 2024
16 checks passed
@ritchie46 ritchie46 deleted the row-orient branch June 17, 2024 06:48
@stinodego stinodego changed the title feat(python)!: Warn when inferring row orientation during DataFrame construction feat(python)!: Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred Jun 17, 2024
@stinodego stinodego self-assigned this Jun 17, 2024
@stinodego stinodego added accepted Ready for implementation and removed accepted Ready for implementation labels Jun 17, 2024
Wouittone pushed a commit to Wouittone/polars that referenced this pull request Jun 22, 2024
Labels
breaking Change that breaks backwards compatibility enhancement New feature or an improvement of an existing feature python Related to Python Polars
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

pl.DataFrame loads in 2D lists in unexpected way
2 participants