Assigning Series of length different from DataFrame index doesn't work #1496

@gshimansky

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 19.10

  • Modin version (modin.__version__): 0.7.3+52.g880545b

  • Python version: Python 3.7.5

  • Code we can use to reproduce:

Here are three slightly different cases, each of which behaves differently:

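# Case 1: empty DataFrame, new column assigned from a 3-element Series
import modin.pandas as pd

df = pd.DataFrame({"id": [], "max_speed": [], "health": []})
se = pd.Series([11, 22, 33])
df[0] = se
print(df)

# Case 2: 1-row DataFrame, new column assigned from a longer (3-element) Series
import modin.pandas as pd

df = pd.DataFrame({"id": [1], "max_speed": [2], "health": [3]})
se = pd.Series([11, 22, 33])
df[0] = se
print(df)

# Case 3: 3-row DataFrame, existing column replaced by a shorter (2-element) Series
import modin.pandas as pd

df = pd.DataFrame({"id": [4, 40, 400], "max_speed": [111, 222, 333], "health": [33, 22, 11]})
se = pd.Series([11, 22])
df['id'] = se
print(df)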

Describe the problem

NOTE: These tests used to raise an exception, which was fixed in #1495, but they still behave incorrectly.

  1. A test for the first case was added in #1495 ("Fixed #1490. New column case is checked first in __setitem__"), but it is marked as xfail because it raises a different exception.
  2. The second case doesn't raise an exception, but it behaves differently from pandas: pandas truncates the Series, while Modin adds new rows filled with NaNs.
  3. The third case doesn't raise an exception either, but it also behaves differently from pandas: pandas pads the Series with NaN and replaces the DataFrame column, while Modin truncates the DataFrame to the length of the Series.
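
For reference, here is a minimal sketch of the pandas behavior described in cases 2 and 3, using plain pandas instead of Modin. The names df2 and df3 are only for illustration, and the NaN padding in case 3 assumes the default index alignment pandas performs on Series assignment.

```python
import pandas as pd  # plain pandas, used as the reference behavior

# Case 2: the Series is longer than the DataFrame index.
# pandas aligns on the index, so only the value at label 0 is kept.
df2 = pd.DataFrame({"id": [1], "max_speed": [2], "health": [3]})
df2[0] = pd.Series([11, 22, 33])
print(df2)

# Case 3: the Series is shorter than the DataFrame index.
# pandas aligns on the index, so the missing label 2 becomes NaN
# and the DataFrame keeps all three rows.
df3 = pd.DataFrame({"id": [4, 40, 400], "max_speed": [111, 222, 333], "health": [33, 22, 11]})
df3["id"] = pd.Series([11, 22])
print(df3)
```

In both cases the DataFrame index stays unchanged and the Series is fitted to it, which is the result Modin is expected to match.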
