Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cudf.pandas failure on test_convert_matrix_order_cuml_array #5882

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions python/cuml/internals/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -1104,6 +1104,11 @@ def from_input(
X = X.to_cupy(copy=False)
elif isinstance(X, (PandasDataFrame, PandasSeries)):
X = X.to_numpy(copy=False)
# by default pandas converts to numpy 'C' major, which
# does not keep the original order
if order == "K":
X = X.reshape(X.shape, order="F")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know that you could change the order via reshape and was wondering whether or not this leads to a copy of the data. So I played around a bit to see it happen with my own eyes.

df = pd.DataFrame({'a': [1,2,3], 'b':[11,22,33]})
X = df.to_numpy()
print(X.flags) # C_CONTIGUOUS is false, F_CONTIGUOUS is true
X = X.reshape(X.shape, order="F")
print(X.flags) # C_CONTIGUOUS is false, F_CONTIGUOUS is true :-/

Which left my puzzled. Is my assumption of what should be happening wrong? There is np.asfortranarray() which does change C to F, but at the cost of making a copy of the data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, got myself confused. For me the result of to_numpy() is already F contiguous?!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like the result can be inconsistent depending on the wrapping that cudf is doing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, it should have no cost if it's already in the correct order

order = "F"
elif hasattr(X, "__dataframe__"):
# temporarily use this codepath to avoid errors, substitute
# usage of dataframe interchange protocol once ready.
Expand Down
7 changes: 4 additions & 3 deletions python/cuml/tests/test_input_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -127,6 +127,9 @@ def test_convert_matrix_order_cuml_array(
input_type, 10, 10, dtype, order=from_order
)

if input_type in ["cudf", "pandas"]:
from_order = "F"

# conv_data = np.array(real_data, order=to_order, copy=True)
if from_order == to_order or to_order == "K":
conv_data, *_ = input_to_cuml_array(
Expand All @@ -148,10 +151,8 @@ def test_convert_matrix_order_cuml_array(
)

if to_order == "K":
if input_type in ["cudf"]:
if input_type in ["cudf", "pandas"]:
assert conv_data.order == "F"
elif input_type in ["pandas"]:
assert conv_data.order == "C"
else:
assert conv_data.order == from_order
else:
Expand Down
Loading