Construct 1d array from listlike #18769

Merged: 7 commits, Dec 19, 2017
REF: implement and use construct_1d_array_from_listlike
toobaz committed Dec 19, 2017
commit d4d02a2b4db6705e2816f47e54f53d55a94d2f3f
18 changes: 6 additions & 12 deletions pandas/core/common.py
@@ -21,6 +21,7 @@
 from pandas.core.dtypes.missing import isna, isnull, notnull  # noqa
 from pandas.api import types
 from pandas.core.dtypes import common
+from pandas.core.dtypes.cast import construct_1d_array_from_listlike

 # compat
 from pandas.errors import (  # noqa
@@ -381,25 +382,18 @@ def _asarray_tuplesafe(values, dtype=None):
         return values.values

     if isinstance(values, list) and dtype in [np.object_, object]:
-        return lib.list_to_object_array(values)
+        return construct_1d_array_from_listlike(values)

     result = np.asarray(values, dtype=dtype)

     if issubclass(result.dtype.type, compat.string_types):
         result = np.asarray(values, dtype=object)

     if result.ndim == 2:
-        if isinstance(values, list):
-            return lib.list_to_object_array(values)
-        else:
-            # Making a 1D array that safely contains tuples is a bit tricky
-            # in numpy, leading to the following
-            try:
-                result = np.empty(len(values), dtype=object)
-                result[:] = values
-            except ValueError:
-                # we have a list-of-list
-                result[:] = [tuple(x) for x in values]
+        # Avoid building an array of arrays:
Contributor

Is there an asv benchmark that hits this case?

Member Author (@toobaz, Dec 18, 2017)

I don't even think there is any valid code path that hits this case... which indeed should be suppressed in #18626

Contributor

If no valid code path hits this, then let's remove it, or open a new issue. Raising it inside a PR doesn't help.

Member Author

Contributor

Can you add the issue number here with a TODO?

Member Author

(done)

+        # TODO: verify whether any path hits this except #18819 (invalid)
+        values = [tuple(x) for x in values]
+        result = construct_1d_array_from_listlike(values)

     return result

25 changes: 25 additions & 0 deletions pandas/core/dtypes/cast.py
@@ -1162,3 +1162,28 @@ def construct_1d_arraylike_from_scalar(value, length, dtype):
         subarr.fill(value)

     return subarr
+
+
+def construct_1d_array_from_listlike(values, dtype='object'):
+    """
+    Transform any list-like object in a 1-dimensional numpy array.
+
+    Parameters
+    ----------
+    values : any iterable which has a len()
+    dtype : dtype, default 'object'
+
+    Raises
+    ------
+    TypeError
+        * If `values` does not have a len()
+
+    Returns
+    -------
+    1-dimensional numpy array of dtype "dtype"
Contributor

I would rename this to construct_1d_object_array_from_listlike and drop the dtype (always make it object).

You can also just add another function for this (e.g. have one that accepts the dtype as a required parameter). I also don't think we are actually using dtype anywhere?

Member Author

The array we return is of dtype dtype. So this will raise an error if e.g. dtype=int and the input contains lists, but this is good because it allows cleaner code (and fewer checks) in those methods where the actual dtype is unknown.
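
To make the point about dtype concrete, here is a minimal sketch that assumes only NumPy. `to_1d_array` is a hypothetical stand-in that copies the body of the function under review; it is not the pandas function itself, and the exact exception raised for mismatched input may differ between NumPy versions.

```python
import numpy as np


def to_1d_array(values, dtype=object):
    # Hypothetical stand-in for the function under review, keeping the
    # dtype parameter that is being debated in this thread.
    result = np.empty(len(values), dtype=dtype)
    result[:] = values
    return result


# Flat integers fit a 1-D int array.
flat = to_1d_array([1, 2, 3], dtype=int)

# Nested lists cannot be stored as single int elements, so NumPy raises
# instead of silently producing something else.
try:
    to_1d_array([[1, 2], [3, 4]], dtype=int)
    raised = False
except (ValueError, TypeError):
    # The exception type and message depend on the NumPy version.
    raised = True

print(flat, raised)
```

With the default `dtype=object` the same nested input is accepted, which is the only case the PR currently exercises.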

Contributor

We are not using this, so it's very confusing. I would rather just return an object-dtyped array here (or have 2 methods).

Member Author

> we are not using this, so it's very confusing

Non sequitur.

> have 2 methods

... with almost perfect duplication of code, one of which perfectly generalizes the other, and without any performance gain?! I thought you liked to reduce code :-)

This method builds a 1d array of type dtype with given data. If that is impossible, it raises an error. This is pretty straightforward, and once np.array gains an ndmax parameter, all calls to this method will be replaced with a simple call to np.array(..., ndmax=1).

Contributor

I guess I will explain again.

We are not using the dtype argument. So just eliminate it. I don't see utility in having it.

+    """
Contributor

I would not object to an assert is_iterable(values) with a nice error message.

Member Author

The thing is: I can't think of any code path which could be hitting it. Scalar input to a Series() is (considered valid and) recast to 1-d before calling this. Similarly, an operation such as Series([1,2]) + 3 transforms 3 before hitting this. So I don't know what the error message could actually say.

Contributor

That's not what I am asking. This is a completely internal routine; it should fail with invalid input.

Member Author

> it should fail with invalid input

Sure it does, TypeError: object of type 'int' has no len(). Which is pretty clear, considering the docstring, and precisely in light of the fact that this is an internal routine. That said, feel free to suggest an error message which is worth the cost of the additional assert.
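
For comparison, the two failure modes discussed here can be put side by side in a small sketch that assumes only NumPy. `to_1d_object_array` is a hypothetical variant with an explicit guard; the `hasattr` check and the message wording are assumptions made for illustration, not the reviewer's proposed `assert` and not what pandas does.

```python
import numpy as np


def to_1d_object_array(values):
    # Hypothetical variant with an explicit up-front check.
    if not hasattr(values, "__len__"):
        raise TypeError(
            "values must be list-like with a len(), got "
            + type(values).__name__
        )
    result = np.empty(len(values), dtype=object)
    result[:] = values
    return result


# Without a guard, the first thing to fail on scalar input is len().
try:
    len(3)
except TypeError as err:
    implicit_message = str(err)

# With the guard, the error names the expected kind of input.
try:
    to_1d_object_array(3)
except TypeError as err:
    explicit_message = str(err)

print(implicit_message)
print(explicit_message)
```

Both paths raise TypeError; the guard only changes the wording, which is the trade-off being argued over.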

Contributor

Add a Raises section to the docstring.

+    # numpy will try to interpret nested lists as further dimensions, hence
+    # making a 1D array that contains list-likes is a bit tricky:
+    result = np.empty(len(values), dtype=dtype)
+    result[:] = values
+    return result
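
The behaviour the code comment describes can be seen in a short standalone sketch that assumes only NumPy. `to_1d_object_array` is a hypothetical stand-in mirroring the function body above with the dtype fixed to object; it is not the pandas function itself.

```python
import numpy as np


def to_1d_object_array(values):
    # Hypothetical stand-in mirroring the function body added in this PR:
    # pre-allocate a 1-D object array, then fill it by slice assignment.
    result = np.empty(len(values), dtype=object)
    result[:] = values
    return result


nested = [[1, 2], [3, 4]]

# np.array treats the inner lists as a second dimension.
as_2d = np.array(nested, dtype=object)

# Slice assignment into a pre-allocated 1-D array keeps each inner
# list as a single element.
as_1d = to_1d_object_array(nested)

print(as_2d.shape)
print(as_1d.shape)
```

Pre-allocating fixes the number of dimensions before NumPy inspects the input, which is why the inner lists survive as elements.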