Description
Code Sample, a copy-pastable example if possible
import pandas as pd
import numba
a = numba.typed.List()
a.append(1)
a.append(2)
pd.DataFrame(a)
raises with
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-17-0844eae3ab56> in <module>
6 a.append(2)
7
----> 8 pd.DataFrame(a)
~/sandbox/pandas/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
458 mgr = arrays_to_mgr(arrays, columns, index, columns, dtype=dtype)
459 else:
--> 460 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
461 else:
462 mgr = init_dict({}, index, columns, dtype=dtype)
~/sandbox/pandas/pandas/core/internals/construction.py in init_ndarray(values, index, columns, dtype, copy)
158 # by definition an array here
159 # the dtypes will be coerced to a single dtype
--> 160 values = prep_ndarray(values, copy=copy)
161
162 if dtype is not None:
~/sandbox/pandas/pandas/core/internals/construction.py in prep_ndarray(values, copy)
279 values = values.copy()
280
--> 281 if values.ndim == 1:
282 values = values.reshape((values.shape[0], 1))
283 elif values.ndim != 2:
AttributeError: 'List' object has no attribute 'ndim'
Problem description
Since version 0.45, Numba provides a typed list class (numba.typed.List) that allows fast manipulation of lists in compiled code.
Constructing a pandas DataFrame from such a typed list is not straightforward, however.
First, such an object cannot be passed directly to the DataFrame constructor; it has to be converted to a list or NumPy array first.
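For reference, the conversion workaround described above can be sketched as follows. The helper name `frame_from_typed_list` is hypothetical and not part of pandas or Numba, and the fallback to a plain list when Numba is missing is only there so the snippet runs anywhere. It copies the elements into a NumPy array with `np.fromiter` and hands that to the constructor; it does not avoid the per-element conversion cost.

```python
import numpy as np
import pandas as pd

try:
    from numba.typed import List as TypedList  # available in numba >= 0.45
except ImportError:
    TypedList = list  # assumption: fall back to a plain list so the sketch still runs


def frame_from_typed_list(seq, dtype):
    """Hypothetical helper: copy a 1-D list-like into an ndarray, then wrap it."""
    # np.fromiter only needs an iterable and a dtype; count preallocates the result
    values = np.fromiter(seq, dtype=dtype, count=len(seq))
    return pd.DataFrame(values)


a = TypedList()
a.append(1)
a.append(2)

df = frame_from_typed_list(a, dtype=np.int64)
print(df)
```

Passing `list(a)` to `pd.DataFrame` should work as well; `np.fromiter` is used here only because it fixes the dtype up front.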
Second, the conversion is slow. For a typed list of 100000 float32 values, converting it to a list and passing that to pandas takes about one second on my machine; converting it to a NumPy array instead takes almost 2 seconds.
By contrast, constructing a DataFrame from a conventional list or NumPy array takes only about 1/100 of a second.
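A rough way to reproduce this comparison is sketched below. It is not the exact benchmark behind the numbers above: the function names `best_time` and `time_conversions` are made up for illustration, the typed list is converted with `np.fromiter` (the original conversion method is not stated), and absolute timings will differ by machine and library version. If Numba is not installed, only the plain list and ndarray cases are timed.

```python
import time

import numpy as np
import pandas as pd

try:
    from numba.typed import List as TypedList  # available in numba >= 0.45
except ImportError:
    TypedList = None  # typed-list timings are skipped without numba


def best_time(fn, repeat=3):
    """Return the fastest wall-clock time (seconds) over `repeat` calls of fn()."""
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)


def time_conversions(n=100_000):
    """Time DataFrame construction from containers holding n float32 values."""
    arr = np.arange(n, dtype=np.float32)
    plain = arr.tolist()
    results = {
        "list": best_time(lambda: pd.DataFrame(plain)),
        "ndarray": best_time(lambda: pd.DataFrame(arr)),
    }
    if TypedList is not None:
        typed = TypedList()
        for x in arr:
            typed.append(x)  # np.float32 scalars, so the element type is float32
        results["typed list -> list"] = best_time(
            lambda: pd.DataFrame(list(typed))
        )
        results["typed list -> ndarray"] = best_time(
            lambda: pd.DataFrame(
                np.fromiter(typed, dtype=np.float32, count=len(typed))
            )
        )
    return results


if __name__ == "__main__":
    for label, seconds in time_conversions().items():
        print(f"{label}: {seconds:.4f} s")
```

Taking the minimum over a few repeats reduces noise from one-off costs such as Numba compiling the typed list methods on first use.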
I wonder whether it is possible to write a more efficient DataFrame constructor that accepts Numba typed lists as input.
Expected Output
Output of pd.show_versions()