Internal Error when calling DataFrame.to_numpy on DataFrame.apply() result #1154

Closed
agardelein opened this issue Mar 22, 2020 · 1 comment · Fixed by #1845

@agardelein
Contributor

agardelein commented Mar 22, 2020

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian bullseye/sid
  • Modin version (modin.__version__): 0.7.2
  • Python version: 3.7.5

Describe the problem

When applying a function to a DataFrame, calling to_numpy() on the result leads to the error reported below. Converting the result with _to_pandas() first avoids the error.

In the applied function func, the result is gathered from the current DataFrame row (d0) and from another DataFrame (d1) passed as an argument. The label used to look up data in d1 is derived from the current label of d0.

Note 1: a first call to func() is made with an empty DataFrame, so its result is discarded. I can't find the reason for this first call.
Note 2: in the call to apply(), csv3 is passed via _to_pandas() to prevent another error. I can file another bug report if needed.
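
The label derivation described above can be sketched without Modin or pandas. This is only an illustration: plain dictionaries keyed by (p0, p1) tuples stand in for the two MultiIndex DataFrames, the values are the first two rows of the data files shown in this report, and the helper names derive_d1_label and gather are hypothetical.

```python
# Plain dictionaries stand in for the two MultiIndex DataFrames
# (an assumption made so the sketch runs without Modin or pandas).
d0_rows = {
    ("A", 0): [10, 11, 12],
    ("B", 1): [20, 21, 22],
}
d1_rows = {
    ("A", 10): [110, 111, 112],
    ("B", 11): [120, 121, 122],
}


def derive_d1_label(d0_label):
    """Return the d1 label for a d0 label by adding 10 to the second level."""
    parts = list(d0_label)
    parts[1] = parts[1] + 10
    return tuple(parts)


def gather(d0_label):
    """Pair the d0 row with the matching d1 row, like fields 'a' and 'b' in func."""
    return {"a": d0_rows[d0_label], "b": d1_rows[derive_d1_label(d0_label)]}


# One entry per d0 row, each holding two array-like values.
paired = {label: gather(label) for label in d0_rows}
```

Each input row has three values, while each result has two fields ('a' and 'b'), so the shape of the result differs from the shape of the input.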

Source code / logs

The datafile test.dat

p0,p1,data0,data1,data2
A,0,10,11,12
B,1,20,21,22
C,2,30,31,32
D,3,40,41,42
E,4,50,51,52

The datafile test2.dat

p0,p1,data0,data1,data2
A,10,110,111,112
B,11,120,121,122
C,12,130,131,132
D,13,140,141,142
E,14,150,151,152

The source code:

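import pandas as pd
import modin.pandas

# Both files are indexed by the first two columns (p0, p1)
csv2 = modin.pandas.read_csv('test.dat', header=0, index_col=[0, 1])
csv3 = modin.pandas.read_csv('test2.dat', header=0, index_col=[0, 1])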

def func(d0, d1=None):
    if d0.empty:
        # Needed to skip first call with no data
        print('empty !')
        return None
    l = list(d0.name)
    l[1] = l[1] + 10
    return pd.Series([d0.values, (d1.loc[tuple(l)]).values],
                     index=['a', 'b'], name='pouet')

r = csv2.apply(func, axis=1, result_type='expand', d1=csv3._to_pandas())
rn = r.to_numpy()
# rn = r._to_pandas().to_numpy()

The output:

FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
UserWarning: User-defined function verification is still under development in Modin. The function provided is not verified.
empty !
Traceback (most recent call last):
  File "./test2.py", line 24, in <module>
    rn = r.to_numpy()
  File "~/.local/lib/python3.7/site-packages/modin/pandas/base.py", line 2962, in to_numpy
    arr = self._query_compiler.to_numpy()
  File "~/.local/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 169, in to_numpy
    len(arr) != len(self.index) or len(arr[0]) != len(self.columns)
  File "~/.local/lib/python3.7/site-packages/modin/error_message.py", line 40, in catch_bugs_and_request_email
    " caused this error.\n{}".format(extra_log)
Exception: Internal Error. Please email bug_reports@modin.org with the traceback and command that caused this error.
agardelein added the bug 🦗 Something isn't working label Mar 22, 2020
devin-petersohn added this to the 0.7.4 milestone Mar 23, 2020
@devin-petersohn
Collaborator

Thanks for the report @agardelein! Complex apply functions that change the size/shape of the dataframe can be a real challenge internally. The error is thrown when the internal metadata and the external metadata do not match.
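
The kind of mismatch described here corresponds to the condition visible in the traceback (`len(arr) != len(self.index) or len(arr[0]) != len(self.columns)`). A minimal sketch of that check in plain Python follows. The function name shape_mismatch is hypothetical, and the "stale" column labels are an assumed scenario for illustration; the report does not state which labels Modin actually recorded.

```python
def shape_mismatch(arr, index, columns):
    """Mirror the condition from the traceback: True when data and labels disagree."""
    return len(arr) != len(index) or len(arr[0]) != len(columns)


index = ["A", "B", "C", "D", "E"]

# Hypothetical materialized data: 5 rows with 2 values each (fields 'a' and 'b').
arr = [[0, 0] for _ in index]

# Labels that match the data: no mismatch.
consistent = shape_mismatch(arr, index, ["a", "b"])

# Assumed scenario: labels still describe the 3 input columns, so the check fires.
stale = shape_mismatch(arr, index, ["data0", "data1", "data2"])
```

In this sketch `consistent` is False and `stale` is True. The second case is the situation in which a check like this would raise the internal error.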

I am going to put this on 0.7.4 because of the effort involved. It will take some time to do the code analysis component of the UDF to see what the user needs.

We also do not have a capability yet to pass an entire other dataframe to a UDF in apply as you said. Feel free to open a feature request for that as we do want to support it in the future.

dchigarev self-assigned this Jul 28, 2020
dchigarev added a commit to dchigarev/modin that referenced this issue Jul 29, 2020
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
devin-petersohn pushed a commit that referenced this issue Jul 29, 2020
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
aregm pushed a commit to aregm/modin that referenced this issue Sep 16, 2020
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>