Exception: Internal error, Exception: Schema error #2938

Closed
ghost opened this issue Jul 18, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@ghost

ghost commented Jul 18, 2022

Describe the bug

Case 1... Exception: Internal error: Impossibly got empty window expression. This was likely caused by a bug in DataFusion's code.
Case 2... Exception: Schema error (possibly the same cause as Case 1?)

To Reproduce

import datafusion
ctx = datafusion.SessionContext()
datafusion.__version__

'0.6.0'

!echo "a,b\n1,4\n2,5\n3,6" > example.csv
ctx.register_csv('example', 'example.csv')
ctx.sql('SELECT * from example').show()

+---+---+
| a | b |
+---+---+
| 1 | 4 |
| 2 | 5 |
| 3 | 6 |
+---+---+

Case 1... Exception: Internal error

sql = '''
SELECT AVG(b) AS median_value  
  FROM ( 
    SELECT 
      b, 
      COUNT(b) OVER () AS row_count, 
      ROW_NUMBER() OVER (ORDER BY b) AS row_number 
    FROM example 
  )
  ORDER BY median_value
  '''
df = ctx.sql(sql)
df.show()

Exception Traceback (most recent call last)
Input In [65], in
  1 sql = '''
  2 SELECT AVG(b) AS median_value
  3 FROM (
 (...)
  10 ORDER BY median_value
  11 '''
  12 df = ctx.sql(sql)
---> 13 df.show()

Exception: Internal error: Impossibly got empty window expression. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Case 2... Exception: Schema error

sql = '''
SELECT AVG(b) AS median_value  
  FROM ( 
    SELECT 
      b, 
      COUNT(b) OVER () AS row_count, 
      ROW_NUMBER() OVER (ORDER BY b) AS row_number 
    FROM example 
  ) 
  WHERE row_number IN ((row_count + 1) / 2, (row_count + 2) / 2)  
  '''
df = ctx.sql(sql)
df.show()

Exception Traceback (most recent call last)
Input In [57], in
  1 sql = '''
  2 SELECT AVG(b) AS median_value
  3 FROM (
  (...)
  10 WHERE row_number IN ((row_count + 1) / 2, (row_count + 2) / 2)
  11 '''
  12 df = ctx.sql(sql)
---> 13 df.show()

Exception: Schema error: No field named 'row_number'. Valid fields are 'example.b'.

Additional context

The SQL is only meant to reproduce the errors; the computation itself is not meaningful.
I believe the SQL is valid. Is there a workaround?
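
As a rough sanity check on the claim that the SQL itself is valid, the Case 2 query can be run unchanged against another engine. The sketch below uses Python's built-in sqlite3 module with an in-memory table holding the same three rows as example.csv. It assumes the bundled SQLite is 3.25 or newer (the first release with window functions), and it says nothing about how DataFusion plans the query.

```python
import sqlite3

# In-memory SQLite database holding the same three rows as example.csv.
# Assumption: the SQLite bundled with Python is >= 3.25 (window functions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO example VALUES (?, ?)", [(1, 4), (2, 5), (3, 6)])

# The Case 2 query, copied from the report.
sql = """
SELECT AVG(b) AS median_value
  FROM (
    SELECT
      b,
      COUNT(b) OVER () AS row_count,
      ROW_NUMBER() OVER (ORDER BY b) AS row_number
    FROM example
  )
  WHERE row_number IN ((row_count + 1) / 2, (row_count + 2) / 2)
"""
median_value = conn.execute(sql).fetchone()[0]
print(median_value)
conn.close()
```

With three rows, both expressions in the IN list evaluate to 2 under integer division, so only the middle row (b = 5) is averaged.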

ghost added the bug label Jul 18, 2022
@kmitchener
Contributor

You're using the datafusion-python project, which is built on version 8 of the DataFusion library. In the latest DataFusion (master as of now), the first query works thanks to recent fixes, but the second query fails with:

thread 'tokio-runtime-worker' panicked at 'not implemented: InList does not yet support nested columns.', 

There's a recent PR to update datafusion-python to version 10 of the DataFusion library, which should fix the issue with the first query. The second query appears to trigger a new bug/unsupported feature.

@ghost
Author

ghost commented Jul 19, 2022

Thank you for your comment.
Case 2 is meant to compute the median value, so I will look into other methods.
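
One such method, sketched here as a fallback that avoids the SQL engine entirely, is to compute the median on the client side with Python's standard library. The helper name median_of_column is hypothetical and only for illustration. It assumes the column is numeric and small enough to hold in memory.

```python
import csv
import io
import statistics


def median_of_column(csv_file, column):
    """Return the median of a numeric column read from an open CSV file.

    Hypothetical helper for illustration; not part of datafusion-python.
    """
    reader = csv.DictReader(csv_file)
    values = [float(row[column]) for row in reader]
    # statistics.median averages the two middle values for even-length input,
    # which matches the intent of the row_number-based query.
    return statistics.median(values)


# Same contents as example.csv in the report, kept in memory for the demo.
sample = io.StringIO("a,b\n1,4\n2,5\n3,6\n")
median_b = median_of_column(sample, "b")
print(median_b)
```

For a real file, passing open('example.csv', newline='') in place of the StringIO object should work the same way.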

@ghost ghost closed this as completed Oct 22, 2022
This issue was closed.