Skip to content

Exception: Internal error, Exception: Schema error #2938

Closed
@ghost

Description

Describe the bug

Case 1... Exception: Internal error: Impossibly got empty window expression. This was likely caused by a bug in DataFusion's code.
Case 2... Exception: Schema error(same cause as Case 1 ?)

To Reproduce

import datafusion
ctx = datafusion.SessionContext()
datafusion.__version__

'0.6.0'

!echo "a,b\n1,4\n2,5\n3,6" > example.csv
ctx.register_csv('example', 'example.csv')
ctx.sql('SELECT * from example').show()

+---+---+
| a | b |
+---+---+
| 1 | 4 |
| 2 | 5 |
| 3 | 6 |
+---+---+

Case 1... Exception: Internal error

sql = '''
SELECT AVG(b) AS median_value  
  FROM ( 
    SELECT 
      b, 
      COUNT(b) OVER () AS row_count, 
      ROW_NUMBER() OVER (ORDER BY b) AS row_number 
    FROM example 
  )
  ORDER BY median_value
  '''
df = ctx.sql(sql)
df.show()

Exception Traceback (most recent call last)
Input In [65], in
  1 sql = '''
  2 SELECT AVG(b) AS median_value
  3 FROM (
 (...)
  10 ORDER BY median_value
  11 '''
  12 df = ctx.sql(sql)
---> 13 df.show()

Exception: Internal error: Impossibly got empty window expression. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Case 2... Exception: Schema error

sql = '''
SELECT AVG(b) AS median_value  
  FROM ( 
    SELECT 
      b, 
      COUNT(b) OVER () AS row_count, 
      ROW_NUMBER() OVER (ORDER BY b) AS row_number 
    FROM example 
  ) 
  WHERE row_number IN ((row_count + 1) / 2, (row_count + 2) / 2)  
  '''
df = ctx.sql(sql)
df.show()

Exception Traceback (most recent call last)
Input In [57], in
  1 sql = '''
  2 SELECT AVG(b) AS median_value
  3 FROM (
  (...)
  10 WHERE row_number IN ((row_count + 1) / 2, (row_count + 2) / 2)
  11 '''
  12 df = ctx.sql(sql)
---> 13 df.show()

Exception: Schema error: No field named 'row_number'. Valid fields are 'example.b'.

Additional context

SQL is meant to be reproduced, and the processing content has no meaning.
I think SQL is correct. Is there a way around it?

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions