
Description
Describe the bug
Case 1... Exception: Internal error: Impossibly got empty window expression. This was likely caused by a bug in DataFusion's code.
Case 2... Exception: Schema error(same cause as Case 1 ?)
To Reproduce
import datafusion
ctx = datafusion.SessionContext()
datafusion.__version__
'0.6.0'
!echo "a,b\n1,4\n2,5\n3,6" > example.csv
ctx.register_csv('example', 'example.csv')
ctx.sql('SELECT * from example').show()
+---+---+
| a | b |
+---+---+
| 1 | 4 |
| 2 | 5 |
| 3 | 6 |
+---+---+
Case 1... Exception: Internal error
sql = '''
SELECT AVG(b) AS median_value
FROM (
SELECT
b,
COUNT(b) OVER () AS row_count,
ROW_NUMBER() OVER (ORDER BY b) AS row_number
FROM example
)
ORDER BY median_value
'''
df = ctx.sql(sql)
df.show()
Exception Traceback (most recent call last)
Input In [65], in
1 sql = '''
2 SELECT AVG(b) AS median_value
3 FROM (
(...)
10 ORDER BY median_value
11 '''
12 df = ctx.sql(sql)
---> 13 df.show()
Exception: Internal error: Impossibly got empty window expression. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Case 2... Exception: Schema error
sql = '''
SELECT AVG(b) AS median_value
FROM (
SELECT
b,
COUNT(b) OVER () AS row_count,
ROW_NUMBER() OVER (ORDER BY b) AS row_number
FROM example
)
WHERE row_number IN ((row_count + 1) / 2, (row_count + 2) / 2)
'''
df = ctx.sql(sql)
df.show()
Exception Traceback (most recent call last)
Input In [57], in
1 sql = '''
2 SELECT AVG(b) AS median_value
3 FROM (
(...)
10 WHERE row_number IN ((row_count + 1) / 2, (row_count + 2) / 2)
11 '''
12 df = ctx.sql(sql)
---> 13 df.show()
Exception: Schema error: No field named 'row_number'. Valid fields are 'example.b'.
Additional context
SQL is meant to be reproduced, and the processing content has no meaning.
I think SQL is correct. Is there a way around it?