You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Case 1... Exception: Internal error: Impossibly got empty window expression. This was likely caused by a bug in DataFusion's code.
Case 2... Exception: Schema error(same cause as Case 1 ?)
sql='''SELECT AVG(b) AS median_value FROM ( SELECT b, COUNT(b) OVER () AS row_count, ROW_NUMBER() OVER (ORDER BY b) AS row_number FROM example ) ORDER BY median_value '''df=ctx.sql(sql)
df.show()
Exception Traceback (most recent call last)
Input In [65], in
1 sql = '''
2 SELECT AVG(b) AS median_value
3 FROM (
(...)
10 ORDER BY median_value
11 '''
12 df = ctx.sql(sql)
---> 13 df.show()
Exception: Internal error: Impossibly got empty window expression. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Case 2... Exception: Schema error
sql='''SELECT AVG(b) AS median_value FROM ( SELECT b, COUNT(b) OVER () AS row_count, ROW_NUMBER() OVER (ORDER BY b) AS row_number FROM example ) WHERE row_number IN ((row_count + 1) / 2, (row_count + 2) / 2) '''df=ctx.sql(sql)
df.show()
Exception Traceback (most recent call last)
Input In [57], in
1 sql = '''
2 SELECT AVG(b) AS median_value
3 FROM (
(...)
10 WHERE row_number IN ((row_count + 1) / 2, (row_count + 2) / 2)
11 '''
12 df = ctx.sql(sql)
---> 13 df.show()
Exception: Schema error: No field named 'row_number'. Valid fields are 'example.b'.
Additional context
SQL is meant to be reproduced, and the processing content has no meaning.
I think SQL is correct. Is there a way around it?
The text was updated successfully, but these errors were encountered:
You're using the datafusion-python project which is using version 8 of the DataFusion library. In latest DataFusion (master as of now), the first query works fine now thanks to recent fixes, but the second query returns with
thread 'tokio-runtime-worker' panicked at 'not implemented: InList does not yet support nested columns.',
There's a recent PR to update datafusion-python to version 10 of the DataFusion library, which should fix the issue with the first query. The second query appears to trigger a new bug/unsupported feature.
Describe the bug
Case 1... Exception: Internal error: Impossibly got empty window expression. This was likely caused by a bug in DataFusion's code.
Case 2... Exception: Schema error(same cause as Case 1 ?)
To Reproduce
'0.6.0'
+---+---+
| a | b |
+---+---+
| 1 | 4 |
| 2 | 5 |
| 3 | 6 |
+---+---+
Case 1... Exception: Internal error
Exception Traceback (most recent call last)
Input In [65], in
1 sql = '''
2 SELECT AVG(b) AS median_value
3 FROM (
(...)
10 ORDER BY median_value
11 '''
12 df = ctx.sql(sql)
---> 13 df.show()
Exception: Internal error: Impossibly got empty window expression. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Case 2... Exception: Schema error
Exception Traceback (most recent call last)
Input In [57], in
1 sql = '''
2 SELECT AVG(b) AS median_value
3 FROM (
(...)
10 WHERE row_number IN ((row_count + 1) / 2, (row_count + 2) / 2)
11 '''
12 df = ctx.sql(sql)
---> 13 df.show()
Exception: Schema error: No field named 'row_number'. Valid fields are 'example.b'.
Additional context
SQL is meant to be reproduced, and the processing content has no meaning.
I think SQL is correct. Is there a way around it?
The text was updated successfully, but these errors were encountered: