Make generate_series compatible with PostgreSQL #13316

Open
Dandandan opened this issue Nov 8, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Dandandan
Contributor

Dandandan commented Nov 8, 2024

Describe the bug

generate_series generates a list:

> SELECT generate_series(1,5);
+------------------------------------+
| generate_series(Int64(1),Int64(5)) |
+------------------------------------+
| [1, 2, 3, 4, 5]                    |
+------------------------------------+
1 row(s) fetched. 
Elapsed 0.003 seconds.

However PostgreSQL generates rows:

SELECT generate_series(1,5)

generate_series
-----------------
               1
               2
               3
               4
               5
(5 rows)

To Reproduce

No response

Expected behavior

I expect it to be compatible with the PostgreSQL function. Besides, generating columnar data directly should also be more efficient than producing a list and having to unnest it.
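
To make the difference between the two result shapes concrete, here is a minimal Python sketch. It is not DataFusion or PostgreSQL code, and all function names are hypothetical; it only models "one row holding a list" versus "one row per value", plus the extra unnest step the list form needs.

```python
from typing import Iterator, List


def series_as_list(start: int, stop: int) -> List[List[int]]:
    """Model the current behavior: one row whose single value is a list."""
    return [list(range(start, stop + 1))]


def series_as_rows(start: int, stop: int) -> Iterator[int]:
    """Model the PostgreSQL behavior: one row per value, bounds inclusive."""
    yield from range(start, stop + 1)


def unnest(rows: List[List[int]]) -> Iterator[int]:
    """Flatten list-valued rows into one row per element."""
    for row in rows:
        yield from row


print(series_as_list(1, 5))                # a single row containing a list
print(list(series_as_rows(1, 5)))          # five separate rows
print(list(unnest(series_as_list(1, 5))))  # list form needs an extra unnest step
```

In this model the row form yields values directly, while the list form first builds the whole list and then flattens it.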

Additional context

No response

@Dandandan Dandandan added the bug Something isn't working label Nov 8, 2024
@jonathanc-n
Contributor

take

@2010YOUY01
Contributor

I think we can keep this behavior, and add another UDTF with the same name (which follows DuckDB behavior):

D select generate_series(1,3);
┌───────────────────────┐
│ generate_series(1, 3) │
│        int64[]        │
├───────────────────────┤
│ [1, 2, 3]             │
└───────────────────────┘
D select * from generate_series(1,3);
┌─────────────────┐
│ generate_series │
│      int64      │
├─────────────────┤
│               1 │
│               2 │
│               3 │
└─────────────────┘

I think it's also easy to use: generate_series() can then operate on both arrays and table columns.
Additionally, it would be great if the table function used only constant memory. For example, select * from generate_series(1, 1000000000) should generate data batch by batch, instead of materializing everything in the TableScan node before starting to produce output.
This way the function can be very useful in many cases, such as micro benchmarks and tests.
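
The batch-by-batch idea can be sketched as follows. This is a rough Python illustration, not the DataFusion API: the function name generate_series_batches and the batch size of 8192 are assumptions made for the example.

```python
from typing import Iterator, List


def generate_series_batches(
    start: int, stop: int, batch_size: int = 8192  # batch size is an assumed value
) -> Iterator[List[int]]:
    """Yield the inclusive series start..stop in batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    current = start
    while current <= stop:
        end = min(current + batch_size - 1, stop)
        yield list(range(current, end + 1))
        current = end + 1


# Only one batch is alive at a time, so peak memory depends on batch_size,
# not on the length of the series.
total = 0
for batch in generate_series_batches(1, 1_000_000, batch_size=8192):
    total += len(batch)
print(total)
```

Because the generator yields one batch at a time, a consumer can start processing output immediately instead of waiting for the full series to be materialized.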

@jonathanc-n
Contributor

@2010YOUY01 Yeah, I agree with this. I was working on this a bit, and the current generate_series logic and output seem to fit in well with the other UDFs.

@jonathanc-n jonathanc-n removed their assignment Nov 11, 2024