Blog post with DataFusion Jun - Sep 2023 #6780

Closed
Tracked by #8655
alamb opened this issue Jun 27, 2023 · 12 comments · Fixed by apache/arrow-site#457
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Jun 27, 2023

Is your feature request related to a problem or challenge?

We have had good luck writing up quarterly updates for DataFusion, most recently:
https://arrow.apache.org/blog/2023/06/24/datafusion-25.0.0/

(see #5812)

Describe the solution you'd like

It would be great to write another post about what has happened in DataFusion over the last few months.

Things I expect will be good to highlight (🤞 ):

  • Improved Struct/array support (@izveigor ❤️ )
  • better group by performance with many distinct groups
  • better insert performance

Others?

Describe alternatives you've considered

No response

Additional context

No response

@alamb
Contributor Author

alamb commented Jul 17, 2023

Ideas for major items to include in this post:

  1. User defined window functions: Blog post about user defined window functions #6781
  2. Faster aggregate performance -- Improve the performance of Aggregator, grouping, aggregation #4973
  3. Support for ARRAY / Lists -- General ticket for Array/List data type #6863 etc (thanks @izveigor and @jayzhan211 )

@Dandandan
Contributor

Improved join performance might be another thing to highlight. Maybe we could show a benchmark with improvements (TPC-H, ClickBench, ...) from version 25 -> 28.

@alamb
Contributor Author

alamb commented Aug 17, 2023

There has been major work on INSERT and COPY as well, thanks to @devinjdangelo : #6569

@alamb
Contributor Author

alamb commented Sep 15, 2023

Also #7400 spilling group by from @kazuyukitanimura

@alamb
Contributor Author

alamb commented Sep 18, 2023

Another topic: the new library user guide: https://arrow.apache.org/datafusion/library-user-guide/index.html

@alamb
Contributor Author

alamb commented Oct 14, 2023

FYI this is very much on my list, but I need to focus on the SIGMOD paper for a while. If someone else has the time and inclination to start a PR, I would be most appreciative.

@alamb
Contributor Author

alamb commented Nov 6, 2023

Realistically, I am very tied up with #6782 and so won't have time to work on a blog post until after that is submitted (end of Nov). If someone else has time to work on this, it would be very much appreciated.

@alamb
Contributor Author

alamb commented Jan 1, 2024

This is going to have to be more like a 2023 retrospective 🤔

@alamb
Contributor Author

alamb commented Jan 4, 2024

I am starting to draft this now

@alamb
Contributor Author

alamb commented Jan 7, 2024

Here is a PR with a draft (still needs more work): apache/arrow-site#457

alamb added a commit to apache/arrow-site that referenced this issue Jan 19, 2024
Closes apache/datafusion#6780

This blog post describes DataFusion over the last 6 months, DataFusion
26 to 34.

If anyone has time to pitch in and look up links or help with the
language, that would be most appreciated.

---------

Co-authored-by: Bruce Ritchie <bruce.ritchie@veeva.com>
Co-authored-by: Andy Grove <andygrove73@gmail.com>
Co-authored-by: Mustafa Akur <106137913+mustafasrepo@users.noreply.github.com>
@alamb
Contributor Author

alamb commented Jan 19, 2024

The blog post is now published! https://arrow.apache.org/blog/2024/01/19/datafusion-34.0.0/

@alamb
Contributor Author

alamb commented Mar 13, 2024

Let's capture other items to highlight here: #9602
