Add design process section to the docs #16397


Merged
merged 2 commits into from
Jun 16, 2025

Conversation

alamb
Contributor

@alamb alamb commented Jun 13, 2025

Which issue does this PR close?

Rationale for this change

While discussing a design for cancellation with @pepijnve, @zhuqi-lucas, and me, @ozankabak wrote a great summary of how the DataFusion community handles larger projects:

Look, I see that you are trying to help and we do want to take it. I suspect we might be facing a "culture" challenge here: Typically, DF community attacks large problems by solving them bit by bit and refining a solution iteratively. This is unlike some other projects which front-load the effort by going through a more comprehensive design process. We also do that for some tasks where this iterative approach is not applicable, but it is not very common.

This "bit by bit approach" doesn't always succeed, every now and then it happens that we get stuck or go down the wrong path for a while, and then change tacks. However, we still typically prefer to "advance the front" and make progress in tangible ways as much as we can (if we see a way). This necessarily results in imperfect solutions being the "state of the code" in some cases, and they survive in the codebase for a while, but we are good at driving things to completion in the long run.

I really liked that description and think it captures the current state of the project well, so it would be valuable to make it part of the docs.

What changes are included in this PR?

Add a description of the design process to the DataFusion docs site.

Are these changes tested?

By CI

Are there any user-facing changes?

New docs

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jun 13, 2025
Contributor

@zhuqi-lucas zhuqi-lucas left a comment

LGTM Thank you @alamb!

Contributor

@ozankabak ozankabak left a comment

Looks good, thanks @alamb

community is good at driving things to completion in the long run. If you see
something that needs improvement or an area that is not yet fully realized,
please consider submitting an issue or PR to improve it. We are always looking
for more contributions.
Contributor Author

Of course we always have to be 🎣

Contributor

@jonathanc-n jonathanc-n left a comment

This is really nice, thanks @alamb!

@alamb
Contributor Author

alamb commented Jun 13, 2025

This is really nice, thanks @alamb!

Thanks -- I was just channeling @ozankabak :)

@pepijnve
Contributor

Sorry to go a bit off topic for a sec, but there's some context I would like to add. I worked on API design of a commercial software library with tons of extension points for 10+ years where backwards compatibility of the public API was something we stuck to religiously because of the burden API breakage places on the entire user base. Doing that kind of work for an extended period of time makes you think three times about new API and all the hypothetical uses and abuses; perhaps a bit too much.

Relevant question to this text and the project is what the project's stance is wrt API stability? Merging fast means you're likely to ship something a little bit too quickly every now and then. I'm not saying it's a bad strategy, just wondering how you balance the tension between stability and velocity.

@alamb
Contributor Author

alamb commented Jun 13, 2025

Relevant question to this text and the project is what the project's stance is wrt API stability? Merging fast means you're likely to ship something a little bit too quickly every now and then. I'm not saying it's a bad strategy, just wondering how you balance the tension between stability and velocity.

I would say we "try not to do API churn but it happens every release". Indeed it has come up as a challenge for downstream users, though I would say it has been less of a challenge the last 6 months or so. There is more detail here

The policy is documented here: https://datafusion.apache.org/contributor-guide/api-health.html

You can get a sense of the kinds of changes required by looking at https://datafusion.apache.org/library-user-guide/upgrading.html

Basically, at some point I expect users of DataFusion will care enough about non-breaking releases that they will want to help create a release vehicle with stable APIs (e.g. backporting changes to an LTS release).

But until that happens, we just keep cranking on the code and changing APIs every month.

By "advancing the front" we always make tangible progress, and the strategy is
especially effective in a project that relies on individual contributors who may
not have the time or resources to invest in a large upfront design effort.
However, this "bit by bit approach" doesn't always succeed, and sometimes we get
Contributor

Suggested change
However, this "bit by bit approach" doesn't always succeed, and sometimes we get
However, this "bit by bit approach" doesn't always succeed, and sometimes we get

wondering if this is needed?

Contributor Author

I think the idea behind this sentence is to acknowledge the tradeoffs inherent in "design / build" vs "big design all upfront" (it is this tension that actually sparked the original comment in the first place).

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
@comphead comphead merged commit 24f1bb5 into apache:main Jun 16, 2025
5 checks passed
@alamb alamb deleted the alamb/docs2222 branch June 18, 2025 10:13
Labels
documentation Improvements or additions to documentation

7 participants