Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ArrowError::ArithmeticOverflow #5845

Closed

Conversation

andygrove
Copy link
Member

Which issue does this PR close?

N/A

Rationale for this change

Downstream crates (such as DataFusion Comet) often need specific handling for overflow errors and we currently have no good way to detect them other than looking for specific words in error messages, which is not very robust. For example, we currently have code like this:

match self.inner_abs_func.invoke(args) {
    Err(DataFusionError::ArrowError(ArrowError::ComputeError(msg), trace)) if msg.contains("overflow") => { ... }

It would be better if Arrow had an ArithmeticOverflow variant in the ArrowError enum, similar to the existing DivideByZero variant.

What changes are included in this PR?

Add ArithmeticOverflow and update existing overflow handling.

Are there any user-facing changes?

Yes, overflows will produce a different error now.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jun 5, 2024
@andygrove andygrove added the api-change Changes to the arrow API label Jun 5, 2024
@andygrove andygrove requested a review from viirya June 5, 2024 13:43
@andygrove andygrove requested a review from alamb June 5, 2024 13:57
@tustvold
Copy link
Contributor

tustvold commented Jun 5, 2024

This makes sense to me, we should possibly make ArrowError non_exhaustive to avoid adding new variants being a breaking in future.

Unfortunately this has now missed the release and so will have to wait until the next breaking release which will be in 3 months time...

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @andygrove -- this makes sense to me, but is technically a breaking change from a sem-ver perspective

So according to the docs here
https://github.com/apache/arrow-rs?tab=readme-ov-file#release-versioning-and-schedule we need to wait after August to include the next major release

There is extensive discussion of the various perspectives on #5368

One potential thing we could do is merge this PR in to master and create a 52.0.0_maintenance maintenance branch that we'll release 52.1.0 and 52.2.0 from. Then we can backport any fixes needed to 52.0.0_maintenance and keep development churning along on master

cc @tustvold for your perspective

@tustvold
Copy link
Contributor

tustvold commented Jun 5, 2024

Then we can backport any fixes needed to 52.0.0_maintenance and keep development churning along on master

I think I would rather avoid reaching for git-flow style branch machinery until we have either:

  • A non-trivial number of pending breaking changes
  • Dependent / conflicting breaking changes that need to be applied in a particular order

Otherwise I think we're just creating work for ourselves, especially as we'll want to bring everything bar breaking changes onto 52.0.0_maintenance for the patch releases.

I would therefore propose we mark this as a draft, and then look to bring it in once either the above criteria occur, or it is time for the breaking release?

@alamb
Copy link
Contributor

alamb commented Jun 5, 2024

Otherwise I think we're just creating work for ourselves, especially as we'll want to bring everything bar breaking changes onto 52.0.0_maintenance for the patch releases.

" especially as we'll want to bring everything" -- I don't believe we have to do this. We could also consider only backporting things that the contributors want to backport and/or we deem is important (e.g. bug fixes). Otherwise we would just continue to develop on main as we have done in the past

I would therefore propose we mark this as a draft, and then look to bring it in once either the above criteria occur, or it is time for the breaking release?

This is also a valid approach. One issue I see with this is that we will end up with a thundering herd / PR merge / conflict storm that happens as soon as we start merging the accumulated breaking API changes for the last few months.

@tustvold
Copy link
Contributor

tustvold commented Jun 5, 2024

I had understood we were going to do monthly minor releases, i.e. not just patch releases of back-ported fixes? I presume in your proposed model these would have to come from the maintenance branch, which implies mirroring most commits across

One issue I see with this is that we will end up with a thundering herd / PR merge / conflict storm that happens as soon as we start merging the accumulated breaking API changes for the last few months.

Yes, if we get a large number of dependent breaking changes we will need to revisit this, perhaps creating a 53.0.0 branch or something, but I also think it is right to push this burden onto the people proposing the breaking change

@alamb
Copy link
Contributor

alamb commented Jun 5, 2024

I had understood we were going to do monthly minor releases, i.e. not just patch releases of back-ported fixes? I presume in your proposed model these would have to come from the maintenance branch, which implies mirroring most commits across

I think the key question is "what do we want in the monthly releases" and thus "what is the criteria to include code there"

We could certainly adopt a "release everything that is non breaking" type model which would require the mirrord commits you mention.

We could also adopt a "release only things that people want/need enough to go through the effort to backport" model

I would personally selfishly likely go with the latter as it would reduce overall maintainer burden (and pushes the onus on the people who want the changes in the non breaking monthly releases to back port things)

This is probably worth its own ticket I now realize. I will file one tomorrow

@tustvold
Copy link
Contributor

tustvold commented Jun 5, 2024

I would personally selfishly likely go with the latter as it would reduce overall maintainer burden

If you adopt the branching model you propose then yes, if you don't then nothing changes for non-breaking changes. I feel fairly confident the model that doesn't require additional ceremony for non-breaking changes is what people are more likely to expect/want. If I implement a non-breaking feature addition / performance improvement I would not expect to have to wait 3 months, or pester a maintainer for it to be usable. On the flip side if I want to make a breaking change I would not expect that to be available imminently and I would expect getting it integrated to be more involved. We should be optimising for non-breaking changes not breaking changes.

A ticket is worthwhile given you appear to feel differently on this

@alamb
Copy link
Contributor

alamb commented Jun 5, 2024

I feel fairly confident the model that doesn't require additional ceremony for non-breaking changes is what people are more likely to expect/want

I agree. That doesn't me the maintainers have to do it for them.

If I implement a non-breaking feature addition / performance improvement I would not expect to have to wait 3 months, or pester a maintainer for it to be usable.

My idea is that there is no pestering required -- the person would simply create the cherry-pick themselves. I will file a ticket tomorrow

@alamb
Copy link
Contributor

alamb commented Jun 17, 2024

I filed #5907 to track the question on how to do releases and what to do with PRs such as this

@tustvold tustvold marked this pull request as draft June 26, 2024 10:18
@alamb alamb added the next-major-release the PR has API changes and it waiting on the next major version label Jul 2, 2024
@alamb alamb changed the base branch from master to 53.0.0-dev July 17, 2024 20:07
@alamb
Copy link
Contributor

alamb commented Jul 17, 2024

Per discussion on #6050 I retargeted this PR tothe 53.0.0-dev feature branch, where we are collecting PRs with breaking changes before master opens

@alamb alamb force-pushed the arithmetic-overflow-error branch from b14e468 to 6fd1628 Compare July 17, 2024 20:13
@alamb
Copy link
Contributor

alamb commented Jul 17, 2024

I rebased this branch to target the 53.0.0 dev branch but somehow messed up the commit author. @andygrove can you double check that this PR still looks good to you (and/or fix the author?)

@alamb alamb deleted the branch apache:53.0.0-dev July 26, 2024 10:11
@alamb alamb closed this Jul 26, 2024
@alamb
Copy link
Contributor

alamb commented Jul 26, 2024

I merged the 53 dev branch and that seems to have closed this PR -- any chance you can reopen and retarget main?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate next-major-release the PR has API changes and it waiting on the next major version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants