Skip to content

Fixed cropping when a world point is exactly on a pixel edge #874

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

ayshih
Copy link
Member

@ayshih ayshih commented Jul 8, 2025

This PR fixes NDCube cropping so that it no longer treats the lower edges of a pixel as belonging to that pixel. A world point exactly on a pixel edge does not immediately imply that any pixel touching that edge is included in the crop; it depends on the other world points specified for the crop.

TODO:

  • Update documentation
  • Add tests

@DanRyanIrish
Copy link
Member

DanRyanIrish commented Jul 8, 2025

Hi @ayshih. Thanks for explicitly laying out how you feel crop should work. To make sure I understand correctly, are the following correct and intended behaviours? (Note I'll use the notation 12A to denote a scalar Quantity or SpectralCoord of 12 Angstroms. And [10, 100) to denote an interval from and including 10 up to but excluding 100.)

Consider a 1-D NDCube with 9 array elements going over the interval [10, 100) Angstroms in steps of 10A.

  • cube.crop(12A, 57A) goes across [10, 60)A (slicing equivalent 0:6), so both points given to crop are inside the cropped cube.
  • cube.crop(16A, 54A) goes across [20, 50)A, (slicing equivalent 1:5), so neither input given to crop is in the cropped cube
  • cube.crop(17A, 23A) (slicing equivalent 1:1), is not allowed and raises an error.

@ayshih
Copy link
Member Author

ayshih commented Jul 8, 2025

I'm fixing the behavior only when the world point is exactly on a pixel edge. None of your examples have world points that are exactly on a pixel edge (a multiple of 10).

  • cube.crop(16A, 54A) goes across [20, 50)A, (slicing equivalent 1:5), so neither input given to crop is in the cropped cube

No, the crop should be [10, 60)A

  • cube.crop(17A, 23A) (slicing equivalent 1:1), is not allowed and raises an error.

No, the crop should be [10, 30)A

@DanRyanIrish
Copy link
Member

A world point exactly on a pixel edge does not immediately imply that any pixel touching that edge is included in the crop; it depends on the other world points specified for the crop.

Does this mean that the intended behaviour is that crop is undefined if all points are in pixel corners? (Based on the code changes it doesn't look like that's what you're doing, but is it what you intend?)

@ayshih
Copy link
Member Author

ayshih commented Jul 8, 2025

In your notation, we agree that:

  • cube.crop(11A, 19A) returns [10, 20)A
  • cube.crop(11A, 21A) returns [10, 30)A

Before this PR:

  • cube.crop(11A, 20A) returns [10, 30)A

After this PR:

  • cube.crop(11A, 20A) returns [10, 20)A

@ayshih
Copy link
Member Author

ayshih commented Jul 8, 2025

A world point exactly on a pixel edge does not immediately imply that any pixel touching that edge is included in the crop; it depends on the other world points specified for the crop.

Does this mean that the intended behaviour is that crop is undefined if all points are in pixel corners? (Based on the code changes it doesn't look like that's what you're doing, but is it what you intend?)

There is nothing special about pixel corners. The question is what pixels (not counting the pixel edges) are spanned between the specified world points. If there are pixels between the specified pixel corners (i.e., there is no pixel axis where they all have the same index), then it is a valid crop.

@ayshih
Copy link
Member Author

ayshih commented Jul 8, 2025

This PR does result in the following behavior, which some may find objectionable:

  • cube.crop(15A, 15A) returns [10, 20)A, because 15A is undeniably in that pixel
  • cube.crop(20A, 20A) errors, because 20A is not defined to be in any pixel

@DanRyanIrish
Copy link
Member

DanRyanIrish commented Jul 8, 2025

In your notation, we agree that:

  • cube.crop(11A, 19A) returns [10, 20)A

  • cube.crop(11A, 21A) returns [10, 30)A

Before this PR:

  • cube.crop(11A, 20A) returns [10, 30)A

After this PR:

  • cube.crop(11A, 20A) returns [10, 20)A

I need to think about this a bit more, but now I feel I better understand your vision and don't have the same level of objection as before.

@DanRyanIrish
Copy link
Member

This PR does result in the following behavior, which some may find objectionable:

  • cube.crop(20A, 20A) errors, because 20A is not defined to be in any pixel

That said, this does seem to be an unreasonable limitation of the definitions of the algorithm. If we were to move to this scheme, I think the most intuitive thing to do would be to special case this example so it returns a length-1 cube spanning the interval [20, 30)A.

There may be other issues with that though that I haven't thought through enough though.

@ayshih
Copy link
Member Author

ayshih commented Jul 8, 2025

I'll explicitly note that this approach for handling pixel edges when converting a range of world coordinates to a range of pixels is precisely how we do it in sunpy core for submapping and also for auto-extent reprojection. I don't see any reason why NDCube should behave differently.

  • cube.crop(20A, 20A) errors, because 20A is not defined to be in any pixel

That said, this does seem to be an unreasonable limitation of the definitions of the algorithm. If we were to move to this scheme, I think the most intuitive thing to do would be to special case this example so it returns a length-1 cube spanning the interval [20, 30)A.

I think returning an error is the Right Thing™ to do. Heck, I'm tempted to return an error for cube.crop(15A, 15A) because there's no space between the world coordinates, but at least there the pixel choice is unambiguous.

I do not think it should be special-cased. Consider a mirror image of the cube and issuing the same crop call. The same special-case logic would presumably return what is now the [20, 10)A interval, and thus the same crop call retrieves different pixels.

@ayshih
Copy link
Member Author

ayshih commented Jul 8, 2025

(I'll point out the above abstract examples use angstroms, but if you actually tried to run the examples in real code, it may be impossible to specify a world point that is exactly on a pixel edge, due to floating-point precision. That's why my example here uses meters.)

@DanRyanIrish
Copy link
Member

Heck, I'm tempted to return an error for cube.crop(15A, 15A) because there's no space between the world coordinates, but at least there the pixel choice is unambiguous.

Recall that current mental map for NDCube.crop is to return the smallest cube in array-space that includes the input points. It is not about giving crop ranges in world coordinates. This mental map was decided upon because we don't know how many world axes correspond to any given array axis, nor whether any of those axes are rotated relative to the pixel grid. Given that, the current NDCube.crop mental map is the only one that is simple and intuitive all possible scenarios, even if it's less intuitive in specific simple scenarios.

There are 2 separable issues raised by this PR:

  1. inputs considered points or intervals;
  2. whether pixels are considered closed-open or closed-closed intervals in world-space.

Regarding the first, as stated above, I think we should keep the input points API as it's unambiguous in all possible cubes and WCSs.

However, I am open to considering switching to a closed-closed pixel model, if it doesn't make currently valid NDCube.crop inputs invalid. This brings us to the error cases.

I do not think it should be special-cased. Consider a mirror image of the cube and issuing the same crop call. The same special-case logic would presumably return what is now the [20, 10)A interval, and thus the same crop call retrieves different pixels.

Agreed. If remember correctly, this was raised when we were trying to make crop work originally. The decision to consider array elements closed-open intervals in world space resolved this and other issues. The ambiguity you raise is a consequence of moving to a closed-closed interval model. That said, I don't think the specific case you mention is a problem. I think it's reasonable for the same crop call applied to a cube with a different WCS to return a different result.

More generally regarding the error, I think it's unreasonable for users to potentially have to think about the world values at specific pixel boundaries. After all, if they already know that, it's easier for them to use the array-slicing API. Therefore so long as there's at least one input point given within the FOV of the NDCube, it should return a valid cube. This would require a solution like the special-case I mentioned. If we can't support this or an equivalent solution, I would be against changing NDCube.crop from what we currently have.

@DanRyanIrish
Copy link
Member

(I'll point out the above abstract examples use angstroms, but if you actually tried to run the examples in real code, it may be impossible to specify a world point that is exactly on a pixel edge, due to floating-point precision. That's why my example here uses meters.)

I think this concern could be addressed with an edge_tolerance kwarg whereby if the resulting pixel coord is within, e.g. 1e-5, of the pixel edge, the input is considered to correspond to the edge.

@ayshih
Copy link
Member Author

ayshih commented Jul 9, 2025

Recall that current mental map for NDCube.crop is to return the smallest cube in array-space that includes the input points.

Okay, I'll adjust my phrasing. In your parlance, I want pixels to be considered "closed-closed" intervals for crop(). In other words, a world point on a pixel edge should not be treated as belonging to a specific pixel, but rather should be treated as contained by every pixel that shares that pixel edge.

For example, 20A is contained in both the [10, 20]A pixel and the [20, 30]A pixel, and the following are the minimal crops:

  • cube.crop(20A, 29A) should return [20, 30]A
  • cube.crop(11A, 20A) should return [10, 20]A (rather than [10, 30]A)

I think it's reasonable for the same crop call applied to a cube with a different WCS to return a different result.

While an arbitrarily different WCS is allowed to return different results, it's by considering mirror flips and rotations that we can evaluate biases in the output. Any biases could lead to unexpected results and user confusion. One should not choose biased output when there is an alternative – here, a tighter crop – that is stable to mirror flips and rotations.

For example:

  • cube.crop(20A, 30A) should return [20, 30]A (rather than [20, 40]A)
  • cube_reflected.crop(20A, 30A) should return [30, 20]A (rather than [30, 10]A)

Therefore so long as there's at least one input point given within the FOV of the NDCube, it should return a valid cube. This would require a solution like the special-case I mentioned.

I think the crux here is that I feel that it is okay for there to be ambiguity given certain sets of world points – e.g., there are two crops of the same size that contain all of the world points – and in those cases, it is okay for crop() to error. I'd prefer crop() to error when it's unclear what the user is asking for, rather than insisting on a certain (biased) solution that may be "wrong" as far as the user is concerned.

For example, I think cube.crop(20A, 20A) should error because both [10, 20]A and [20, 30]A are valid solutions, so the minimal crop is not well defined. Upon seeing the error message, the user can adjust the world points to assert their preference, e.g., either cube.crop(19A, 20A) or cube.crop(20A, 21A).

(I'll point out the above abstract examples use angstroms, but if you actually tried to run the examples in real code, it may be impossible to specify a world point that is exactly on a pixel edge, due to floating-point precision. That's why my example here uses meters.)

I think this concern could be addressed with an edge_tolerance kwarg whereby if the resulting pixel coord is within, e.g. 1e-5, of the pixel edge, the input is considered to correspond to the edge.

Yes, of course. I just wanted people to be aware that the above thought examples may not actually behave as advertised under this PR if using angstroms with wcslib.

@DanRyanIrish
Copy link
Member

I see the sense in what your saying and am open to your perspective. However,...

...and in those cases, it is okay for crop() to error. I'd prefer crop() to error when it's unclear what the user is asking for,...

This is a major stumbling block for me. I can imagine a case where a user might have a workflow where they are pulling the world coordinates of various features, e.g. contour boundaries, from some database, or from the output of a previous part of their pipeline, and want to crop a cube to that feature. In the case of a single feature study, I agree that the user can manually adjust one of their coordinates, but I feel this may unnecessarily annoy the user and/or lead them to think that ndcube is broken. In the case of a pipeline, this could become prohibitively annoying or cumbersome.

Ultimately, I see this limitation as making NDCube.crop less user-friendly and less robust.

rather than insisting on a certain (biased) solution that may be "wrong" as far as the user is concerned.

I'd draw a distinction here between "wrong" and "not what a user wants/finds intuitive". Wrong implies a bug or not self-consistent. But so long as a behaviour is well-defined and bug-free, I would argue it cannot be "wrong". But it may not be the same as some users' preference.

I think the same arguments you're making could be applied to the Python slicing API. Below my understanding of how Python indexing works: (Note that I am not referring to WCS pixel indexing where integer pixel indices refer to the centre of the pixel.)
In Python indexing, integer indices refer to the lower edge of the array element (see below)

| _ | _ | _ | _ |   # This line represents the array elements as _ and the edges of the elements as |
0   1   2   3   4   # This line represents the indices of the array elements. Note they are aligned with the edges of the elements.

And array elements are considered closed-open intervals, i.e. each index belongs only to the array element right-ward of it. Hence, Python slicing also works as an open-closed interval, i.e.

my_array[1:3]
| _ | _ |  # Note the output is length 2.
1   2   3

goes from the lower edge of the 1st (0-based counting) element to the lower edge of the 3rd element, i.e. it does not include the 3rd element. Some may argue that this is "wrong", as it some users will "obviously" want to include the 3rd element if they enter 3 in the slicing. (This was something I had to adjust to when I started using Python.) However, I don't consider this behaviour "wrong" as it's well defined and unambiguous on its own terms. It's simply a valid, if not unique, choice of definitions.

This means it solves various problems compared a closed-closed framework where array indices belong to both array elements it touches. For example, the above closed-open paradigm the following is allowed and unambiguous:

my_array[1]
_  # The contents of the 1st element, extracted from the array.

as it slices out the 1st array element. However, in your closed-closed paradigm, this would be undefined as it could refer to the 0th or 1st element. This is one reason why I feel that

cube.crop(20A)

should be a valid call and return the pixel right-ward of 20A.

I agree with you that handling world values on pixel edges is the crux of the issue. And unfortunately, for the above reasons, I think the current behaviour is better than your proposed behaviour because it is unambiguous and much closer to how Python slicing actually works.

That said, I am still open to adjusting the NDCube.crop behaviour so that if a world value that represents the upper/rightmost extent of the region of interest falls exactly on a pixel edge, that we do not include the upper/right-ward pixel associated with that world value. I think this represents a compromise between the current and your proposed behaviour and makes NDCube.crop actually behave slightly more like Python indexing than it already does.

@DanRyanIrish
Copy link
Member

DanRyanIrish commented Jul 10, 2025

TLDR of above comment

  • I am sympathetic to this approach, but I consider erroring on a pixel edge a major limitation.
  • Compromise solution: Keep current paradigm with the following change: exclude rightmost pixel if uppermost world value lands on a pixel edge. If all world values land on the same pixel edge, this is special cased so that the right-ward pixel is returned.

@ayshih
Copy link
Member Author

ayshih commented Jul 10, 2025

To me, you are conflating the operations of cropping and of sampling. Sampling a (valid) world point should always return something. I feel that it is appropriate and preferred for cropping to error if there is ambiguity.

I can imagine a case where a user might have a workflow where they are pulling the world coordinates of various features, e.g. contour boundaries, from some database, or from the output of a previous part of their pipeline, and want to crop a cube to that feature.

One could of course contemplate a wide range of possible sources/types of world coordinates. In your specific example of contour boundaries, I'd definitely prefer for the workflow to error if the contour boundary is a delta function along one pixel axis, because that's almost certainly an indication of a poorly defined contour boundary or an outright data error. If the cropping operation returns a cube regardless, I would be ignorant of the fact that the contour boundary doesn't have any width along one axis.

I think the current behaviour is better than your proposed behaviour because it is unambiguous and much closer to how Python slicing actually works.

Your comparison to Python slicing and accessing is improper. Python slicing is indeed akin to a cropping operation:

>>> my_array = np.arange(10)
>>> my_array[1:3]
array([1, 2])

Given that, the correct analogy for cube.crop(20A) is slicing to a delta function:

>>> my_array[1:1]
array([], dtype=int64)

Python accessing is different, akin to sampling rather than cropping. Given that there is no notion of fractional indices, you can't draw a tight analogy to WCS parlance.

@ayshih
Copy link
Member Author

ayshih commented Jul 10, 2025

  • Compromise solution: Keep current paradigm with the following change: exclude rightmost pixel if uppermost world value lands on a pixel edge. If all world values land on the same pixel edge, this is special cased so that the right-ward pixel is returned.

On that last point, how about instead of my insistence that an error be raised, crop() emits a warning about the ambiguity? That way, crop() successfully returns a result that will probably be adequate for the majority of users, and the minority of users will be notified that they may want to adjust their world points.

@DanRyanIrish
Copy link
Member

I think a warning in the case that all world coordinates corresponding to a single pixel edge for a given array axis/axes, is a good idea.

@ayshih
Copy link
Member Author

ayshih commented Jul 11, 2025

I've revised this PR to emit a warning instead of raise an error for the edge case (ha ha)

@DanRyanIrish
Copy link
Member

DanRyanIrish commented Jul 11, 2025

Hi @ayshih. Thanks for updating the PR. At first glance, the changes you made look correct. That said, I found it hard to decipher what each variable was because some of the variables now have pixel indices stored in array order. I've made a PR to your branch with changes that make the get_crop_item_from_points function work explicitly in pixel space (including pixel axis ordering) until it comes time to calculate the min and max array indices for each array axis. This also allows a slight simplification of some of the code higher up in the function.

While I was at it, I also shifted where the warning is raised so that a separate one is issued for each axis. This allows more explicit info in the warning that I think will help users. Technically, this means it's possible that multiple warnings can be raised for one crop operation. However, this should be almost vanishingly small given the chances that any, let alone all, world values corresponds exactly to the same pixel edge, and the still smaller chance that this happens for multiple axes.

However, if we don't want to risk even this chance, we could compile the warning message axis by axis as I've done, but issue the warning after iterating through all the axes using the ambiguous variable, as you've done.

@DanRyanIrish
Copy link
Member

On second glance, you were raising a warning whenever the min and max pixel coord was in the same pixel, not just the same pixel edge. I've fixed with in the PR to your branch.

@ayshih
Copy link
Member Author

ayshih commented Jul 11, 2025

While I was at it, I also shifted where the warning is raised so that a separate one is issued for each axis. This allows more explicit info in the warning that I think will help users. Technically, this means it's possible that multiple warnings can be raised for one crop operation.

I implemented it as a single warning covering all axes because with default warning settings, I believe only the first warning will get emitted.

On second glance, you were raising a warning whenever the min and max pixel coord was in the same pixel, not just the same pixel edge.

Nope. If the world points are in the same pixel not at the edges, max will be min +1 because of how they are calculated.

@DanRyanIrish
Copy link
Member

I implemented it as a single warning covering all axes because with default warning settings, I believe only the first warning will get emitted.

Ah I see. In that case we should build up the warning message axis by axis to give the user the clearest information and then raise it as one warning like you did originally.

Nope. If the world points are in the same pixel not at the edges, max will be min +1 because of how they are calculated.

I stand corrected :) I'll revert that change in my PR.

…ray index space.

This is preferred so that it can be determined whether the maximum input world coordinate
corresponds to a pixel edge. In this case, the rightward pixel can be excluded from the
cropped cube.
This commit raises a warning for each axis when the input points
all correspond to the same pixel edge. This allows more explicit
information in the warning and less code.
…array axis correspond to pixel edge.

Previously it was raised if all coords for the array axis were in the same pixel.
…revious implementation.

Previous implementation was correct and is simpler.  Also, raise warning of whether
all world values land on a single pixel edge once, but build message on an axis by axis
basis.
# If item will result in a scalar cube, raise an error as this is not currently supported.
if result_is_scalar:
raise ValueError("Input points causes cube to be cropped to a single pixel. "
"This is not supported when keepdims=False.")
"This is not supported when keepdims=False. "
"Please set keepdims=True and try again.")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that adding this statement is a bit silly. I feel that it is already implied by the preceding sentence, and it implicitly assumes that the user intended to crop to a single pixel, when it may actually be in error (e.g., specifying only a single world point)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants