drop the length from `numpy`'s fixed-width string dtypes #9586

keewis · 2024-10-06T13:29:00Z

By converting arrays of fixed-width string / bytes dtypes to their base dtype (np.str_ and np.bytes_) in np.result_type, we can avoid accidentally truncating the replacement strings in xr.where.

While this works, I wonder whether we should instead ask numpy to do this for us, i.e. np.result_type(np.dtype("<U1"), str) should return np.str_, not np.dtype("<U1").
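To make the failure mode concrete, here is a minimal sketch in plain numpy. It is not the code from this PR: `drop_string_length` and `where_scalar` are hypothetical names, and `where_scalar` only imitates a `where` that casts both operands to a single shared dtype before combining them. It assumes that combining `<U1` with `str` in `np.result_type` keeps the length of 1, as described above.

```python
import numpy as np


def drop_string_length(dtype):
    """Hypothetical helper: strip the length from fixed-width str / bytes dtypes."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.str_):
        return np.str_
    if np.issubdtype(dtype, np.bytes_):
        return np.bytes_
    # anything that is not a string dtype is passed through unchanged
    return dtype


def where_scalar(cond, data, replacement, *, drop_length):
    """Toy stand-in for a `where` that casts both operands to one shared dtype."""
    # combine the array's dtype with the Python type of the scalar replacement
    dtype = np.result_type(data.dtype, type(replacement))
    if drop_length:
        dtype = drop_string_length(dtype)
    # casting to a fixed-width dtype truncates; casting to np.str_ keeps each
    # operand's own length and lets np.where pick the wider one
    other = np.full(data.shape, replacement).astype(dtype)
    return np.where(cond, data.astype(dtype), other)


cond = np.array([True, False])
data = np.array(["a", "b"])  # dtype "<U1"

truncated = where_scalar(cond, data, "longer", drop_length=False)
fixed = where_scalar(cond, data, "longer", drop_length=True)
print(truncated.tolist())  # ['a', 'l']
print(fixed.tolist())  # ['a', 'longer']
```

The sketch uses `np.issubdtype` so that every fixed-width variant (`<U1`, `<U10`, `S3`, ...) maps to the same length-less type, whatever its length.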

- Closes #9180 (DataArray.where() can truncate strings with <U dtypes)
- Tests added
- User-visible changes (including notable bug fixes) are documented in whats-new.rst

shoyer

Looks good, thanks!

shoyer · 2024-10-10T09:34:33Z

> While this works, I wonder if we instead should ask numpy to do this for us? I.e. np.result_type(np.dtype("<U1"), str) should return np.str_, not np.dtype("<U1").

Yes, this would be better in my opinion!

keewis · 2024-10-10T09:51:12Z

how do we proceed, then? Merge this (after fixing the failing min-deps CI), ask if numpy.result_type can be changed, and remove it once we can require a version of numpy that does this for us?

shoyer · 2024-10-10T13:19:45Z

Yes, that’s probably the way to go

keewis · 2024-10-15T12:35:05Z

> Yes, this would be better in my opinion!

There are some concerns about this in numpy/numpy#27546.

keewis · 2024-10-24T21:03:02Z

@TomNicholas, should we merge this before the release?

TomNicholas · 2024-10-24T21:05:00Z

Sure! If there is any doubt then leave it, but Stephan reviewed it so I say just merge.

keewis · 2024-10-24T21:07:09Z

the only doubt is about what should happen upstream in numpy (if anything should happen at all), so that shouldn't block us here

TomNicholas · 2024-10-24T21:08:09Z

I agree, let's merge.

keewis added 2 commits October 6, 2024 15:20

check that the length of fixed-width numpy strings is reset

1a0e56f

drop the length from numpy's fixed-width string dtypes

ed9e1b8

shoyer approved these changes Oct 10, 2024

keewis added 4 commits October 10, 2024 17:00

compatibility with numpy<2

4d8dcb0

use issubdtype instead

0faec84

some more test cases

a6dffe0

more details in the comment

d163934

Merge branch 'main' into fws-length

e15937f

Merge branch 'main' into fws-length

6213be1

TomNicholas enabled auto-merge (squash) October 24, 2024 21:14

TomNicholas merged commit fbe73ef into pydata:main Oct 24, 2024
28 checks passed

keewis deleted the fws-length branch October 24, 2024 21:21
