Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drop the length from numpy's fixed-width string dtypes #9586

Merged
merged 8 commits into from
Oct 24, 2024

Conversation

keewis
Copy link
Collaborator

@keewis keewis commented Oct 6, 2024

By converting arrays of fixed-width string / bytes dtypes to their base dtype (np.str_ and np.bytes_) in np.result_type, we can avoid accidentally truncating the replacement strings in xr.where.

While this works, I wonder if we instead should ask numpy to do this for us? I.e. np.result_dtype(np.dtype("<U1"), str) should return np.str_, not np.dtype("<U1").

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@shoyer
Copy link
Member

shoyer commented Oct 10, 2024

While this works, I wonder if we instead should ask numpy to do this for us? I.e. np.result_dtype(np.dtype("<U1"), str) should return np.str_, not np.dtype("<U1").

Yes, this would be better in my opinion!

@keewis
Copy link
Collaborator Author

keewis commented Oct 10, 2024

how do we proceed, then? Merge this (after fixing the failing min-deps CI), ask if numpy.result_type can be changed, and remove it once we can require a version of numpy that does this for us?

@shoyer
Copy link
Member

shoyer commented Oct 10, 2024 via email

@keewis
Copy link
Collaborator Author

keewis commented Oct 15, 2024

Yes, this would be better in my opinion!

There's some concerns about this in numpy/numpy#27546

@keewis
Copy link
Collaborator Author

keewis commented Oct 24, 2024

@TomNicholas, should we merge this before the release?

@TomNicholas
Copy link
Member

Sure! If there is any doubt then leave it, but Stephan reviewed it so I say just merge.

@keewis
Copy link
Collaborator Author

keewis commented Oct 24, 2024

the only doubt is about what should happen upstream in numpy (if anything should happen at all), so that shouldn't block us here

@TomNicholas
Copy link
Member

I agree, let's merge.

@TomNicholas TomNicholas enabled auto-merge (squash) October 24, 2024 21:14
@TomNicholas TomNicholas merged commit fbe73ef into pydata:main Oct 24, 2024
28 checks passed
@keewis keewis deleted the fws-length branch October 24, 2024 21:21
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 29, 2024
* main:
  Add `DataTree.persist` (pydata#9682)
  Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  new blank whatsnew (pydata#9679)
  v2024.10.0 release summary (pydata#9678)
  drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  Use zarr v3 dimension_names (pydata#9669)
  fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  implement `dask` methods on `DataTree` (pydata#9670)
  support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  Compatibility for zarr-python 3.x (pydata#9552)
  Update to_dataframe doc to match current behavior (pydata#9662)
  Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
dcherian added a commit to dcherian/xarray that referenced this pull request Nov 3, 2024
* main: (85 commits)
  Refactor out utility functions from to_zarr (pydata#9695)
  Use the same function to floatize coords in polyfit and polyval (pydata#9691)
  Add `DataTree.persist` (pydata#9682)
  Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  new blank whatsnew (pydata#9679)
  v2024.10.0 release summary (pydata#9678)
  drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  Use zarr v3 dimension_names (pydata#9669)
  fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  implement `dask` methods on `DataTree` (pydata#9670)
  support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  Compatibility for zarr-python 3.x (pydata#9552)
  Update to_dataframe doc to match current behavior (pydata#9662)
  Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
  Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651)
  Change URL for pydap test (pydata#9655)
  Fix multiple grouping with missing groups (pydata#9650)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataArray.where() can truncate strings with <U dtypes
3 participants