Python 3 support #53

Closed
shoyer opened this issue Mar 7, 2014 · 4 comments · Fixed by #124

@shoyer
Member

shoyer commented Mar 7, 2014

This is unlikely to be difficult since all of our dependencies support Python 3, but it will definitely take some work.

@takluyver
Member

Work in progress: https://github.com/takluyver/xray/tree/py3

@takluyver
Member

Outstanding issue: What to do with unicode data (the native str type on Python 3 is unicode). The SciPy netcdf module doesn't attempt to handle unicode, so I wrote some code to encode unicode to bytes before storing it in scipy. That means, however, that the roundtrip doesn't work, because loading the data again gives bytes, not str.
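The round-trip problem can be reproduced with NumPy alone. This is only a minimal sketch, not the code from the py3 branch: the helper name `encode_for_storage` is hypothetical, and UTF-8 is an assumed encoding.

```python
import numpy as np


def encode_for_storage(values, encoding="utf-8"):
    """Hypothetical stand-in for the write path: encode unicode arrays to bytes."""
    values = np.asarray(values)
    if values.dtype.kind == "U":  # unicode strings (the native str on Python 3)
        return np.char.encode(values, encoding)
    return values


original = np.array(["temperature", "pressure"])
stored = encode_for_storage(original)  # what would be handed to the file writer
loaded = stored                        # a reader that does no decoding returns bytes

# The dtype kind changes from "U" (unicode) to "S" (bytes), so the
# loaded data no longer matches what was saved.
print(original.dtype.kind, loaded.dtype.kind)
```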

The netCDF4 module appears to handle this by just decoding any string data it finds to unicode, without considering what it's intended to be.

Options:

  • Decode all string data to unicode on load. This is consistent with what the netCDF4 module already does.
  • Store the fact that the data was encoded in an attribute in the file. If that attribute is present, decode it again on load.
  • Don't decode any data, and change all the scipy roundtrip tests to expect that data comes back as bytes.
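To make the second option concrete, here is a rough sketch of what recording the encoding in an attribute might look like. The attribute name `_xray_encoding` and both helper functions are assumptions made for illustration; nothing here is taken from the actual branch.

```python
import numpy as np

ENCODING_ATTR = "_xray_encoding"  # hypothetical attribute name


def encode_variable(data, attrs, encoding="utf-8"):
    """Encode unicode data to bytes and record that fact in the attributes."""
    data = np.asarray(data)
    attrs = dict(attrs)
    if data.dtype.kind == "U":
        data = np.char.encode(data, encoding)
        attrs[ENCODING_ATTR] = encoding
    return data, attrs


def decode_variable(data, attrs):
    """Decode bytes back to unicode only if the marker attribute is present."""
    data = np.asarray(data)
    attrs = dict(attrs)
    encoding = attrs.pop(ENCODING_ATTR, None)
    if encoding is not None and data.dtype.kind == "S":
        data = np.char.decode(data, encoding)
    return data, attrs
```

With this approach, data that was written as bytes stays bytes on load, while data that was written as str comes back as str, at the cost of an extra attribute in the file.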

I'm not familiar enough with netCDF and how it's used to know what makes sense here.

@shoyer
Member Author

shoyer commented May 6, 2014

First of all, thank you for tackling this!

Your first suggestion (decoding all string data to unicode) sounds like the right choice to me. Ideally this can be done lazily (without loading all array data from disk when opening a file), but honestly I'm not too concerned about performance when partially loading NetCDF3 files from disk with the SciPy library, given that NetCDF3 files are already limited to less than 2 GB.
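One way the lazy variant could look is a thin wrapper that decodes only the elements actually requested. The class name `LazyDecodedArray` and the UTF-8 default are assumptions for this sketch, not part of the project.

```python
import numpy as np


class LazyDecodedArray:
    """Hypothetical wrapper: decode bytes to str only for the requested slice."""

    def __init__(self, array, encoding="utf-8"):
        self.array = array        # underlying (possibly on-disk) array
        self.encoding = encoding

    @property
    def shape(self):
        return self.array.shape

    def __getitem__(self, key):
        # Only the indexed chunk is read and decoded.
        chunk = np.asarray(self.array[key])
        if chunk.dtype.kind == "S":
            return np.char.decode(chunk, self.encoding)
        return chunk


raw = np.array([b"alpha", b"beta", b"gamma"])
lazy = LazyDecodedArray(raw)
first_two = lazy[:2]  # decodes two elements, not the whole array
```

Non-string data passes through the wrapper unchanged, so it could in principle be applied to every variable on load.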

Let me give you a little bit of context:

The SciPy NetCDF module only works with an obsolete file format (NetCDF3; the current version, based on HDF5, is NetCDF4). The main reason we support it is that it serves as a (somewhat non-ideal) wire format: SciPy can read and write file-like objects without files actually existing on disk, which is not possible with the NetCDF4 library.
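The wire-format use can be sketched with `scipy.io.netcdf_file` and an in-memory `io.BytesIO` buffer. The variable name and values are illustrative. The sketch calls `flush()` before reading the buffer, on the assumption that `close()` would also close the underlying buffer.

```python
import io

from scipy.io import netcdf_file

# Write a small NetCDF3 dataset into memory instead of a file on disk.
buffer = io.BytesIO()
writer = netcdf_file(buffer, mode="w")
writer.createDimension("x", 3)
var = writer.createVariable("temperature", "d", ("x",))
var[:] = [1.0, 2.0, 3.0]
writer.flush()               # serialize the dataset into the buffer
payload = buffer.getvalue()  # raw bytes that could be sent over the wire
writer.close()

# Read the dataset back from the bytes, again without touching disk.
reader = netcdf_file(io.BytesIO(payload), mode="r")
values = reader.variables["temperature"][:].copy()
reader.close()
```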

@shoyer
Member Author

shoyer commented May 6, 2014

A bit more context: NetCDF3 (as a file format), which is all that SciPy supports, doesn't support Unicode or 64-bit integers. It really is a relic.

@shoyer shoyer added this to the 0.2 milestone May 6, 2014
@shoyer shoyer modified the milestones: 0.1.1, 0.2 May 20, 2014
keewis pushed a commit to keewis/xarray that referenced this issue Jan 17, 2024
* added assert functions

* pseudocode for printing differences between trees - need tests

* equals->identical fix

* refactored to use same diff function for asserts and to check isomorphism internally

* added tests of tree diff formatting

* added option to check trees from the root

* fix bugs with assert functions

* convert tests to use new assert_equal function for tree comparisons

* linting