Python 3 support #53

shoyer · 2014-03-07T06:14:55Z

This is unlikely to be difficult since all of our dependencies support Python 3, but it will definitely take some work.

takluyver · 2014-05-05T19:43:42Z

Work in progress: https://github.com/takluyver/xray/tree/py3

takluyver · 2014-05-06T01:17:16Z

Outstanding issue: What to do with unicode data (the native str type on Python 3 is unicode). The SciPy netcdf module doesn't attempt to handle unicode, so I wrote some code to encode unicode to bytes before storing it in scipy. That means, however, that the roundtrip doesn't work, because loading the data again gives bytes, not str.

The netCDF4 module appears to handle this by just decoding any string data it finds to unicode, without considering what it's intended to be.

Options:

1. Decode all string data to unicode on load. This is consistent with what the netCDF4 module already does.
2. Store the fact that the data was encoded in an attribute in the file. If that attribute is present, decode it again on load.
3. Don't decode any data, and change all the scipy roundtrip tests to expect that data comes back as bytes.

I'm not familiar enough with netCDF and how it's used to know what makes sense here.
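To make the trade-off between the three options concrete, here is a minimal sketch in plain Python. It does not use the SciPy or netCDF4 APIs: the dict returned by `save` is a hypothetical stand-in for a file that can only hold bytes, and the attribute name `_xray_encoded` is invented for illustration.

```python
ENCODING = "utf-8"
MARKER = "_xray_encoded"  # hypothetical attribute name, not a real convention


def save(strings, mark=False):
    """Encode str values to bytes, as a store without unicode support requires."""
    attrs = {MARKER: ENCODING} if mark else {}
    return {"data": [s.encode(ENCODING) for s in strings], "attrs": attrs}


def load_raw(stored):
    """Option 3: return the stored bytes untouched."""
    return list(stored["data"])


def load_decode_all(stored):
    """Option 1: decode every string value on load."""
    return [b.decode(ENCODING) for b in stored["data"]]


def load_decode_if_marked(stored):
    """Option 2: decode only if the file records that we encoded the data."""
    encoding = stored["attrs"].get(MARKER)
    if encoding is None:
        return list(stored["data"])
    return [b.decode(encoding) for b in stored["data"]]


names = ["temperature", "précipitation"]
print(load_raw(save(names)) == names)                          # roundtrip breaks
print(load_decode_all(save(names)) == names)                   # roundtrip works
print(load_decode_if_marked(save(names, mark=True)) == names)  # roundtrip works
```

Option 1 restores the roundtrip without extra metadata, at the cost of also decoding data that was written as bytes on purpose. Option 2 avoids that, but only for files that carry the marker.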

shoyer · 2014-05-06T02:22:48Z

First of all, thank you for tackling this!

Your first suggestion (decoding all string data to unicode) sounds like the right choice to me. Ideally this can be done lazily (without needing to load all array data from disk when opening a file), but honestly I'm not too concerned about the performance of partially loading NetCDF3 files from disk with the SciPy library, given that NetCDF3 files are already limited to less than 2GB.

Let me give you a little bit of context:

The SciPy NetCDF module only works with an obsolete file format (NetCDF3; the current version, based on HDF5, is NetCDF4). The main reason we support it is that it serves as a (somewhat non-ideal) wire format: SciPy can read and write file-like objects without files actually existing on disk, which is not possible with the NetCDF4 library.
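As a rough illustration of the lazy decoding mentioned above, the sketch below wraps a sequence of bytes and decodes only the elements that are actually indexed. The class name `LazilyDecodedStrings` is hypothetical and is not taken from xray; a real implementation would wrap an on-disk array rather than a list.

```python
class LazilyDecodedStrings:
    """Wrap a sequence of bytes and decode elements only when accessed."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw            # nothing is read or decoded here
        self._encoding = encoding

    def __len__(self):
        return len(self._raw)

    def __getitem__(self, key):
        item = self._raw[key]      # touches only the requested element(s)
        if isinstance(key, slice):
            return [b.decode(self._encoding) for b in item]
        return item.decode(self._encoding)


lazy = LazilyDecodedStrings([b"lat", b"lon", b"time"])
print(lazy[0])     # decodes a single element
print(lazy[1:3])   # decodes just the requested slice
```

Opening a file would then only construct the wrapper, and the cost of reading and decoding is paid per access.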

shoyer · 2014-05-06T02:37:31Z

A bit more context: NetCDF3 (as a file format), which is all that scipy supports, doesn't support Unicode or 64-bit integers. It really is a relic.



shoyer added the enhancement label Mar 7, 2014

shoyer added this to the 0.2 milestone May 6, 2014

shoyer mentioned this issue May 12, 2014

Complete Python 3 support #124

Merged

shoyer closed this as completed in #124 May 12, 2014

shoyer modified the milestones: 0.1.1, 0.2 May 20, 2014
