Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cast datetime and timedelta to signed 64-bit int #127

Merged
merged 6 commits into from
Nov 23, 2018
Merged

Cast datetime and timedelta to signed 64-bit int #127

merged 6 commits into from
Nov 23, 2018

Conversation

jakirkham
Copy link
Member

@jakirkham jakirkham commented Nov 21, 2018

Related to issue ( zarr-developers/zarr-python#342 )

The NaT type is represented as -0. As a result, casting to an unsigned integral fails and throws an error. However casting to a signed integral type does not have this problem and proceeds without issues.

TODO:

  • Unit tests and/or doctests in docstrings
  • tox -e py37 passes locally
  • tox -e py27 passes locally
  • Docstrings and API docs for any new/modified user-facing classes and functions
  • Changes documented in docs/release.rst
  • tox -e docs passes locally
  • AppVeyor and Travis CI passes
  • Test coverage to 100% (Coveralls passes)

The `NaT` type is represented as `-0`. As a result, casting to an
unsigned integral fails and throws an error. However casting to a signed
integral type does not have this problem and proceeds without issues.
Make sure that we are able to encode NaT values without issues by
introducing `-0`s in our test data.
@alimanfoo
Copy link
Member

Hi @jakirkham, this looks good, but there is an issue to do with how the backwards compatibility tests are setup, I'll try to explain. Basically the list of arrays in the test module for each codec can be added to, but existing arrays in that list shouldn't be changed. If they are changed, then they will be out of sync with the fixture data. This won't show up in the tests, as previous test arrays get saved to files, so the backwards compatibility tests run off the previously saved data.

In other words, if you change any of the arrays in the list, then the corresponding fixture data should also be deleted and regenerated. You can find the data to delete because the order in which the arrays appear in the list is used to generate a file name. E.g., within the test_zlib module, the first array in the list gets saved as fixture/zlib/array.00.npy, and the corresponding encoded data from the first codec in the codecs list gets saved as fixture/zlib/codec.00/encoded.00.dat. I.e., if you changed the first array in the list, then you should delete fixture/zlib/array.00.npy and all files like fixture/zlib/*/encoded.00.dat.

Alternatively and probably better in this case, leave all the existing arrays as-is, and append some new arrays to the list, in which case some new fixture data will get generated on the first test run.

Hope that makes sense!

This restores the old datetime/timedelta tests in their same position in
the parameterized compression tests. Should ensure their fixtures stay
the same. Also should make sure we haven't accidentally broken backwards
compatibility by including the original tests. The new tests with NaT
values are included afterwards. This will make sure that new fixture
data is generated for them.
Commits the fixture data automatically generated by pytest for the new
datetime/timedelta NaT tests.
@jakirkham
Copy link
Member Author

Thanks @alimanfoo. IIUC you would like the tests added as a new set in the parameterized group. If so that makes sense to me, sorry I didn't think about it earlier. Have added the NaT tests as new tests at the end of the parameterized tests, generated the new fixture data locally (via running pytest), and committed/pushed. Please let me know if that works or if more needs to be done.

@@ -47,6 +47,10 @@
np.random.randint(0, 2**60, size=1000, dtype='u8').view('m8[ns]'),
np.random.randint(0, 2**25, size=1000, dtype='u8').view('M8[m]'),
np.random.randint(0, 2**25, size=1000, dtype='u8').view('m8[m]'),
(-np.random.randint(-2**30, 2**30, size=1000, dtype='i8')).view('M8[ns]'),
Copy link
Member

@alimanfoo alimanfoo Nov 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of interest, if this is random numbers between -2**30 and 2**30, why do we need to take the negative of the array? Is it not enough to specify the limits within randint?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. This wasn't correct.

Basically NaT in float64 is the same as -0.0. However int64 lacks a -0 equivalent, but -2**63 maps to the same value, which is what we want to include.

PR ( #131 ) fixes this.

Copy link
Member

@alimanfoo alimanfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good. Just had a question about why take negative of the arrays, for my own understanding.

@alimanfoo
Copy link
Member

Going to merge this one so we can rebase other PRs (#121, #128).

@alimanfoo alimanfoo merged commit eaadc29 into zarr-developers:master Nov 23, 2018
@jakirkham jakirkham deleted the hdl_nat branch November 23, 2018 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants