Support or default to less detailed datetime64 #7307
Comments
See http://pandas-docs.github.io/pandas-docs-travis/gotchas.html#timestamp-limitations. Simply use periods, and use NaT as appropriate for missing values. |
This is a more common issue than is generally believed; I bump into it more often than not. My guess is that at least 99% of pandas users don't need nanoseconds. If that is true, nanosecond usage is a special case, not the rule. I really don't understand the rationale behind standardizing on nanoseconds, but I also don't understand how hard it would be to change to microseconds. |
you can simply use a |
@jreback what about nanoseconds? Isn't it the same? I feel forced to conform to whoever decided that nanoseconds are best (which is really not practical for most use cases). Anyway, the nanosecond unit gets a big -1 from me. |
But how can we read a csv file for instance and convert dates to periods? Sincere question, as I'm really struggling with this and the documentation is really unclear here. Because there is
See also this example:
I cannot make date_to_3 into a period (as far as I know), and although it is a perfectly valid np.datetime64 (with unit='s' instead of unit='ns'), pandas refuses to see it as such. |
Read them in as ints (you don't really need to do anything special to do this), just don't specify 'parse_dates'
Define a converter that creates a period from an int
Of course this could be vectorized / more natural, e.g. a Give it a go! |
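For readers landing here, the two steps above might look roughly like the following. This is only a sketch under assumptions: the CSV layout, the column name `valid_to`, the yyyymmdd integer encoding, and the helper `int_to_period` are all made up for illustration and are not the snippet originally posted in this thread.

```python
import io

import pandas as pd

# Hypothetical CSV: dates stored as yyyymmdd integers, including a
# far-future sentinel that does not fit in nanosecond datetimes.
csv_text = "id,valid_to\n1,20141231\n2,99991231\n"


def int_to_period(value):
    """Turn a yyyymmdd integer into a daily Period (the encoding is an assumption)."""
    year, rest = divmod(int(value), 10000)
    month, day = divmod(rest, 100)
    return pd.Period(year=year, month=month, day=day, freq="D")


# Step 1: no parse_dates, so the column is read as plain integers.
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: apply the converter element by element.
df["valid_to"] = df["valid_to"].apply(int_to_period)

print(df)
```

The element-wise `apply` is the simplest form, not the fastest; a vectorized conversion would be preferable for large files.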
Thanks! It works perfectly like this. I will test HDF5 saving and queries this week too. I would really add this snippet to http://pandas.pydata.org/pandas-docs/stable/timeseries.html (and yes, to_periods would be a really great idea :) |
ok, why don't you open a feature request for |
I've put it on my to do list for later this week (and will unscrupulously copy from to_datetime with the additional Period frequency formats)! |
@jreback I know PeriodIndex is the suggested workaround for dates that don't fit in ns precision, but periods and datetimes also differ in meaning: a period refers to a span of time rather than a point in time. For what it's worth, I've played a bit with I do recognize that this may be too late to change now, but I do think this is something worth considering. I suspect there are a lot more users who would be happy with a fixed choice of "us" rather than "ns" precision. Of course, it would also be a lot of work to actually do the implementation (and it's not the top priority for me). |
I think this would have to be a hybrid, e.g. carry around the units on the I think it IS possible, but prob a bit of work. |
Reopening this -- this is still a recurrent issue, despite the work around of using PeriodIndex. |
👍 From me. The date range limitation is a HUGE issue for many scientific communities. I find it very hard to believe that there are more pandas users who want nanoseconds than who want to use dates before 1678. |
As I have stated many times, this would require some major effort. It IS certainly possible, and the extension dtypes are able to support this. But it would really need to be spearheaded by someone for whom this would be very useful. |
Hi @jreback! I definitely understand as this https://github.com/pydata/pandas/search?q=ns gives 150 matches!
Can you give some guidance as to what should be the best route for this and what kind of requirements you would have? P.s. because of HDF5 and Bcolz we couldn't switch to time periods, so we still have this issue and have lots of catch procedures in place to work around it, so solving this would be great for me personally. |
Thanks for sharing your perspective, @wesm. I'm pretty much in complete agreement with you. |
As a follow-up on @wesm's comments: recently (thanks to @sinhrks, @MaximilianR) these are becoming more and more of a first-class type. Certainly, this introduces more user mental effort than using a single less-detailed and in fact these can represent really long ranges
|
Hi, I understand that inside Pandas the Period works really well. But in terms of IO to/from Pandas I'm less sure, typical use cases for me are:
I think all of these default to numpy datetimes (numpy itself does support other units, but this is overruled by the standard ns casting), so there are lots of uncertainties there for me. |
Following what @wesm said
This has been on my mind since the beginning of this issue. Too bad that only a few people voted on the pydata Google group. |
Probably the most likely people to find this page (or the poll) and speak are those who are dissatisfied with nanoseconds. But there are those of us who are happy with nanoseconds. I am. Here are some uses of precision finer than 1 microsecond:
In the unlikely event that Pandas switched from always nanoseconds to always microseconds, we would have to stop using its time features, and store nanoseconds as raw int64. Even if we stipulate that nanosecond precision has no practical use to humans, we need to be able to convey timestamps with full accuracy between systems (e.g. to do database queries). I do sympathize with those whose timestamp needs exceed the life expectancy of people or nations, but systems are generating more sub-microsecond timestamps, and this trend will not reverse. Adding support for longer horizons is good, but we shouldn't lose our nanos. |
In 2.0 we support "ns", "us", "ms", and "s". Closing as complete. |
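As a rough illustration of what that means in practice, here is a minimal sketch that assumes pandas >= 2.0 is installed (older versions coerce to nanoseconds and raise an out-of-bounds error on these values). The sample dates are arbitrary.

```python
import numpy as np
import pandas as pd

# Assumes pandas >= 2.0. Both dates fall outside the nanosecond range
# (roughly 1678 to 2262), so they need a coarser unit such as seconds.
values = np.array(["1500-01-01", "9999-12-31"], dtype="datetime64[s]")

# The second-resolution unit of the NumPy array is kept by the Series
# instead of being cast to nanoseconds.
ser = pd.Series(values)

print(ser.dtype)
print(ser.dt.year.tolist())
```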
In case anyone else gets to the bottom of this thread: |
Hi,
I regularly run into issues where I have dates that fall outside of Pandas's datetime standards. Quite a few data sources have defaults such as "9999-12-31" and stuff like that, leading to issues in pandas.
This is because Pandas defaults to nanosecond resolution, for which the representable time span is quite limited.
See: http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html
Code  Meaning      Time span (relative)  Time span (absolute)
s     second       +/- 2.9e11 years      [2.9e11 BC, 2.9e11 AD]
ms    millisecond  +/- 2.9e8 years       [2.9e8 BC, 2.9e8 AD]
us    microsecond  +/- 2.9e5 years       [290301 BC, 294241 AD]
ns    nanosecond   +/- 292 years         [1678 AD, 2262 AD]
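As a sanity check on these spans, the half-range of a signed 64-bit tick count can be worked out directly. The sketch below is plain Python; the names `INT64_MAX`, `TICKS_PER_SECOND`, and `half_range_years` are illustrative, and the mean Gregorian year length is an assumption used only for the conversion to years.

```python
# datetime64 stores a signed 64-bit count of ticks since 1970-01-01.
INT64_MAX = 2**63 - 1

# Assumption: mean Gregorian year, used only to express spans in years.
SECONDS_PER_YEAR = 365.2425 * 24 * 3600

TICKS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def half_range_years(unit):
    """Years representable on either side of the epoch for the given unit."""
    return INT64_MAX / TICKS_PER_SECOND[unit] / SECONDS_PER_YEAR


for unit in TICKS_PER_SECOND:
    print(f"{unit:>2}: +/- {half_range_years(unit):.3g} years")
```

For nanoseconds this comes to roughly 292 years on either side of the 1970 epoch, which is where the 1678 and 2262 bounds come from.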
I first thought the unit='s' parameter of to_datetime would work (see: http://pandas.pydata.org/pandas-docs/version/0.14.0/generated/pandas.tseries.tools.to_datetime.html), but this only serves to translate a differently specified datetime into nanoseconds (I think), and the "ns" detail level seems to be rather hard-coded.
I cannot imagine the majority of use cases needing nanoseconds; even going to microseconds extends the date range to something that, in my experience, should always work. The nanosecond upper bound of 2262 AD is really limiting.
Imho, ideally one should be able to choose the detail level. Is this just me or is this a common issue?