Merge branch '26-remove-datetime-limits'
jmurty committed May 31, 2018
2 parents 413f56c + dd3d082 commit bd24338
Showing 9 changed files with 986 additions and 122 deletions.
101 changes: 68 additions & 33 deletions README.rst
@@ -37,11 +37,11 @@ To use

# Derive Python date objects
# lower and upper bounds that strictly adhere to the given range
>>> e.lower_strict(), e.upper_strict()
(datetime.date(1979, 8, 1), datetime.date(1979, 8, 31))
>>> e.lower_strict()[:3], e.upper_strict()[:3]
((1979, 8, 1), (1979, 8, 31))
# lower and upper bounds that are padded if there's indicated uncertainty
>>> e.lower_fuzzy(), e.upper_fuzzy()
(datetime.date(1979, 7, 1), datetime.date(1979, 9, 30))
>>> e.lower_fuzzy()[:3], e.upper_fuzzy()[:3]
((1979, 7, 1), (1979, 9, 30))

# Date intervals
>>> interval = parse_edtf("1979-08~/open")
@@ -50,9 +50,9 @@ To use
# Intervals have lower and upper EDTF objects.
>>> interval.lower, interval.upper
(UncertainOrApproximate: '1979-08~', UncertainOrApproximate: 'open')
>>> interval.lower.upper_strict()
datetime.date(1979, 8, 31)
>>> interval.upper.lower_strict() #'open' is interpreted to mean 'still happening'.
>>> interval.lower.upper_strict()[:3]
(1979, 8, 31)
>>> interval.upper.lower_strict() # 'open' is interpreted to mean 'still happening'.
[Today's date]

# Date collections
@@ -296,6 +296,31 @@ few different Python dates, depending on the circumstance. Generally, Python
dates are used for sorting and filtering, and are not displayed directly to
users.


``struct_time`` date representation
-----------------------------------

Because Python's ``datetime`` module does not support dates outside the range
1 AD to 9999 AD, we return dates as ``time.struct_time`` objects by default
instead of the ``datetime.date`` or ``datetime.datetime`` objects you might
expect.

The ``struct_time`` representation is more difficult to work with, but it can
be sorted as-is, which is the primary use case. It can also be converted
relatively easily to ``date`` or ``datetime`` objects (provided the year is
within 1 to 9999 AD), or to date objects in more flexible libraries like
`astropy.time <http://docs.astropy.org/en/stable/time/index.html>`_
for years outside these bounds.

If you are sure you are working with dates within the range supported by
Python's ``datetime`` module, you can get these more convenient objects using
the ``edtf.struct_time_to_date`` and ``edtf.struct_time_to_datetime``
functions.

NOTE: Earlier versions of this library returned ``date`` and ``datetime``
objects from these methods by default, before the switch to ``struct_time``.
See ticket `<https://github.com/ixc/python-edtf/issues/26>`_.
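
The following is a minimal, standalone sketch of why ``struct_time`` values suit sorting across arbitrary years. It uses only the standard library; the helper names ``make_st`` and ``to_date_or_none`` are hypothetical and are not part of ``edtf``. The trailing ``[0, 0, -1]`` mirrors the "empty" ``tm_wday``, ``tm_yday`` and ``tm_isdst`` values shown in the README output.

```python
from datetime import date
from time import struct_time

# Placeholder values for tm_wday, tm_yday, tm_isdst (assumed, as in the README output)
EMPTY_EXTRAS = [0, 0, -1]


def make_st(year, month, day):
    """Build a date-only struct_time (hypothetical helper for this sketch)."""
    return struct_time([year, month, day, 0, 0, 0] + EMPTY_EXTRAS)


def to_date_or_none(st):
    """Convert to datetime.date, or return None if the year is unsupported."""
    try:
        return date(*st[:3])
    except ValueError:
        # datetime.date only accepts years 1 through 9999
        return None


values = [make_st(1912, 4, 1), make_st(-500, 1, 1), make_st(12000, 6, 15)]

# struct_time is a tuple subclass, so it compares field by field
ordered_years = [st.tm_year for st in sorted(values)]
```

Here ``sorted`` needs no key function, while ``to_date_or_none`` shows that conversion back to ``date`` is only safe inside the 1 to 9999 AD range.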

``lower_strict`` and ``upper_strict``
-------------------------------------

@@ -308,9 +333,21 @@ natural sort order. In a descending sort (most recent first), sort by
``upper_strict``::

>>> e = parse_edtf('1912-04~')
>>> e.lower_strict()

>>> e.lower_strict() # Returns struct_time
time.struct_time(tm_year=1912, tm_mon=4, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)

>>> e.lower_strict()[:3] # Show only interesting parts of struct_time
(1912, 4, 1)

>>> from edtf import struct_time_to_date
>>> struct_time_to_date(e.lower_strict()) # Convert to date
datetime.date(1912, 4, 1)
>>> e.upper_strict()

>>> e.upper_strict()[:3]
(1912, 4, 30)

>>> struct_time_to_date(e.upper_strict())
datetime.date(1912, 4, 30)

``lower_fuzzy`` and ``upper_fuzzy``
@@ -330,33 +367,23 @@ is, if a date is approximate at the month scale, it is padded by a month. If
it is approximate at the year scale, it is padded by a year::

>>> e = parse_edtf('1912-04~')
>>> e.lower_fuzzy() # padding is 100% of a month
datetime.date(1912, 3, 1)
>>> e.upper_fuzzy()
datetime.date(1912, 5, 30)
>>> e.lower_fuzzy()[:3] # padding is 100% of a month
(1912, 3, 1)
>>> e.upper_fuzzy()[:3]
(1912, 5, 30)

>>> e = parse_edtf('1912~')
>>> e.lower_fuzzy() # padding is 100% of a year
datetime.date(1911, 1, 1)
>>> e.upper_fuzzy()
datetime.date(1913, 12, 31)
>>> e.lower_fuzzy()[:3] # padding is 100% of a year
(1911, 1, 1)
>>> e.upper_fuzzy()[:3]
(1913, 12, 31)

One can interpret uncertain or approximate dates as 'plus or minus a
[level of precision]'.

If a date is both uncertain **and** approximate, the padding is applied twice,
i.e. it gets 100% * 2 padding, or 'plus or minus two [levels of precision]'.
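
The padding rule above can be sketched as plain month arithmetic. This is an illustrative sketch only, not the library's implementation: ``shift_months`` and ``fuzzy_bounds`` are hypothetical names, and the clamping of the day to the end of the target month is an assumption chosen to reproduce the README outputs.

```python
import calendar

DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def days_in_month(year, month):
    """Days in a proleptic Gregorian month; works for any integer year."""
    if month == 2 and calendar.isleap(year):
        return 29
    return DAYS_IN_MONTH[month - 1]


def shift_months(ymd, months):
    """Move a (year, month, day) tuple by a number of months, clamping the day."""
    year, month, day = ymd
    year, month0 = divmod(year * 12 + (month - 1) + months, 12)
    month = month0 + 1
    return (year, month, min(day, days_in_month(year, month)))


def fuzzy_bounds(lower_strict, upper_strict, precision, multiplier=1):
    """
    Pad strict bounds by one unit of precision ('month' or 'year').
    Use multiplier=2 for a date that is both uncertain and approximate.
    """
    months = {"month": 1, "year": 12}[precision] * multiplier
    return (shift_months(lower_strict, -months),
            shift_months(upper_strict, months))


month_bounds = fuzzy_bounds((1912, 4, 1), (1912, 4, 30), "month")
year_bounds = fuzzy_bounds((1912, 1, 1), (1912, 12, 31), "year")
doubled = fuzzy_bounds((1912, 4, 1), (1912, 4, 30), "month", multiplier=2)
```

Under these assumptions ``month_bounds`` and ``year_bounds`` match the ``'1912-04~'`` and ``'1912~'`` examples above, and ``doubled`` shows the 'plus or minus two months' case.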

Long years
----------

Since EDTF covers a much greater timespan than Python ``date`` objects, it is
easy to exceed the bounds of valid Python ``date``s. In this case, the returned
dates are clamped to ``date.MIN`` and ``date.MAX``.

Future revisions will include numerical interpretations of dates for better
sortability.

Seasons
-------

@@ -381,9 +408,17 @@ the ``natural_text_field`` parameter of your ``EDTFField``.

When your model is saved, the ``natural_text_field`` value will be parsed to set
the ``date_edtf`` value, and the underlying EDTF object will set the
``_earliest`` and ``_latest`` fields on the model.
``_earliest`` and ``_latest`` fields on the model to a float value representing
the Julian Date.

::

**WARNING**: The conversion to and from Julian Date numerical values can be
inaccurate, especially for ancient dates back to thousands of years BC. Ideally
Julian Date values should be used for range and ordering operations only where
complete accuracy is not required. They should **not** be used for definitive
storage or for display after roundtrip conversions.

Example usage::

from django.db import models
from edtf.fields import EDTFField
@@ -405,11 +440,11 @@ the ``date_edtf`` value, and the underlying EDTF object will set the
null=True,
)
# use for filtering
date_earliest = models.DateField(blank=True, null=True)
date_latest = models.DateField(blank=True, null=True)
date_earliest = models.FloatField(blank=True, null=True)
date_latest = models.FloatField(blank=True, null=True)
# use for sorting
date_sort_ascending = models.DateField(blank=True, null=True)
date_sort_descending = models.DateField(blank=True, null=True)
date_sort_ascending = models.FloatField(blank=True, null=True)
date_sort_descending = models.FloatField(blank=True, null=True)


Since the ``EDTFField`` and the ``_earliest`` and ``_latest`` field values are
32 changes: 32 additions & 0 deletions changelog.rst
@@ -1,6 +1,38 @@
Changelog
=========

In development
--------------


4.0 (2018-05-31)
----------------

* Remove 1 AD - 9999 AD restriction on date ranges imposed by Python's
``datetime`` module (#26).

**WARNING**: This involves a breaking API change where the following methods
return a ``time.struct_time`` object instead of ``datetime.date`` or
``datetime.datetime`` objects::

lower_strict()
upper_strict()
lower_fuzzy()
upper_fuzzy()

* Add ``jdutil`` library code by Matt Davis at
`<https://gist.github.com/jiffyclub/1294443>`_ to convert dates to numerical
float representations.

* Update `EDTFField` to store derived upper/lower strict/fuzzy date values as
numerical values to Django's `FloatField` fields, when available, to permit
storage of arbitrary date/time values.

The older approach where `DateField` fields are used instead is still
supported but not recommended, since this usage will break for date/time
values outside the range 1 AD to 9999 AD.


3.0 (2018-02-13)
----------------

3 changes: 3 additions & 0 deletions edtf/__init__.py
@@ -1,3 +1,6 @@
from edtf.parser.grammar import parse_edtf
from edtf.natlang import text_to_edtf
from edtf.parser.parser_classes import *
from edtf.convert import dt_to_struct_time, struct_time_to_date, \
    struct_time_to_datetime, trim_struct_time, struct_time_to_jd, \
    jd_to_struct_time
145 changes: 145 additions & 0 deletions edtf/convert.py
@@ -0,0 +1,145 @@
from time import struct_time
from datetime import date, datetime

from edtf import jdutil


TIME_EMPTY_TIME = [0, 0, 0] # tm_hour, tm_min, tm_sec
TIME_EMPTY_EXTRAS = [0, 0, -1] # tm_wday, tm_yday, tm_isdst


def dt_to_struct_time(dt):
    """
    Convert a `datetime.date` or `datetime.datetime` to a `struct_time`
    representation *with zero values* for data fields that we cannot always
    rely on for ancient or far-future dates: tm_wday, tm_yday, tm_isdst
    NOTE: If it wasn't for the requirement that the extra fields are unset
    we could use the `timetuple()` method instead of this function.
    """
    if isinstance(dt, datetime):
        return struct_time(
            [dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second] +
            TIME_EMPTY_EXTRAS
        )
    elif isinstance(dt, date):
        return struct_time(
            [dt.year, dt.month, dt.day] + TIME_EMPTY_TIME + TIME_EMPTY_EXTRAS
        )
    else:
        raise NotImplementedError(
            "Cannot convert %s to `struct_time`" % type(dt))


def struct_time_to_date(st):
    """
    Return a `datetime.date` representing the provided `struct_time`.
    WARNING: This will fail for dates with years before 1 AD or after 9999 AD.
    """
    return date(*st[:3])


def struct_time_to_datetime(st):
    """
    Return a `datetime.datetime` representing the provided `struct_time`.
    WARNING: This will fail for dates with years before 1 AD or after 9999 AD.
    """
    return datetime(*st[:6])


def trim_struct_time(st, strip_time=False):
    """
    Return a `struct_time` based on the one provided but with the extra fields
    `tm_wday`, `tm_yday`, and `tm_isdst` reset to default values.
    If `strip_time` is set to true the time values are also set to zero:
    `tm_hour`, `tm_min`, and `tm_sec`.
    """
    if strip_time:
        return struct_time(list(st[:3]) + TIME_EMPTY_TIME + TIME_EMPTY_EXTRAS)
    else:
        return struct_time(list(st[:6]) + TIME_EMPTY_EXTRAS)


def struct_time_to_jd(st):
    """
    Return a float number representing the Julian Date for the given
    `struct_time`.
    NOTE: extra fields `tm_wday`, `tm_yday`, and `tm_isdst` are ignored.
    """
    year, month, day = st[:3]
    hours, minutes, seconds = st[3:6]

    # Convert time of day to fraction of day
    day += jdutil.hmsm_to_days(hours, minutes, seconds)

    return jdutil.date_to_jd(year, month, day)
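
The `jdutil` helpers called above are not shown in this diff. As a rough, self-contained sketch of the same idea (fold the time of day into a fractional day, then convert the calendar date to a Julian Date), the block below uses the well-known Meeus algorithm. The function names are hypothetical, and treating dates before 1582-10-15 as Julian-calendar dates is an assumption of this sketch, not a statement about what `jdutil` does.

```python
import math
from time import struct_time


def gregorian_to_jd(year, month, day):
    """Julian Date for a calendar date; `day` may carry a fractional part."""
    if month <= 2:
        y, m = year - 1, month + 12
    else:
        y, m = year, month

    # Gregorian calendar correction applies from 1582-10-15 onwards
    if (year, month, day) >= (1582, 10, 15):
        a = math.trunc(y / 100.0)
        b = 2 - a + math.trunc(a / 4.0)
    else:
        b = 0

    if y < 0:
        c = math.trunc(365.25 * y - 0.75)
    else:
        c = math.trunc(365.25 * y)
    d = math.trunc(30.6001 * (m + 1))
    return b + c + d + day + 1720994.5


def struct_time_to_jd_sketch(st):
    """Hypothetical stand-in for struct_time_to_jd, for illustration only."""
    year, month, day = st[:3]
    hours, minutes, seconds = st[3:6]
    # Convert time of day to a fraction of a day
    day += (hours + (minutes + seconds / 60.0) / 60.0) / 24.0
    return gregorian_to_jd(year, month, day)


# 2000-01-01 12:00:00 is the J2000 epoch
j2000 = struct_time_to_jd_sketch(struct_time([2000, 1, 1, 12, 0, 0, 0, 0, -1]))
```

Because the result is a plain float, values like this order correctly across BC and far-future years, which is what the `FloatField` storage in `EDTFField` relies on.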


def jd_to_struct_time(jd):
    """
    Return a `struct_time` converted from a Julian Date float number.
    WARNING: Converting a `struct_time` to a Julian Date value and back again
    can be inaccurate and lose or gain time, especially for BC (negative)
    years.
    NOTE: extra fields `tm_wday`, `tm_yday`, and `tm_isdst` are set to default
    values, not real ones.
    """
    year, month, day = jdutil.jd_to_date(jd)

    # Convert time of day from fraction of day
    day_fraction = day - int(day)
    hour, minute, second, ms = jdutil.days_to_hmsm(day_fraction)
    day = int(day)

    # This conversion can return negative values for items we do not want to be
    # negative: month, day, hour, minute, second.
    year, month, day, hour, minute, second = _roll_negative_time_fields(
        year, month, day, hour, minute, second)

    return struct_time(
        [year, month, day, hour, minute, second] + TIME_EMPTY_EXTRAS
    )


def _roll_negative_time_fields(year, month, day, hour, minute, second):
    """
    Fix date/time fields which have nonsense negative values for any field
    except for year by rolling the overall date/time value backwards, treating
    negative values as relative offsets of the next higher unit.
    For example minute=5, second=-63 becomes minute=3, second=57 (5 minutes
    less 63 seconds)
    This is very unsophisticated handling of negative values which we would
    ideally do with `dateutil.relativedelta` but cannot because that class does
    not support arbitrary dates, especially not negative years which is the
    only case where these nonsense values are likely to occur anyway.
    NOTE: To greatly simplify the logic we assume all months are 30 days long.
    """
    if second < 0:
        minute += int(second / 60.0)  # Adjust by whole minute in secs
        minute -= 1  # Subtract 1 for negative second
        second %= 60  # Convert negative second to positive remainder
    if minute < 0:
        hour += int(minute / 60.0)  # Adjust by whole hour in minutes
        hour -= 1  # Subtract 1 for negative minute
        minute %= 60  # Convert negative minute to positive remainder
    if hour < 0:
        day += int(hour / 24.0)  # Adjust by whole day in hours
        day -= 1  # Subtract 1 for negative hour
        hour %= 24  # Convert negative hour to positive remainder
    if day < 0:
        month += int(day / 30.0)  # Adjust by whole month in days (assume 30)
        month -= 1  # Subtract 1 for negative day
        day %= 30  # Convert negative day to positive remainder
    if month < 0:
        year += int(month / 12.0)  # Adjust by whole year in months
        year -= 1  # Subtract 1 for negative month
        month %= 12  # Convert negative month to positive remainder
    return (year, month, day, hour, minute, second)
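
For comparison, the borrow-from-the-next-unit idea in the docstring can also be expressed with `divmod`, whose floor semantics produce the carry and the positive remainder in one step. This is a hedged alternative sketch covering only the time-of-day fields, not the code in this commit; `roll_time_fields` is a hypothetical name. Tracing the function above by hand, an exact negative multiple such as `second=-60` appears to borrow one minute too many, which the `divmod` form avoids.

```python
def roll_time_fields(day, hour, minute, second):
    """
    Normalise negative second/minute/hour values by borrowing from the next
    higher unit. Day overflow is passed back to the caller unchanged, since
    month lengths are outside the scope of this sketch.
    """
    carry, second = divmod(second, 60)   # e.g. divmod(-63, 60) == (-2, 57)
    minute += carry
    carry, minute = divmod(minute, 60)
    hour += carry
    carry, hour = divmod(hour, 24)
    day += carry
    return day, hour, minute, second


docstring_case = roll_time_fields(10, 0, 5, -63)    # 5 minutes less 63 seconds
exact_multiple = roll_time_fields(10, 0, 5, -60)    # 5 minutes less 60 seconds
cascade = roll_time_fields(10, 0, 0, -1)            # borrows through to the day
```

The same pattern could extend to days and months, but that needs real month lengths (or the 30-day simplification noted in the docstring), so it is left out here.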
18 changes: 17 additions & 1 deletion edtf/fields.py
@@ -4,9 +4,11 @@
import pickle

from django.db import models
from django.core.exceptions import FieldDoesNotExist

from edtf import parse_edtf, EDTFObject
from edtf.natlang import text_to_edtf
from edtf.convert import struct_time_to_date, struct_time_to_jd

DATE_ATTRS = (
    'lower_strict',
@@ -116,7 +118,21 @@ def pre_save(self, instance, add):
            g = getattr(self, field_attr, None)
            if g:
                if edtf:
                    setattr(instance, g, getattr(edtf, attr)())
                    try:
                        target_field = instance._meta.get_field(g)
                    except FieldDoesNotExist:
                        continue
                    value = getattr(edtf, attr)()  # struct_time
                    if isinstance(target_field, models.FloatField):
                        value = struct_time_to_jd(value)
                    elif isinstance(target_field, models.DateField):
                        value = struct_time_to_date(value)
                    else:
                        raise NotImplementedError(
                            u"EDTFField does not support %s as a derived data"
                            u" field, only FloatField or DateField"
                            % type(target_field))
                    setattr(instance, g, value)
                else:
                    setattr(instance, g, None)
        return edtf