Skip to content

Harmonization DateTime incorrect handling of naive time strings #2278

Closed
@gethvi

Description

Considering this function of DateTime class:

@staticmethod
def __parse(value: str) -> Optional[str]:
try:
DateTime.parse_utc_isoformat(value)
except ValueError:
pass
else:
return utils.decode(value)
try:
value = dateutil.parser.parse(value, fuzzy=True)
value = value.astimezone(pytz.utc)
value = value.isoformat()
except (ValueError, OverflowError):
return None
return utils.decode(value)

In particular the line:

value = value.astimezone(pytz.utc) 

This causes that naive time strings (without timezone information) are being considered as being in a timezone of the local computer. My computer is set to a UTC +01:00 timezone and this is the result:

DateTime.sanitize("2021-01-17 07:44:46")    # this calls DateTime.__parse

'2021-01-17T06:44:46+00:00'

Because the parsed timestamps are usually produced by feeds outside of the local computer I believe it makes much more sense to assume they are produced in UTC rather than processing them based on whatever is the local timezone of the IntelMQ installation.


On a side note: There are several cases of this kind of code:

if not value.tzinfo and sys.version_info <= (3, 6):
    value = pytz.utc.localize(value)
elif not value.tzinfo:
    value = value.astimezone(pytz.utc)

Such expressions are not equal in what they do. The first one (pytz.utc.localize) merely attaches a time zone object to a datetime without adjustment of date and time data. The second one (astimezone(pytz.utc)) adjusts time and date based on local time (docs). Example (with the computer in UTC +01:00):

value = datetime.datetime.strptime("2021-01-17 07:44:46", "%Y-%m-%d %H:%M:%S")
value.isoformat()
'2021-01-17T07:44:46'

pytz.utc.localize(value).isoformat()
'2021-01-17T07:44:46+00:00'

value.astimezone(pytz.utc).isoformat()
'2021-01-17T06:44:46+00:00'

I think this is also the reason why the following test works only when the machine running the test is in UTC timezone:

@unittest.skipIf(time.timezone != 0, 'Test only works with UTC')
def test_datetime_convert(self):
self.assertEqual('2019-07-01T15:15:15+00:00',
harmonization.DateTime.convert('15 15 15 07 01 2019',
format='from_format|%M %H %S %m %d %Y'))
self.assertEqual('2019-07-01T00:00:00+00:00',
harmonization.DateTime.convert('07-01-2019',
'from_format_midnight|%m-%d-%Y'))
self.assertEqual('2011-02-01T02:43:11.572760+00:00',
harmonization.DateTime.convert(129410017915727600,
'windows_nt'))

PROPOSED SOLUTION:
I think the better way of handling of naive time strings is to assume they are in UTC rather than local timezone of IntelMQ installation:

if not value.tzinfo:
    value = value.replace(tzinfo=datetime.timezone.utc)
else:
    value = value.astimezone(tz=datetime.timezone.utc)

The function value.replace(tzinfo=datetime.timezone.utc) does effectively the same thing as pytz.utc.localize(value): it attaches a time zone object to a datetime without adjustment of date and time data. And without needing pytz as dependency.

Quick tests showed that my solution seems to work and passes all the tests including the one previously skipped.

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions