@flonkler commented Feb 24, 2022

Problem:

I recently used the flights.json.gz dataset to test Elasticsearch queries in another project. When indexing the data into Elasticsearch, I noticed that the timestamp field causes parsing errors. The date format for this field, specified in tests/__init__.py, is strict_date_hour_minute_second, which requires a full date and time (yyyy-MM-dd'T'HH:mm:ss). In the dataset, however, some timestamps contain only a date, such as "2018-02-10", and these fail to parse.

I wrote this short bash "script" to find all timestamp values in the dataset that don't contain the "T" separator. It's not very efficient, but it proves the point.

# Print every timestamp that lacks the "T" date/time separator.
gunzip --stdout flights.json.gz | while IFS= read -r line; do
    printf '%s\n' "$line" | jq '.timestamp' | grep -v "T"
done

The output is:

"2018-01-02"
"2018-01-03"
"2018-01-04"
"2018-01-05"
"2018-01-06"
"2018-01-07"
"2018-01-08"
"2018-01-09"
"2018-01-10"
"2018-01-11"
"2018-01-12"
"2018-01-12"
"2018-01-12"
"2018-01-13"
"2018-01-14"
"2018-01-15"
"2018-01-16"
"2018-01-17"
"2018-01-18"
"2018-01-19"
"2018-01-20"
"2018-01-21"
"2018-01-22"
"2018-01-23"
"2018-01-24"
"2018-01-25"
"2018-01-26"
"2018-01-27"
"2018-01-28"
"2018-01-29"
"2018-01-30"
"2018-01-31"
"2018-02-01"
"2018-02-02"
"2018-02-03"
"2018-02-04"
"2018-02-05"
"2018-02-06"
"2018-02-07"
"2018-02-08"
"2018-02-09"
"2018-02-09"
"2018-02-09"
"2018-02-10"
"2018-02-11"

BTW, these "invalid" timestamps occur only in the flights.json.gz dataset, not in flights_small.json.gz.
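
For anyone who wants to repeat the check without jq, here is a rough Python equivalent of the bash loop above. It is only a sketch: the function name `date_only_timestamps` is made up for illustration, and it assumes the dataset is gzipped newline-delimited JSON with a string `timestamp` field, as the bash loop implies.

```python
import gzip
import json
from collections import Counter


def date_only_timestamps(path):
    """Count timestamps that lack the 'T' separator in a gzipped NDJSON file.

    Hypothetical helper; assumes one JSON document per line.
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            ts = json.loads(line).get("timestamp")
            if isinstance(ts, str) and "T" not in ts:
                counts[ts] += 1
    return counts
```

Calling `date_only_timestamps("flights.json.gz")` and then the same on `flights_small.json.gz` should make it possible to compare the two datasets directly. Returning a `Counter` also shows how often each date-only value repeats.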

Solution:

To support these timestamps, I changed the date format to strict_date_optional_time, which makes the time part optional.
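
To illustrate the difference between the two formats, here is a small Python approximation. It is not how Elasticsearch parses dates: the helper names are hypothetical, the sample date-time value is made up, and the real strict_date_optional_time accepts more shapes (for example fractional seconds and time zone offsets) than this sketch covers. It handles only the two shapes discussed in this PR.

```python
import re
from datetime import datetime

# Rough local stand-ins for the two Elasticsearch formats (assumptions,
# not the real parsers). The regexes enforce zero-padded fields.
_DATE_TIME = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")
_DATE_ONLY = re.compile(r"\d{4}-\d{2}-\d{2}")


def _is_valid(value, pattern, fmt):
    """Return True if value has the expected shape and is a real date."""
    if not pattern.fullmatch(value):
        return False
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def matches_hour_minute_second(value):
    """Approximates strict_date_hour_minute_second: date and time required."""
    return _is_valid(value, _DATE_TIME, "%Y-%m-%dT%H:%M:%S")


def matches_optional_time(value):
    """Approximates strict_date_optional_time for the two shapes seen here."""
    return matches_hour_minute_second(value) or _is_valid(
        value, _DATE_ONLY, "%Y-%m-%d"
    )


for value in ("2018-01-01T18:27:00", "2018-02-10"):
    print(value, matches_hour_minute_second(value), matches_optional_time(value))
# prints:
# 2018-01-01T18:27:00 True True
# 2018-02-10 False True
```

The date-only value is rejected by the stricter check and accepted by the relaxed one, which mirrors the parsing errors described under "Problem".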

@elasticmachine

Since this is a community-submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@sethmlarson

jenkins test this please

@pquentin commented Nov 6, 2023

buildkite test this please

@pquentin commented Nov 6, 2023

buildkite test this please
