Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(clickhouse sink): Add encoding.timestamp_format field #1634

Merged
merged 2 commits into from
Feb 24, 2020

Conversation

Jeffail
Copy link
Contributor

@Jeffail Jeffail commented Jan 29, 2020

Allows users to specify a format for timestamp such that it can be parsed by Clickhouse as a DateTime data type.

By default we're not applying any formatting (sending as a string), this preserves backwards compatibility, it's also not an entirely unfavorable behavior since it has a higher precision. However, it's worth considering making unix formatting the default behavior here instead.

Closes #1443

@Jeffail Jeffail requested a review from a user January 29, 2020 12:20
@Jeffail
Copy link
Contributor Author

Jeffail commented Jan 29, 2020

Also for now I've implemented this as an enum in case we wish to add more formats later, but I'm also not convinced there's any reason not to use unix timestamp format so we could opt to change it to a boolean flag (called unix_timestamp?)

@Jeffail
Copy link
Contributor Author

Jeffail commented Jan 29, 2020

After having a moment to think about it I think the right course is to have a boolean flag format_unix_timestamp which when true results in the field timestamp being sent as a unix timestamp.

@binarylogic
Copy link
Contributor

I think I agree. A few questions:

  1. Would it not be better to treat all timestamps in the event this way? We have a timestamp type that any field can use.
  2. This fits into our larger strategy of adding encoding options, which we discussed here. So I'm wondering if it makes more sense to add an encoding.timestamp_format option? This way we can add encoding.fields_whitelist and so on.
  3. If unix is the preferred/optimal way to encode timestamp for Clickhouse then we should default to that instead of a string. I don't know enough about Clickhouse though.

@Jeffail
Copy link
Contributor Author

Jeffail commented Jan 29, 2020

If we can tie this into encoding options in my view that's the more "user empowering" way, it also makes the behavior clearer. Also yes on it applying to all timestamps.

Do we want to wait for the encoding work to be fully specced out, or add a field for this sink in the interim? If we add format_timestamp_unix (still cant figure a good name for this) then when encoding.timestamps_as_unix (or whatever) is added we can deprecate the old field.

@ghost
Copy link

ghost commented Jan 30, 2020

I wonder how could we, maybe in subsequent PRs, also make it possible to either

  • create two separate fields (one is DateTime and another one is UInt32 containing the number of nanoseconds after the last second);
  • create a single Tuple, containing again a DateTime and a UInt32 with the number of nanoseconds.

These are two options frequently recommended to be used in ClickHouse for use cases where sub-second precision is needed. Also, some people might want to store, for example, milliseconds instead of nanoseconds.

@Hoverbear
Copy link
Contributor

Is there anything holding up this PR?

@Jeffail
Copy link
Contributor Author

Jeffail commented Feb 23, 2020

We ought to expand the behavior so that all timestamp fields are encoded according to this setting, and not just the standard timestamp field. But other than that I think we can add this in without aggro.

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>
@binarylogic
Copy link
Contributor

I agree with @Jeffail. Can we go with an encoding.timestamp_format option? This will jive better with option 2 here. Do you agree @Hoverbear?

If possible, I'd like to get this in 0.8.0 for one of our important users.

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>
@Jeffail
Copy link
Contributor Author

Jeffail commented Feb 24, 2020

Okay so for now I've:

  1. Nested the field timestamp_format under encoding
  2. Added an enum value rfc3339, which is the current behavior, I think it helps to make it explicit
  3. Kept the format rfc3339 as default. For some users they will want proper timestamps and now they have the option. However, since it loses precision I don't think we should go as far as making it the default. Instead, we can revisit this (Ability to store sub-second precision in ClickHouse #1672) and add more encoding options that preserve precision later.

Copy link
Contributor

@binarylogic binarylogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect! Looks good to me.

@Jeffail Jeffail merged commit 992b493 into master Feb 24, 2020
@Jeffail Jeffail deleted the clickhouse-datetime branch February 24, 2020 17:13
@binarylogic binarylogic changed the title feat(clickhouse sink): Add timestamp_format field feat(clickhouse sink): Add encoding.timestamp_format field Feb 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Insert timestamps as native DateTime type to ClickHouse
3 participants