Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API, Core: implement types timestamp_ns and timestamptz_ns #8971

Closed
wants to merge 9 commits into from

Conversation

jacobmarble
Copy link
Contributor

@jacobmarble jacobmarble commented Nov 1, 2023

Helps #8657

This change is divided into two commits:

  • API
  • Core

This change adds field ChronoUnit unit to TimestampType, such that TimestampType now represents four specified types:

Helps apache#8657

This change adds field `ChronoUnit unit` to `TimestampType`, such that
`TimestampType` now represents four specified types:
- `timestamp` (existing)
- `timestamptz` (existing)
- `timestamp_ns` (new apache#8683)
- `timestamptz_ns` (new apache#8683)
@jacobmarble jacobmarble force-pushed the jgm-timestamp-nanos-core branch from 0d9058a to 5fd82c2 Compare November 1, 2023 21:14
@jacobmarble
Copy link
Contributor Author

I've tracked down the source of the four failing "core-tests" and will push a fix tomorrow.

@github-actions github-actions bot added parquet ORC Specification Issues that may introduce spec changes. labels Nov 3, 2023
@jacobmarble
Copy link
Contributor Author

Fixing four failing tests caused me to do most of the new type work for Parquet and ORC as well. Please note that the ORC type attributes need an update in the spec, else it isn't possible to correctly convert an ORC timestamp to an Iceberg timestamp.

jacobmarble added a commit to jacobmarble/apache-iceberg that referenced this pull request Nov 3, 2023
jacobmarble added a commit to jacobmarble/apache-iceberg that referenced this pull request Nov 6, 2023
jacobmarble added a commit to jacobmarble/apache-iceberg that referenced this pull request Nov 6, 2023
@jacobmarble jacobmarble force-pushed the jgm-timestamp-nanos-core branch 2 times, most recently from 09c1f25 to 1e374c6 Compare November 6, 2023 23:01
@jacobmarble jacobmarble force-pushed the jgm-timestamp-nanos-core branch from 1e374c6 to cb109a4 Compare November 7, 2023 15:45
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few high level questions. Don't the Avro and Parquet details need to be updated?

Also, are these changes intended to be backdate into spec version 1 and 2 or part of v3?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't the Avro and Parquet details need to be updated?

Yes.

(I'm new to the project.) My original intent here was to create multiple PRs, each with a different scope (API/Core, Avro, ORC, Parquet, Spark/Flink/Hive/etc). However, the cascade of failing tests in this PR suggests that one big change might be necessary. Do you have any feedback on this?

are these changes intended to be backdate into spec version 1 and 2 or part of v3?

These changes affect v3 only.

iceberg/format/spec.md

Lines 184 to 185 in 7c4bdaa

| [v3](#version-3) | **`timestamp_ns`** | Timestamp, nanosecond precision, without timezone | [2] |
| [v3](#version-3) | **`timestamptz_ns`** | Timestamp, nanosecond precision, with timezone | [2] |

Types `timestamp_ns` and `timestamptz_ns` are added in v3.

@jbonofre
Copy link
Member

jbonofre commented Nov 7, 2023

I'm starting to review. Thanks !

@rdblue
Copy link
Contributor

rdblue commented Nov 7, 2023

@jacobmarble can you break this into smaller commits? There are a ton of files changed here and I'm concerned about catching problems with such a large PR.

@jbonofre
Copy link
Member

jbonofre commented Nov 8, 2023

I agree with @rdblue

For partition stat we had a PR dedicated for spec.
As it's not necessary the same reviewers, I think it's good to have one PR for spec (even if it's very small) and PRs for api and impl.

@jacobmarble
Copy link
Contributor Author

@jacobmarble can you break this into smaller commits? There are a ton of files changed here and I'm concerned about catching problems with such a large PR.

I agree with @rdblue

For partition stat we had a PR dedicated for spec. As it's not necessary the same reviewers, I think it's good to have one PR for spec (even if it's very small) and PRs for api and impl.

Happy to split this up. I tried to scope this to API and Core, but the cascade of failing tests led to this.

@jacobmarble jacobmarble closed this Nov 8, 2023
@jacobmarble jacobmarble deleted the jgm-timestamp-nanos-core branch November 27, 2023 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API core ORC parquet Specification Issues that may introduce spec changes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants