Add ADJUSTED_TO_UTC_KEY constant and add it to field metadata when reading TIME logical types #8028

masonh22 · 2025-07-30T14:57:47Z

Which issue does this PR close?

Closes Better support for reading and writing UTC adjusted time types in parquet #8027.

Rationale for this change

This lets users detect when a Time32 or Time64 has the adjusted_to_u_t_c property set in the parquet file. Previously this only worked if the file was written by arrow-rs and the reader chooses not to skip the arrow metadata.

What changes are included in this PR?

This adds documented constant for "adjusted_to_utc" so that users are aware of this functionality. This key gets added when making a Field from a parquet type if the logical type as the adjusted_to_u_t_c field set.

Are these changes tested?

I updated test_time_utc_roundtrip() to skip the arrow metadata to simulate reading a file that doesn't have the metadata we expect or if a user chooses to skip the arrow metadata when reading.

Are there any user-facing changes?

Yes: this adds a new constant "ADJUSTED_TO_UTC_KEY". It may be helpful to add documentation for this near where DataType::Time32/DataType::Time64 are defined.

…ading

masonh22 · 2025-07-30T15:02:59Z

parquet/src/arrow/schema/complex.rs

+                ret.metadata_mut()
+                    .insert(ADJUSTED_TO_UTC_KEY.to_string(), String::new());


Note that we're not adding this if an arrow hint was found. We probably want to add it in both cases, right?

masonh22 · 2025-07-30T15:03:51Z

parquet/src/arrow/mod.rs

+/// [`Field::metadata`]: arrow_schema::Field::metadata
+/// [`DataType::Time32`]: arrow_schema::DataType::Time32
+/// [`DataType::Time64`]: arrow_schema::DataType::Time64


Did I do these links right? How can I verify this? I've never done this before...

You can run cargo doc to test them locally -- the CI checks will do so as well, so if the CI is green, this will be good

alamb

Thanks @masonh22

The code makes sense to me, but I am a little unclear about your usecase. If you need to find out which columns from a parquet file had the adjusted utc flag, perhaps you could just look at the paquet schema for exmaple

https://docs.rs/parquet/latest/parquet/schema/types/struct.SchemaDescriptor.html

And then trace down to the logical type and https://docs.rs/parquet/latest/parquet/basic/enum.LogicalType.html#variant.Time.field.is_adjusted_to_u_t_c 🤔

alamb · 2025-08-01T17:49:31Z

parquet/src/arrow/mod.rs

+/// [`Field::metadata`]: arrow_schema::Field::metadata
+/// [`DataType::Time32`]: arrow_schema::DataType::Time32
+/// [`DataType::Time64`]: arrow_schema::DataType::Time64


You can run cargo doc to test them locally -- the CI checks will do so as well, so if the CI is green, this will be good

masonh22 · 2025-08-04T11:47:29Z

@alamb good point! I will try that out.

masonh22 · 2025-08-05T21:57:22Z

@alamb I tried this out and it is definitely possible, but it adds some complexity that I'd want to avoid. Specifically, when iterating over both arrow Fields and parquet TypePtrs, you need to deal with the fact that list types can be represented multiple ways in parquet (see this section on the backward-compatibility rules for lists if you're not familiar). This would be easier if something like parquet::schema::complex::Visitor was public and more generic.

Still, I think the simplest solution for what I'm trying to do is putting the "adjusted_to_utc" key in the arrow field metadata. This goes along with how arrow-rs handles parquet field IDs, which is another piece of parquet-specific metadata that could be handled by the methods you describe.

alamb · 2025-08-06T14:56:25Z

Still, I think the simplest solution for what I'm trying to do is putting the "adjusted_to_utc" key in the arrow field metadata. This goes along with how arrow-rs handles parquet field IDs, which is another piece of parquet-specific metadata that could be handled by the methods you describe.

I think I am concerned that we will be adding additional overhead to all readers of schemas even when the feature is not widely used

that being said, one more metadata key is not likely a huge deal 🤔

masonh22 · 2025-08-06T15:32:11Z

I think I am concerned that we will be adding additional overhead to all readers of schemas even when the feature is not widely used

I agree, I wish there was a better way to do this. I'm guessing that adding a "zoned" boolean field to Time32 and Time64 is out of the question?

Anyways, aside from the check for whether a field is a zoned Time type, this only adds overhead if the parquet file is using zoned timestamps. For "pure" arrow-rs users who aren't reading files from other writers, this will never come up.

I'm fine with sitting on this for a bit if you or others want to take more time to consider this. This isn't a huge priority for me.

Add ADJUSTED_TO_UTC_KEY constant and add it to field metadata when re…

9a3b054

…ading

github-actions bot added the parquet Changes to the parquet crate label Jul 30, 2025

masonh22 commented Jul 30, 2025

View reviewed changes

alamb reviewed Aug 1, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add ADJUSTED_TO_UTC_KEY constant and add it to field metadata when reading TIME logical types #8028

Add ADJUSTED_TO_UTC_KEY constant and add it to field metadata when reading TIME logical types #8028

Uh oh!

masonh22 commented Jul 30, 2025

Uh oh!

masonh22 Jul 30, 2025

Uh oh!

masonh22 Jul 30, 2025

Uh oh!

alamb Aug 1, 2025

Uh oh!

alamb left a comment

Uh oh!

alamb Aug 1, 2025

Uh oh!

masonh22 commented Aug 4, 2025

Uh oh!

masonh22 commented Aug 5, 2025

Uh oh!

alamb commented Aug 6, 2025

Uh oh!

masonh22 commented Aug 6, 2025

Uh oh!

Uh oh!

		ret.metadata_mut()
		.insert(ADJUSTED_TO_UTC_KEY.to_string(), String::new());

Add ADJUSTED_TO_UTC_KEY constant and add it to field metadata when reading TIME logical types #8028

Are you sure you want to change the base?

Add ADJUSTED_TO_UTC_KEY constant and add it to field metadata when reading TIME logical types #8028

Uh oh!

Conversation

masonh22 commented Jul 30, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

masonh22 Jul 30, 2025

Choose a reason for hiding this comment

Uh oh!

masonh22 Jul 30, 2025

Choose a reason for hiding this comment

Uh oh!

alamb Aug 1, 2025

Choose a reason for hiding this comment

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

alamb Aug 1, 2025

Choose a reason for hiding this comment

Uh oh!

masonh22 commented Aug 4, 2025

Uh oh!

masonh22 commented Aug 5, 2025

Uh oh!

alamb commented Aug 6, 2025

Uh oh!

masonh22 commented Aug 6, 2025

Uh oh!

Uh oh!