Skip to content

Conversation

etseidl
Copy link
Contributor

@etseidl etseidl commented Aug 28, 2025

Which issue does this PR close?

Note: this targets a feature branch, not main

Rationale for this change

Begins adding custom thrift write support.

What changes are included in this PR?

Adds traits to aid in writing of thrift and modifies thrift macros to generate writing code.

Are these changes tested?

Yes, adds some roundtrip tests to validate encoded data can be decoded to the same state.

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet Changes to the parquet crate label Aug 28, 2025
@alamb
Copy link
Contributor

alamb commented Sep 9, 2025

I am hoping to review this later today or tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked through this and I think it looks really nice 😍

One thing that might be a useful exercise is to simulate someone making a new change to the thrift parquet spec and seeing how easy/hard it is to add support for that new feature with the custom thrift parser.

THe real test will be to see if someone other than @etseidl can accomplish it :)

}
}

// FIXME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove. This is one of those situations where I have to hand-do the serialization code. I could instead use a private intermediate struct and then translate...it's just an enum after all.

another example is LogicalType. The current implementation uses struct variants, but the thrift macros use tuple variants. I could remove a lot of code if we just change that, but there would be ripples.

}
}

/////////////////////////
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is pretty nice that this is so straightforward

let column_name = col_meta.column_descr().name().to_string();
let page_locations = offset_index[rg_index][col_idx]
.page_locations().to_vec();
let page_locations = offset_index[rg_index][col_idx].page_locations();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

/// macro to use when decoding struct fields
#[doc(hidden)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be great to get some more doc strings on these macros

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhorstmann added some to his macros that I will steal and expand upon. Right now I've got six more branches to go that all depend on the preceding one, so I don't want to make too many changes here...I'll instead start a list of all the places I need more documentation and make those changes once the full end-to-end is done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to me

}
}

pub(crate) trait WriteThrift {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more docstrings here explaining what this was for and what it did would be super helpful I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there are many more changes to this particular API so I'll take a stab at documentation here.

@etseidl
Copy link
Contributor Author

etseidl commented Sep 10, 2025

Thanks for the review @alamb! I'll start on the documentation for the write API, and then leave some TODOs based on your comments. Then on to the next PR 😄

@alamb
Copy link
Contributor

alamb commented Sep 10, 2025

It is pretty exciting to see this coming together

@etseidl etseidl merged commit 1404608 into apache:gh5854_thrift_remodel Sep 10, 2025
16 checks passed
@alamb
Copy link
Contributor

alamb commented Sep 11, 2025

🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants