Skip to content

Generate ProtocolToThrift files through make and commit #25162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 5, 2025

Conversation

vhsu14
Copy link
Contributor

@vhsu14 vhsu14 commented May 20, 2025

Description

In #25079, ProtocolToThrift files were manually written. This PR generates these files using pythons scripts and chevron templates.

Motivation and Context

Impact

Test Plan

Unit test + Verifier (test_id 225961)

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==
General Changes
* Update ProtocolToThrift files to be generated for cpp thrift serde.

@vhsu14 vhsu14 requested a review from a team as a code owner May 20, 2025 21:53
@facebook-github-bot
Copy link
Collaborator

@vhsu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@vhsu14 vhsu14 force-pushed the generate-protocol-to-thrift-files branch from d505a2f to dd4a16c Compare May 21, 2025 00:12
@tdcmeehan
Copy link
Contributor

If we're using Thrift, why can't we just use either the Drift IDL generator, or have both Java and C++ auto-generate the Thrift structures from a checked in IDL?

@vhsu14
Copy link
Contributor Author

vhsu14 commented May 21, 2025

If we're using Thrift, why can't we just use either the Drift IDL generator, or have both Java and C++ auto-generate the Thrift structures from a checked in IDL?

We are using Drift to annotate classes in Java and generate an IDL for C++ to auto-gen the Thrift structures. Using a checked-in IDL has some cons in Java:

  1. While we are in the process of switching, we need to maintain two sets of classes and write toThrift/ fromThrift to convert the objects. This requires huge amounts of effort, can be prone to mistakes, and causes GC.
  2. Thrift structures don't support having class methods, so for every class that has methods, we will need to have a utils class for it.
  3. Everyone needs to be aware of the checked-in IDL. Whenever we want to modify classes in the Thrift flow, we need to make corresponding changes to the IDL.

Thanks for the comment. Let me know if you have further questions.

@tdcmeehan
Copy link
Contributor

Can we discuss the approach for code generation and overall timeline in this RFC? The protocol infrastructure has its own overhead and technical debt, and it's not really clear to me how this is going to work.

For example, some questions I have:

  • Does the existing readme in the Thrift directory apply to the current changes?
  • What is the timeline for maintaining both JSON and Thrift, or do we plan to maintain both?
  • Is the current Thrift file manually generated? Is this transitional, or going forward would it be a part of the developer workflow to regenerate the Thrift IDL using the Drift tool?

We use RFCs to get alignment on design and also to document decisions made. These are all important decisions that needs to be memorialized into the above RFC.

@shangm2
Copy link
Contributor

shangm2 commented May 21, 2025

Can we discuss the approach for code generation and overall timeline in this RFC? The protocol infrastructure has its own overhead and technical debt, and it's not really clear to me how this is going to work.

For example, some questions I have:

  • Does the existing readme in the Thrift directory apply to the current changes?
  • What is the timeline for maintaining both JSON and Thrift, or do we plan to maintain both?
  • Is the current Thrift file manually generated? Is this transitional, or going forward would it be a part of the developer workflow to regenerate the Thrift IDL using the Drift tool?

We use RFCs to get alignment on design and also to document decisions made. These are all important decisions that needs to be memorialized into the above RFC.

Hey @tdcmeehan . Let me know if the following answers your questions. And yes, we will incorporate these questions and answers into the RFC as well.

  • Does the existing readme in the Thrift directory apply to the current changes?
    Answer: if you are referring to the pipeline of converting thrift to json with the help of python script and yaml file, then yes, we still need this pipeline to generate some code, e.g. those in presto-native-execution/presto_cpp/main/thrift/ProtocolToThrift.cpp so that we dont need to write them manually.

  • What is the timeline for maintaining both JSON and Thrift, or do we plan to maintain both?
    Answer: I don't think we have a timeline to deprecate json at least for now. But we can discuss if/how we can do this once the thrift migration is in a stable state.

  • Is the current Thrift file manually generated? Is this transitional, or going forward would it be a part of the developer workflow to regenerate the Thrift IDL using the Drift tool?
    Answer: the idl file is auto generated using drift library. It is achieved when the presto-thrift-spec gets built which happens everything when you build presto. Here are two related prs:

# See the License for the specific language governing permissions and
# limitations under the License.

StructMap:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StructMap maps fields that have different names for protocol and thrift classes

fields:
infoUnion: { field_name: info }

SkipStruct:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't generate toThrift/fromThrift classes for the following structs because they are related to OperatorInfo, which is an empty struct in C++

StageExecutionId:
StageId:

WrapperStruct:
Copy link
Contributor Author

@vhsu14 vhsu14 May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are single field structs in IDL that are directly declared in C++. For example,

In IDL:
struct ConnectorId {
1: string catalogName;
}

In C++:
using ConnectorId = std::string;

RuntimeStats:
SqlFunctionId:

Special:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are structs that require custom conversions, which are defined under the /special folder

# See the License for the specific language governing permissions and
# limitations under the License.

all: ProtocolToThrift.h ProtocolToThrift.cpp
Copy link
Contributor Author

@vhsu14 vhsu14 May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Makefile defines the flow for generating ProtocolToThrift files as defined in README.md. For now, it depends on presto_thrift.thrift, which is a copy of Drift generated IDL. @shangm2 is working on having it auto-generated as part of presto-thrift-spec and we will reference to the generated IDL once that is done.

@vhsu14 vhsu14 force-pushed the generate-protocol-to-thrift-files branch 2 times, most recently from 7ec0dac to fd74671 Compare May 21, 2025 20:58
@facebook-github-bot
Copy link
Collaborator

@vhsu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

shangm2
shangm2 previously approved these changes May 22, 2025
Copy link
Contributor

@shangm2 shangm2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

./presto_protocol-to-thrift-json.py presto_thrift.json ../../presto_protocol/core/presto_protocol_core.json | jq . > presto_protocol-to-thrift-json.json

presto_thrift.json: presto_thrift.thrift ./thrift2json.py
./thrift2json.py presto_thrift.thrift | jq . > presto_thrift.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .thrift file is at presto-thrift-spec/target/thrift/presto-thrift-protocol.thrift



if __name__ == "__main__":
sys.exit(main())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add new line at end of file.

NikhilCollooru
NikhilCollooru previously approved these changes May 27, 2025
@vhsu14 vhsu14 dismissed stale reviews from NikhilCollooru and shangm2 via 7e1daf7 May 31, 2025 00:14
@vhsu14 vhsu14 force-pushed the generate-protocol-to-thrift-files branch 2 times, most recently from 7e1daf7 to e68e3b1 Compare June 3, 2025 18:47
@facebook-github-bot
Copy link
Collaborator

@vhsu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@vhsu14 vhsu14 force-pushed the generate-protocol-to-thrift-files branch from e68e3b1 to 8e72f74 Compare June 4, 2025 02:29
NikhilCollooru
NikhilCollooru previously approved these changes Jun 4, 2025
gggrace14
gggrace14 previously approved these changes Jun 4, 2025
@vhsu14 vhsu14 dismissed stale reviews from gggrace14 and NikhilCollooru via f72d9a5 June 5, 2025 06:15
@vhsu14 vhsu14 force-pushed the generate-protocol-to-thrift-files branch from 8e72f74 to f72d9a5 Compare June 5, 2025 06:15
@vhsu14 vhsu14 force-pushed the generate-protocol-to-thrift-files branch from f72d9a5 to 405a1c6 Compare June 5, 2025 15:44
@gggrace14 gggrace14 self-requested a review June 5, 2025 20:55
@facebook-github-bot
Copy link
Collaborator

@vhsu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@vhsu14 vhsu14 merged commit 369570e into prestodb:master Jun 5, 2025
104 of 107 checks passed
soumiiow added a commit to soumiiow/presto that referenced this pull request Jun 16, 2025
soumiiow added a commit to soumiiow/presto that referenced this pull request Jun 17, 2025
soumiiow added a commit to soumiiow/presto that referenced this pull request Jun 17, 2025
@prestodb-ci prestodb-ci mentioned this pull request Jul 28, 2025
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants