-
Notifications
You must be signed in to change notification settings - Fork 7.1k
[WIP] Support to auto-generate Java files from flatbuffer #3716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
guoyuhong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I left some comments.
src/ray/gcs/CMakeLists.txt
Outdated
| VERBATIM) | ||
|
|
||
| # Generate Java bindings for the flatbuffers objects. | ||
| set(RAY_HOME ${CMAKE_CURRENT_LIST_DIR}/../../..) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add RAY_HOME, you can also apply it to set(PYTHON_OUTPUT_DIR ${CMAKE_CURRENT_LIST_DIR}/../../../python/ray/core/generated/).
auto_gen_tool.py
Outdated
| @@ -0,0 +1,89 @@ | |||
| #!/usr/bin/env python | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not necessary?
auto_gen_tool.py
Outdated
| return -1 | ||
|
|
||
|
|
||
| template = ''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
template looks too general here. Shall we use template_for_byte_buffer_getter?
auto_gen_tool.py
Outdated
| file_names = os.listdir(generated_root_path) | ||
| for file_name in file_names: | ||
| if not file_name.endswith('.java'): | ||
| break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be continue?
auto_gen_tool.py
Outdated
| def add_new_line(file, line_num, text): | ||
| with open(file, "r") as file_handler: | ||
| lines = file_handler.readlines() | ||
| if (line_num < 0) or (line_num > len(lines) + 1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<= 0
auto_gen_tool.py
Outdated
|
|
||
| for index in range(len(lines) - 1, -1, -1): | ||
| if lines[index] == '}\n': | ||
| index_to_be_inserted = index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't index_to_be_inserted just be len(lines) - 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we can not make sure that there is no blank line in the end of file. Maybe it have 2 blank lines.
auto_gen_tool.py
Outdated
| index_to_be_inserted = index | ||
|
|
||
| if index_to_be_inserted >= 0: | ||
| offset = get_offset(file, field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should raise an exception if get_offset fails to find the line number
auto_gen_tool.py
Outdated
| generate_and_insert_method('%s/TaskInfo.java' % root_path, | ||
| 'returns', 'returnsAsByteBuffer') | ||
| generate_and_insert_method('%s/Arg.java' % root_path, | ||
| 'objectIds', 'objectIdAsByteBuffer') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's define these fields as a list. E.g.,
files_and_fields = [
('TaskInfo', 'returns'),
('Arg', 'objectIds'),
]
src/ray/gcs/CMakeLists.txt
Outdated
| add_custom_command( | ||
| TARGET gen_gcs_fbs | ||
| COMMAND ${FLATBUFFERS_COMPILER} -j -o ${JAVA_OUTPUT_DIR} ${GCS_FBS_SRC} | ||
| COMMAND python ${RAY_HOME}/auto_gen_tool.py ${RAY_HOME} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put the script under java/ dir, and name it modify_generated_java_flatbuffers_files.py
|
Test PASSed. |
|
Thanks for your comments, @guoyuhong @raulchen |
|
I have added this PR to the project board since it removes magic flatbuffer files. We could probably mention it in the java docs for later developers. In the future we could ask the flatbuffer team to solve this problem. |
|
And I am agree with @raulchen that we should rename |
Thanks for your mentioning again. I have addressed it. :) |
|
Test PASSed. |
|
Test PASSed. |
|
Test PASSed. |
|
@jovany-wang @raulchen given the discussion in google/flatbuffers#5092, isn't the simplest solution to define a separate table in order to implement a 2D array of bytes? E.g., as @raulchen suggested. That would be fairly unintrusive, right? Then we don't need to modify the autogenerated files at all. That seems like a more sustainable approach. Also, which flatbuffer schema is the one that actually requires the 2D array of bytes? |
Re:
But it's too odd. Btw, there is also another issue if we don't modify the auto-generated files: |
robertnishihara
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this is much better than than the current state of affairs, so I think we should do this, but longer term we really shouldn't modify the autogenerated files even if it makes the schema a little bit uglier.
E.g., another option is to have a single string which is just all of the object IDs concatenated together.
A heavier weight option is to switch to protobuf.
| @@ -0,0 +1,99 @@ | |||
| import os | |||
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
| def add_package_declarations(generated_root_path): | ||
| file_names = os.listdir(generated_root_path) | ||
| for file_name in file_names: | ||
| if not file_name.endswith('.java'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use " everywhere instead of '
| return -1 | ||
|
|
||
|
|
||
| template_for_byte_buffer_getter = ''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
''' -> """
|
|
||
|
|
||
| def modify_generated_java_flatbuffers_files(ray_home, tuples): | ||
| root_path = '%s/java/runtime/src/main/java/org/ray/runtime/generated' % ray_home |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use os.path.join instead
| add_package_declarations(root_path) | ||
|
|
||
| for tuple in tuples: | ||
| file_name = '%s/%s.java' % (root_path, tuple[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.path.join
| ('Arg', 'objectIds', 'objectIdsAsByteBuffer'), | ||
| ] | ||
|
|
||
| modify_generated_java_flatbuffers_files(sys.argv[1], tuples) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document usage at the top of the file, what is sys.argv[1]? Explain the purpose of this script?
src/ray/gcs/CMakeLists.txt
Outdated
| add_custom_command( | ||
| TARGET gen_gcs_fbs | ||
| COMMAND ${FLATBUFFERS_COMPILER} -j -o ${JAVA_OUTPUT_DIR} ${GCS_FBS_SRC} | ||
| COMMAND python ${RAY_HOME}/java/modify_generated_java_flatbuffers_files.py ${RAY_HOME} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The command python may not exist on the user's machine, e.g., they may only have python3 or some command like that.
| COMMENT "Running flatc compiler on ${GCS_FBS_SRC} for python" | ||
| VERBATIM) | ||
|
|
||
| # Generate Java bindings for the flatbuffers objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This deserves a more in depth explanation of what is going on here. Please explain that it is modifying the autogenerated flatbuffer files and say why and link to google/flatbuffers#5092.
|
Test FAILed. |
|
Test PASSed. |
|
Test PASSed. |
| tuples = [ | ||
| ("TaskInfo", "returns", "returnsAsByteBuffer"), | ||
| ("Arg", "objectIds", "objectIdsAsByteBuffer"), | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- the third element isn't needed.
- maybe rename
tuplestoclass_and_field_pairs
| if not file_name.endswith(".java"): | ||
| continue | ||
| full_name = generated_root_path + "/" + file_name | ||
| success = add_new_line(full_name, 2, "package org.ray.runtime.generated;") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define the package declaration string as a constant.
| file_name = os.path.join(root_path, "%s.java" % tuple[0]) | ||
| generate_and_insert_method(file_name, | ||
| tuple[1], | ||
| tuple[2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be well formatted
src/ray/gcs/CMakeLists.txt
Outdated
| # See https://github.com/google/flatbuffers/issues/5092 for more details. | ||
| # | ||
| # 2) The package declaration in Java is different from python and C++, and there is | ||
| # no approach to specify package(namepsace) for Java specially. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's better to put this comment in the modify_generated_java_flatbuffers_files.py file, because it explains the purpose of that script.
src/ray/gcs/CMakeLists.txt
Outdated
| # | ||
| # 1) We have 2D array fields in Table TaskInfo and Arg in `gcs.fbs`, and we | ||
| # have no better approach to get the element from 2D array field in flatbuffer. | ||
| # See https://github.com/google/flatbuffers/issues/5092 for more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In `gcs.fbs`, some fields are defined as `[string]`. And currently flatbuffers doesn't have an API to
get a single element in the list as raw bytes. See https://github.com/google/flatbuffers/issues/5092
for more details.
src/ray/gcs/CMakeLists.txt
Outdated
| # See https://github.com/google/flatbuffers/issues/5092 for more details. | ||
| # | ||
| # 2) The package declaration in Java is different from python and C++, and there is | ||
| # no approach to specify package(namepsace) for Java specially. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no approach -> no option in the flatc command
| return -1 | ||
|
|
||
|
|
||
| template_for_byte_buffer_getter = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upper-case the var name
| """ | ||
|
|
||
|
|
||
| def generate_and_insert_method(file, field, method_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
genrate_byte_buffer_getter
|
Test PASSed. |
|
@jovany-wang @raulchen what do you think about replacing This doesn't address point 2, but it addresses point 1 and we can probably find a way to address point 2. |
I think that' fine as well. (probably it's better to use For point 2, I think we still need a patch for now. But this patch is more stable, because it only inserts a line at the top, and doesn't depend on the content of the generated files. |
|
@robertnishihara @raulchen |
|
I'm in favor of either defining (in the C++ codebase I think it could be done with relatively little changes by modifying from_flatbuf etc) or concatenating (concatenating would be ok since the problem only arises in few locations). Having to postprocess flatbuffer files is awkward and the problem will probably come up again once we support more languages. |
|
Test PASSed. |
|
Since this PR is outdated too much, I will request another one later and this could be closed now. |
What do these changes do?
auto_gen_tool.pyscript to change files automatically.After this PR, We should not change them manually.