Skip to content

Conversation

@jovany-wang
Copy link
Contributor

What do these changes do?

  1. generate java files from flatbuffer automatically.
  2. add auto_gen_tool.py script to change files automatically.

After this PR, We should not change them manually.

Copy link
Contributor

@guoyuhong guoyuhong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I left some comments.

VERBATIM)

# Generate Java bindings for the flatbuffers objects.
set(RAY_HOME ${CMAKE_CURRENT_LIST_DIR}/../../..)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add RAY_HOME, you can also apply it to set(PYTHON_OUTPUT_DIR ${CMAKE_CURRENT_LIST_DIR}/../../../python/ray/core/generated/).

auto_gen_tool.py Outdated
@@ -0,0 +1,89 @@
#!/usr/bin/env python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not necessary?

auto_gen_tool.py Outdated
return -1


template = '''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

template looks too general here. Shall we use template_for_byte_buffer_getter?

auto_gen_tool.py Outdated
file_names = os.listdir(generated_root_path)
for file_name in file_names:
if not file_name.endswith('.java'):
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be continue?

auto_gen_tool.py Outdated
def add_new_line(file, line_num, text):
with open(file, "r") as file_handler:
lines = file_handler.readlines()
if (line_num < 0) or (line_num > len(lines) + 1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<= 0

auto_gen_tool.py Outdated

for index in range(len(lines) - 1, -1, -1):
if lines[index] == '}\n':
index_to_be_inserted = index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't index_to_be_inserted just be len(lines) - 1?

Copy link
Contributor Author

@jovany-wang jovany-wang Jan 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we can not make sure that there is no blank line in the end of file. Maybe it have 2 blank lines.

auto_gen_tool.py Outdated
index_to_be_inserted = index

if index_to_be_inserted >= 0:
offset = get_offset(file, field)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should raise an exception if get_offset fails to find the line number

auto_gen_tool.py Outdated
generate_and_insert_method('%s/TaskInfo.java' % root_path,
'returns', 'returnsAsByteBuffer')
generate_and_insert_method('%s/Arg.java' % root_path,
'objectIds', 'objectIdAsByteBuffer')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's define these fields as a list. E.g.,

files_and_fields = [
   ('TaskInfo', 'returns'),
   ('Arg', 'objectIds'),
]

add_custom_command(
TARGET gen_gcs_fbs
COMMAND ${FLATBUFFERS_COMPILER} -j -o ${JAVA_OUTPUT_DIR} ${GCS_FBS_SRC}
COMMAND python ${RAY_HOME}/auto_gen_tool.py ${RAY_HOME}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put the script under java/ dir, and name it modify_generated_java_flatbuffers_files.py

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10671/
Test PASSed.

@jovany-wang
Copy link
Contributor Author

jovany-wang commented Jan 8, 2019

Thanks for your comments, @guoyuhong @raulchen
I have addressed them. :)

@suquark
Copy link
Member

suquark commented Jan 8, 2019

I have added this PR to the project board since it removes magic flatbuffer files. We could probably mention it in the java docs for later developers. In the future we could ask the flatbuffer team to solve this problem.

@suquark
Copy link
Member

suquark commented Jan 8, 2019

And I am agree with @raulchen that we should rename auto_gen_tool.py to modify_generated_java_flatbuffers_files and put it under the java dir.

@jovany-wang
Copy link
Contributor Author

And I am agree with @raulchen that we should rename auto_gen_tool.py to modify_generated_java_flatbuffers_files and put it under the java dir.

Thanks for your mentioning again. I have addressed it. :)

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10674/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10675/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10678/
Test PASSed.

@robertnishihara
Copy link
Collaborator

@jovany-wang @raulchen given the discussion in google/flatbuffers#5092, isn't the simplest solution to define a separate table in order to implement a 2D array of bytes? E.g.,

table Bar {
   data: [ubyte];
}
table MyTable {
   bar: [Bar];
}

as @raulchen suggested. That would be fairly unintrusive, right? Then we don't need to modify the autogenerated files at all. That seems like a more sustainable approach.

Also, which flatbuffer schema is the one that actually requires the 2D array of bytes?

@jovany-wang
Copy link
Contributor Author

Also, which flatbuffer schema is the one that actually requires the 2D array of bytes?

Re:
L79 and L45 in gcs.fbs require 2D array.
If we don't want to modify the auto-generated files, we have 2 approach:

  1. as @raulchen suggested, we could define another table:
table ReturnObject {
data: [ubyte];
}

table TaskInfo {
returns: [ReturnObject];
}

But it's too odd.
2) we could store the hex string of objects in flatbuffer.

Btw, there is also another issue if we don't modify the auto-generated files:
The package declaration in Java is different from python and C++, and there is no approach to specify package(namespace) for Java like this:

flatc -j -o dir gcs.fbs --namepsace org.a.b.c

Copy link
Collaborator

@robertnishihara robertnishihara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is much better than than the current state of affairs, so I think we should do this, but longer term we really shouldn't modify the autogenerated files even if it makes the schema a little bit uglier.

E.g., another option is to have a single string which is just all of the object IDs concatenated together.

A heavier weight option is to switch to protobuf.

@@ -0,0 +1,99 @@
import os

This comment was marked as resolved.

def add_package_declarations(generated_root_path):
file_names = os.listdir(generated_root_path)
for file_name in file_names:
if not file_name.endswith('.java'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use " everywhere instead of '

return -1


template_for_byte_buffer_getter = '''
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

''' -> """



def modify_generated_java_flatbuffers_files(ray_home, tuples):
root_path = '%s/java/runtime/src/main/java/org/ray/runtime/generated' % ray_home
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use os.path.join instead

add_package_declarations(root_path)

for tuple in tuples:
file_name = '%s/%s.java' % (root_path, tuple[0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.path.join

('Arg', 'objectIds', 'objectIdsAsByteBuffer'),
]

modify_generated_java_flatbuffers_files(sys.argv[1], tuples)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document usage at the top of the file, what is sys.argv[1]? Explain the purpose of this script?

add_custom_command(
TARGET gen_gcs_fbs
COMMAND ${FLATBUFFERS_COMPILER} -j -o ${JAVA_OUTPUT_DIR} ${GCS_FBS_SRC}
COMMAND python ${RAY_HOME}/java/modify_generated_java_flatbuffers_files.py ${RAY_HOME}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command python may not exist on the user's machine, e.g., they may only have python3 or some command like that.

COMMENT "Running flatc compiler on ${GCS_FBS_SRC} for python"
VERBATIM)

# Generate Java bindings for the flatbuffers objects.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deserves a more in depth explanation of what is going on here. Please explain that it is modifying the autogenerated flatbuffer files and say why and link to google/flatbuffers#5092.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10693/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10701/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10698/
Test PASSed.

tuples = [
("TaskInfo", "returns", "returnsAsByteBuffer"),
("Arg", "objectIds", "objectIdsAsByteBuffer"),
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. the third element isn't needed.
  2. maybe rename tuples to class_and_field_pairs

if not file_name.endswith(".java"):
continue
full_name = generated_root_path + "/" + file_name
success = add_new_line(full_name, 2, "package org.ray.runtime.generated;")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define the package declaration string as a constant.

file_name = os.path.join(root_path, "%s.java" % tuple[0])
generate_and_insert_method(file_name,
tuple[1],
tuple[2])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be well formatted

# See https://github.com/google/flatbuffers/issues/5092 for more details.
#
# 2) The package declaration in Java is different from python and C++, and there is
# no approach to specify package(namepsace) for Java specially.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's better to put this comment in the modify_generated_java_flatbuffers_files.py file, because it explains the purpose of that script.

#
# 1) We have 2D array fields in Table TaskInfo and Arg in `gcs.fbs`, and we
# have no better approach to get the element from 2D array field in flatbuffer.
# See https://github.com/google/flatbuffers/issues/5092 for more details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In `gcs.fbs`, some fields are defined as `[string]`. And currently flatbuffers doesn't have an API to
get a single element in the list as raw bytes. See https://github.com/google/flatbuffers/issues/5092
for more details.

# See https://github.com/google/flatbuffers/issues/5092 for more details.
#
# 2) The package declaration in Java is different from python and C++, and there is
# no approach to specify package(namepsace) for Java specially.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no approach -> no option in the flatc command

return -1


template_for_byte_buffer_getter = """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upper-case the var name

"""


def generate_and_insert_method(file, field, method_name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

genrate_byte_buffer_getter

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10708/
Test PASSed.

@robertnishihara
Copy link
Collaborator

@jovany-wang @raulchen what do you think about replacing new_actor_handles: [string]; with new_actor_handles: string;, where the string is simply the concatenation of all of the relevant strings? Then we wouldn't need to modify the autogenerated files.

This doesn't address point 2, but it addresses point 1 and we can probably find a way to address point 2.

@raulchen
Copy link
Contributor

@jovany-wang @raulchen what do you think about replacing new_actor_handles: [string]; with new_actor_handles: string;, where the string is simply the concatenation of all of the relevant strings? Then we wouldn't need to modify the autogenerated files.

This doesn't address point 2, but it addresses point 1 and we can probably find a way to address point 2.

I think that' fine as well. (probably it's better to use [ubytes], if that doesn't require too many code changes.)

For point 2, I think we still need a patch for now. But this patch is more stable, because it only inserts a line at the top, and doesn't depend on the content of the generated files.

@jovany-wang
Copy link
Contributor Author

@robertnishihara @raulchen
Thanks for your suggestions, I will work for this issue.

@jovany-wang jovany-wang changed the title Support to auto-generate Java files from flatbuffer [WIP] Support to auto-generate Java files from flatbuffer Jan 10, 2019
@pcmoritz
Copy link
Contributor

I'm in favor of either defining

table ObjectID {
id: [ubyte];
}

(in the C++ codebase I think it could be done with relatively little changes by modifying from_flatbuf etc) or concatenating (concatenating would be ok since the problem only arises in few locations).

Having to postprocess flatbuffer files is awkward and the problem will probably come up again once we support more languages.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10749/
Test PASSed.

@jovany-wang
Copy link
Contributor Author

Since this PR is outdated too much, I will request another one later and this could be closed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants