[Neo] Add JumpStart Integration to SM Neo Neuron AOT compilation flow #1854
Description
Neo Updates
This PR adds further JumpStart integration to the Neo Neuron partitioning scripts. When a JumpStart model is passed in, the script outputs Neuron subgraphs to be consumed by JumpStart. Specifically, when the JumpStart metadata files `__model_info__.json` and `__script_info__.json` are found and the environment variable `SM_CACHE_JUMPSTART_FORMAT` is set, the Neo partitioning script writes the Neuron cache subgraphs for the current model to the `PRE_COMPILED_NEURON_GRAPH_INFER` directory in the partitioning output.

The subgraphs are also saved to a secondary location in the Neuron cache. The Neo service uses this secondary location to better organize its Neuron cache.
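
For reviewers, the gating condition described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `should_emit_jumpstart_cache` and its `model_dir` argument are hypothetical, and it assumes "set" means the variable is present in the environment, whatever its value.

```python
import os
from pathlib import Path

# File names and the environment variable come from the PR description;
# the function name and the model_dir argument are hypothetical.
JUMPSTART_METADATA_FILES = ("__model_info__.json", "__script_info__.json")
JUMPSTART_ENV_VAR = "SM_CACHE_JUMPSTART_FORMAT"


def should_emit_jumpstart_cache(model_dir, environ=None):
    """Return True when both metadata files exist and the env var is set."""
    environ = os.environ if environ is None else environ
    has_metadata = all(
        (Path(model_dir) / name).is_file() for name in JUMPSTART_METADATA_FILES
    )
    # Assumption: "set" means present in the environment, regardless of value.
    return has_metadata and JUMPSTART_ENV_VAR in environ
```

Passing `environ` explicitly keeps the check easy to unit test without mutating the process environment.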
Changes to DJL-Serving Code
There is one change to shared djl-serving code outside of the Neuron scripts. `PartitionService` now uses `subprocess.Popen` instead of `subprocess.run()` so that standard output can be captured and returned from `PartitionService.run_partition()`. This output identifies the exact Neuron subgraphs associated with the model being compiled, so that if extraneous subgraphs already exist in the Neuron cache directory, only the subgraphs for the current model are returned.
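
As a rough illustration of the capture pattern, the sketch below runs a child process with `subprocess.Popen`, echoes each line as it arrives, and returns the collected output. It is not the actual `PartitionService` code: the helper name `run_and_capture` is hypothetical, and merging stderr into stdout is an assumption made here for simplicity.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run command, echo its output live, and return (return_code, output)."""
    captured = []
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: merge stderr into the stream
        text=True,
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # keep logs visible while the child runs
            captured.append(line)
    # Leaving the with-block waits for the child, so returncode is populated.
    return proc.returncode, "".join(captured)
```

A caller could then parse the returned text for the subgraph names reported by the partitioning script and copy only those entries from the cache directory.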