[Neo] Add JumpStart Integration to SM Neo Neuron AOT compilation flow #1854

Merged (1 commit) on May 1, 2024


@a-ys commented Apr 30, 2024

Description

Neo Updates

This PR adds further JumpStart integration to the Neo Neuron partitioning scripts. When a JumpStart model is passed in, the script outputs Neuron subgraphs to be consumed by JumpStart. Specifically, when the JumpStart metadata files __model_info__.json and __script_info__.json are found and the environment variable SM_CACHE_JUMPSTART_FORMAT is set, the Neo partitioning script writes the Neuron cache subgraphs for the current model to the directory PRE_COMPILED_NEURON_GRAPH_INFER in the partitioning output.
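The condition described above can be sketched roughly as follows. This is an illustration, not the code from this PR: the helper name `is_jumpstart_export_enabled` is hypothetical, and it assumes that the metadata files sit at the top level of the model directory and that any non-empty value of SM_CACHE_JUMPSTART_FORMAT counts as "set".

```python
import os
from pathlib import Path

# File and variable names are taken from the PR description.
JUMPSTART_METADATA_FILES = ("__model_info__.json", "__script_info__.json")
JUMPSTART_FORMAT_ENV = "SM_CACHE_JUMPSTART_FORMAT"


def is_jumpstart_export_enabled(model_dir, environ=os.environ):
    """Hypothetical helper: True when both metadata files exist and the env var is set.

    Assumptions: metadata files live directly under model_dir, and any
    non-empty value of the environment variable counts as "set".
    """
    has_metadata = all(
        (Path(model_dir) / name).is_file() for name in JUMPSTART_METADATA_FILES
    )
    env_is_set = bool(environ.get(JUMPSTART_FORMAT_ENV))
    return has_metadata and env_is_set
```

Passing `environ` explicitly keeps the check easy to exercise in tests without modifying the real process environment.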

The subgraphs are also saved to a secondary location in the Neuron cache, e.g.:

/<cache dir>/JUMPSTART_COMPILED_GRAPHS/neuronxcc-2.13.68.0+6dfecc895/<JumpStart model id>/inference/PRE_COMPILED_NEURON_GRAPH_INFER/neuronxcc-2.13.68.0+6dfecc895/<Module folders>

This secondary location will be used by the Neo service to better organize its Neuron cache.
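To make the layout of the secondary location concrete, here is a small sketch that assembles the path from its parts. The function name `jumpstart_cache_dir` and its parameters are hypothetical; the directory structure simply mirrors the example path above, with the compiler version appearing twice as it does there.

```python
from pathlib import PurePosixPath


def jumpstart_cache_dir(cache_dir, compiler_version, model_id):
    """Hypothetical helper: build the secondary cache location shown in the example.

    Module folders for the model would sit underneath the returned directory.
    """
    return (
        PurePosixPath(cache_dir)
        / "JUMPSTART_COMPILED_GRAPHS"
        / compiler_version
        / model_id
        / "inference"
        / "PRE_COMPILED_NEURON_GRAPH_INFER"
        / compiler_version
    )


# Usage with the compiler version from the example; the cache directory and
# model id below are placeholders, not real values.
path = jumpstart_cache_dir(
    "/cache", "neuronxcc-2.13.68.0+6dfecc895", "example-model-id"
)
```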

Changes to DJL-Serving Code

There is one change to shared djl-serving code outside of the Neuron scripts. The PartitionService now uses subprocess.Popen instead of subprocess.run() so that standard output can be captured and returned from PartitionService.run_partition(). This output identifies the exact Neuron subgraphs associated with the model being compiled, so that if extraneous subgraphs already exist in the Neuron cache directory, only the subgraphs associated with the current model are returned.
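The general pattern of capturing a child process's standard output with subprocess.Popen while still echoing it can be sketched as below. This is a generic illustration rather than the actual PartitionService change; the function name `run_and_capture` is hypothetical, and merging stderr into stdout is an assumption made for simplicity.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd, echo its output line by line, and return (exit_code, captured_text)."""
    captured = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: fold stderr into the same stream
        text=True,
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # keep the output visible in the parent's logs
            captured.append(line)
    # Leaving the with-block waits for the process, so returncode is populated.
    return proc.returncode, "".join(captured)
```

The caller can then parse the returned text for the subgraph names emitted by the partitioning run, instead of listing everything present in the cache directory.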

@a-ys requested review from zachgk, frankfliu and a team as code owners on April 30, 2024 23:53
@tosterberg merged commit 3f35e3a into deepjavalibrary:master on May 1, 2024
8 checks passed
tosterberg pushed a commit to tosterberg/djl-serving that referenced this pull request May 1, 2024
tosterberg added a commit that referenced this pull request May 1, 2024
Co-authored-by: Andrew Song <40076917+a-ys@users.noreply.github.com>