
Infer lmi engine #623

Merged: 12 commits, Apr 14, 2023

Conversation

siddvenk (Contributor)

Description

This PR adds support for inferring the specific Python engine to use. This logic should only be called when users don't specify an engine in serving.properties and we can reasonably assume the model is intended to be run with a Python backend.

Use Cases covered:

  • User only specifies HF_MODEL_ID environment variable (no serving properties, model artifacts, or user code)
  • User provides serving properties with a model id (either an HF Hub id or an S3 url), but does not specify an engine
  • model_id is not provided in any form, but model artifacts are present in model_dir

This does not support the use case where users provide their own code that is expected to be invoked via the Hugging Face inference toolkit. There's no special error handling for that now; it will just fail when the PyProcess tries to load the handler and invoke it.

The logic for inferring the backend is largely copied over from the logic in the PySDK.
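As a rough sketch of the decision flow described above (the method, property, and return names here are illustrative assumptions, not the PR's actual code):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Illustrative sketch of the engine-inference flow described above;
// names and structure are assumptions, not the PR's actual code.
final class EngineInferenceSketch {

    /** Returns an inferred engine name, or null if nothing should be inferred. */
    static String inferEngine(Properties prop, Path modelDir, String hfModelIdEnv) {
        if (prop.getProperty("engine") != null) {
            return null; // the user chose an engine explicitly; nothing to infer
        }
        String modelId = prop.getProperty("option.model_id");
        if (modelId == null) {
            modelId = hfModelIdEnv; // covers the HF_MODEL_ID-only use case
        }
        // A model id (hub id or s3 url) or local model artifacts imply a
        // Python backend; the real logic then picks a specific engine.
        if (modelId != null || Files.isRegularFile(modelDir.resolve("config.json"))) {
            return "Python";
        }
        return null;
    }
}
```

The real implementation inspects the model config to choose between engines; this sketch only shows the gating described in the description.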

@siddvenk siddvenk requested review from zachgk, frankfliu and a team as code owners April 13, 2023 00:54
return "DeepSpeed";
}

if (!isTensorParallelSupported(numAttentionHeads, tensorParallelDegree)) {
Contributor:
Maybe DS or FT have some mechanism on their end to decide how to do model sharding. I would suggest not checking this.

Contributor Author (siddvenk):
At least for DS, they will throw an exception if this check fails. Really, the only practical example of this we have seen is gpt2-xl.

In the future it's possible that DS and FT change that behavior and can actually accommodate such a model. At that point this method would become incorrect.

I can remove this, since it's going to be validated by the engine anyway. But the benefit of doing it this way is that we don't recommend, say, gpt2-xl to run with DeepSpeed with TP when we know it won't work.
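For context, the check being debated is presumably a divisibility test along these lines (a sketch under that assumption; only the method name comes from the diff):

```java
// Sketch of the tensor-parallel compatibility check discussed above: the
// attention heads must shard evenly across the tensor-parallel degree.
// The method name matches the diff; the body is an assumption.
final class TensorParallelCheck {
    static boolean isTensorParallelSupported(int numAttentionHeads, int tensorParallelDegree) {
        return tensorParallelDegree > 0 && numAttentionHeads % tensorParallelDegree == 0;
    }
}
```

gpt2-xl has 25 attention heads, so any even tensor-parallel degree fails such a check, which matches the example cited in the thread.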

|| Files.isRegularFile(modelDir.resolve(prefix + ".py"))
|| Utils.getEnvOrSystemProperty("HF_MODEL_ID") != null
|| Files.isRegularFile(modelDir.resolve("config.json"))
|| prop.containsKey("option.s3url")
Contributor:
Other engines can support model_id and s3url as well.
If a user defines option.model_id, we can assume they can add engine as well.

Contributor Author (siddvenk):
Got it; I removed those checks here.

BufferedReader reader =
new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
return JsonUtils.GSON.fromJson(reader, JsonElement.class).getAsJsonObject();
} catch (IOException e) {
Contributor:
We should catch JsonSyntaxException as well.

Contributor Author (siddvenk):
Added.
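Gson's JsonSyntaxException is unchecked, so it escapes a catch that names only IOException. The multi-catch shape of the fix might look like this (a sketch using a plain Gson instance in place of DJL's JsonUtils, with a null fallback standing in for the real error handling):

```java
import com.google.gson.Gson;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonSyntaxException;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class ConfigReaderSketch {
    private static final Gson GSON = new Gson();

    /** Parses a config.json stream, returning null on I/O or parse failure. */
    static JsonObject readConfig(InputStream is) {
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
            return GSON.fromJson(reader, JsonElement.class).getAsJsonObject();
        } catch (IOException | JsonSyntaxException e) {
            return null; // sketch: the real code would log and fall back
        }
    }
}
```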

try (InputStream is = modelConfigUri.toURL().openStream();
BufferedReader reader =
new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
return JsonUtils.GSON.fromJson(reader, JsonElement.class).getAsJsonObject();
Contributor:
Better to create a class to hold the config (we only need to define the fields we care about).

Contributor Author (siddvenk):
Done.

Comment on lines 256 to 267
Path deepspeedLocation = Paths.get("/usr/local/bin/deepspeed");
boolean deepspeedExisted = true;
if (!Files.exists(deepspeedLocation)) {
Files.createDirectories(deepspeedLocation);
deepspeedExisted = false;
}
Path fastertransformerLocation = Paths.get("/usr/local/backends/fastertransformer");
boolean fastertransformerExisted = true;
if (!Files.exists(fastertransformerLocation)) {
Files.createDirectories(fastertransformerLocation);
fastertransformerExisted = false;
}
Contributor Author (siddvenk):
I added this for the unit tests, but I'm not a big fan of it. Could we test this with some integration tests instead?
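One alternative to creating fixed `/usr/local` paths on the host during unit tests is to make the backend locations injectable and point them at a temp directory. A sketch of that idea (class and method names are illustrative, not from the PR):

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: inject the backend root so tests can use a temp directory
// instead of touching fixed host paths. Names are illustrative.
final class BackendLocations {
    private final Path deepspeed;
    private final Path fastertransformer;

    BackendLocations(Path root) {
        this.deepspeed = root.resolve("bin/deepspeed");
        this.fastertransformer = root.resolve("backends/fastertransformer");
    }

    boolean hasDeepSpeed() {
        return Files.exists(deepspeed);
    }

    boolean hasFasterTransformer() {
        return Files.exists(fastertransformer);
    }
}
```

Production code would construct this with `Paths.get("/usr/local")`, while tests pass `Files.createTempDirectory(...)`, avoiding the existed/not-existed bookkeeping in the snippet above.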

// This represents the config of Hugging Face NLP models as well as the
// config of diffusers models. The config differs between the two, but for
// now we can leverage a single class since we don't need much information from the config.
static class HuggingFaceModelConfig {
Contributor:
Suggested change:
-    static class HuggingFaceModelConfig {
+    static final class HuggingFaceModelConfig {

throw e;
Gson gson =
JsonUtils.builder()
.setFieldNamingPolicy(FieldNamingPolicy.LOWER_CASE_WITH_UNDERSCORES)
Contributor:
Why do we need this, since we already use @SerializedName?

Contributor Author (siddvenk):
Left over from my testing; this should be removed. Good catch.
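The reviewer's point is that @SerializedName already maps each snake_case JSON key onto its field, so a global FieldNamingPolicy is redundant. A minimal illustration (the field names here are assumptions, not the PR's actual config class):

```java
import com.google.gson.annotations.SerializedName;

// Minimal illustration: @SerializedName maps the snake_case JSON key to the
// camelCase field, so no FieldNamingPolicy on the Gson builder is needed.
// Field names are assumptions, not the PR's actual config class.
final class HuggingFaceModelConfigSketch {
    @SerializedName("model_type")
    String modelType;

    @SerializedName("n_head")
    int numAttentionHeads;
}
```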

} else if (modelId != null) {
prop.put("option.modelId", modelId);
configUri = URI.create("https://huggingface.co/" + modelId + "/raw/main/config.json");
HttpURLConnection configUrl = (HttpURLConnection) configUri.toURL().openConnection();
Contributor:
We can consider using the OPTIONS method instead of GET. I think it should work.

Contributor Author (siddvenk):
I'll explore that and add it as a follow-up if I get it to work.
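A lightweight probe could look like the following. The URL shape comes from the diff above; the choice of HEAD (rather than the OPTIONS method the reviewer suggested) and the method names are assumptions:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class ConfigProbeSketch {

    /** Builds the Hub config URL used in the diff above. */
    static URI buildConfigUri(String modelId) {
        return URI.create("https://huggingface.co/" + modelId + "/raw/main/config.json");
    }

    /** Checks existence without downloading the body. Sketch: uses HEAD;
     *  OPTIONS (as suggested) would need server support for that method. */
    static boolean configExists(String modelId) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) buildConfigUri(modelId).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        try {
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } finally {
            conn.disconnect();
        }
    }
}
```

Either way, the real code still needs the GET to read the config body when it exists; the probe only avoids the download on the negative path.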

@siddvenk siddvenk merged commit d042a46 into deepjavalibrary:master Apr 14, 2023
@siddvenk siddvenk deleted the infer-lmi-engine branch June 13, 2023 19:01