This repository was archived by the owner on Aug 7, 2025. It is now read-only.

Adding mps support to base handler and regression test #3048

Merged · 20 commits · Apr 9, 2024
129 changes: 129 additions & 0 deletions docs/apple_silicon_support.md
@@ -0,0 +1,129 @@
# Apple Silicon Support

## What is supported
* TorchServe CI jobs now include M1 hardware to ensure support; see the GitHub [documentation](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories) on GitHub-hosted M1 runners.
- [Regression Tests](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu.yml)
- [Regression binaries Test](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu_binaries.yml)
* For [Docker](https://docs.docker.com/desktop/install/mac-install/), ensure Docker Desktop for Apple silicon is installed, then follow the [setup steps](https://github.com/pytorch/serve/tree/master/docker).

## Experimental Support

* For GPU jobs on Apple Silicon, [MPS](https://pytorch.org/docs/master/notes/mps.html) is now auto-detected and enabled. To prevent TorchServe from using MPS, set `deviceType: "cpu"` in the model's model-config.yaml (see the sketch after this list).
* This is an experimental feature and NOT ALL models are guaranteed to work.
* `Number of GPUs` now reports the number of GPU cores on Apple Silicon
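
As a minimal sketch of what the auto-detection amounts to on the Python side (illustrative only, not the exact base-handler code; assumes PyTorch 2.x built with MPS support):

```python
import torch

# MPS is used when the backend is available; setting deviceType: "cpu"
# in model-config.yaml forces the worker back onto the CPU instead.
if torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Selected device: {device}")
```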

### Testing
* [Pytests](https://github.com/pytorch/serve/tree/master/test/pytest/test_device_config.py) that check for MPS support on macOS M1 devices
* Models that have been tested and work: Resnet-18, Densenet161, Alexnet (a quick smoke test is sketched below)
* Models that have been tested and DO NOT work: MNIST
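
A quick way to reproduce the working cases outside of TorchServe is a standalone smoke test along these lines (assumes an M1 machine with `torch` and `torchvision` installed; the random input and `weights=None` are illustrative):

```python
import torch
import torchvision

# One forward pass of ResNet-18 on the MPS backend.
model = torchvision.models.resnet18(weights=None).to("mps").eval()
x = torch.randn(1, 3, 224, 224, device="mps")
with torch.no_grad():
    out = model(x)
print(out.shape)  # expected: torch.Size([1, 1000])
```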


#### Example Resnet-18 Using MPS On Mac M1 Pro
```
serve % torchserve --start --model-store model_store_gen --models resnet-18=resnet-18.mar --ncs

Torchserve version: 0.10.0
Number of GPUs: 16
Number of CPUs: 10
Max heap size: 8192 M
Python executable: /Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store:
Initial Models: resnet-18=resnet-18.mar
Log dir:
Metrics dir:
Netty threads: 0
Netty client threads: 0
Default workers per model: 16
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store:
CPP log config: N/A
Model config: N/A
2024-04-08T14:18:02,380 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2024-04-08T14:18:02,391 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: resnet-18.mar
2024-04-08T14:18:02,699 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model resnet-18
2024-04-08T14:18:02,699 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model resnet-18 loaded.
2024-04-08T14:18:02,699 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: resnet-18, count: 16
...
...
serve % curl http://127.0.0.1:8080/predictions/resnet-18 -T ./examples/image_classifier/kitten.jpg
...
{
"tabby": 0.40966302156448364,
"tiger_cat": 0.3467046618461609,
"Egyptian_cat": 0.1300288736820221,
"lynx": 0.02391958422958851,
"bucket": 0.011532187461853027
}
...
```
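
The same inference request can be issued from Python; this is a rough equivalent of the `curl -T` call above (`-T` performs an HTTP PUT with the file as the request body; the path and model name match the example):

```python
import requests

# Mirrors: curl http://127.0.0.1:8080/predictions/resnet-18 -T ./examples/image_classifier/kitten.jpg
with open("./examples/image_classifier/kitten.jpg", "rb") as f:
    response = requests.put("http://127.0.0.1:8080/predictions/resnet-18", data=f)

print(response.json())
```
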
#### Conda Example

```
(myenv) serve % pip list | grep torch
torch 2.2.1
torchaudio 2.2.1
torchdata 0.7.1
torchtext 0.17.1
torchvision 0.17.1
(myenv3) serve % conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver
(myenv3) serve % pip list | grep torch
torch 2.2.1
torch-model-archiver 0.10.0b20240312
torch-workflow-archiver 0.2.12b20240312
torchaudio 2.2.1
torchdata 0.7.1
torchserve 0.10.0b20240312
torchtext 0.17.1
torchvision 0.17.1
(myenv3) serve % torchserve --start --ncs --models densenet161.mar --model-store ./model_store_gen/
Torchserve version: 0.10.0
Number of GPUs: 0
Number of CPUs: 10
Max heap size: 8192 M
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Initial Models: densenet161.mar
Netty threads: 0
Netty client threads: 0
Default workers per model: 10
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
CPP log config: N/A
Model config: N/A
System metrics command: default
...
2024-03-12T15:58:54,702 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model densenet161 loaded.
2024-03-12T15:58:54,702 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: densenet161, count: 10
Model server started.
...
(myenv3) serve % curl http://127.0.0.1:8080/predictions/densenet161 -T examples/image_classifier/kitten.jpg
{
"tabby": 0.46661922335624695,
"tiger_cat": 0.46449029445648193,
"Egyptian_cat": 0.0661405548453331,
"lynx": 0.001292439759708941,
"plastic_bag": 0.00022909720428287983
}
@@ -5,9 +5,11 @@
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.util.SelfSignedCertificate;
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.lang.reflect.Field;
import java.lang.reflect.Type;
import java.net.InetAddress;
@@ -835,6 +837,28 @@ private static int getAvailableGpu() {
for (String id : ids) {
gpuIds.add(Integer.parseInt(id));
}
} else if (System.getProperty("os.name").startsWith("Mac")) {
Process process = Runtime.getRuntime().exec("system_profiler SPDisplaysDataType");
int ret = process.waitFor();
if (ret != 0) {
return 0;
}

BufferedReader reader =
new BufferedReader(new InputStreamReader(process.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
if (line.contains("Chipset Model:") && !line.contains("Apple M1")) {
return 0;
}
if (line.contains("Total Number of Cores:")) {
String[] parts = line.split(":");
if (parts.length >= 2) {
return (Integer.parseInt(parts[1].trim()));
}
}
}
throw new AssertionError("Unexpected response.");
} else {
Process process =
Runtime.getRuntime().exec("nvidia-smi --query-gpu=index --format=csv");
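
For readers who want to try the detection logic outside the frontend, a rough Python mirror of the new macOS branch in `getAvailableGpu()` could look like this (illustrative only; the merged implementation is the Java shown above):

```python
import subprocess

def available_gpu_cores_macos() -> int:
    """Approximate the Mac branch of getAvailableGpu(): parse system_profiler output."""
    result = subprocess.run(
        ["system_profiler", "SPDisplaysDataType"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return 0
    for line in result.stdout.splitlines():
        # Non-Apple-M1 chipsets report 0 GPUs, mirroring the Java check.
        if "Chipset Model:" in line and "Apple M1" not in line:
            return 0
        if "Total Number of Cores:" in line:
            return int(line.split(":", 1)[1].strip())
    raise RuntimeError("Unexpected system_profiler output")
```
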
@@ -105,4 +105,18 @@ public void testNoWorkflowState() throws ReflectiveOperationException, IOException
workingDir + "/frontend/archive/src/test/resources/models",
configManager.getWorkflowStore());
}

@Test
public void testNumGpuM1() throws ReflectiveOperationException, IOException {
System.setProperty("tsConfigFile", "src/test/resources/config_test_env.properties");
ConfigManager.Arguments args = new ConfigManager.Arguments();
args.setModels(new String[] {"noop_v0.1"});
args.setSnapshotDisabled(true);
ConfigManager.init(args);
ConfigManager configManager = ConfigManager.getInstance();
String arch = System.getProperty("os.arch");
if (arch.equals("aarch64")) {
Assert.assertTrue(configManager.getNumberOfGpu() > 0);
}
}
}
181 changes: 181 additions & 0 deletions test/pytest/test_device_config.py
@@ -0,0 +1,181 @@
import shutil
from pathlib import Path
from unittest.mock import patch
import tempfile

import pytest
import test_utils
import requests
import os
import platform
from model_archiver import ModelArchiverConfig




CURR_FILE_PATH = Path(__file__).parent
REPO_ROOT_DIR = CURR_FILE_PATH.parent.parent
ROOT_DIR = os.path.join(tempfile.gettempdir(), "workspace")
REPO_ROOT = os.path.join(os.path.dirname(os.path.abspath(__file__)), "../../")
data_file_zero = os.path.join(REPO_ROOT, "test/pytest/test_data/0.png")
config_file = os.path.join(REPO_ROOT, "test/resources/config_token.properties")
mnist_scriptes_py = os.path.join(REPO_ROOT,"examples/image_classifier/mnist/mnist.py")

HANDLER_PY = """
from ts.torch_handler.base_handler import BaseHandler

class deviceHandler(BaseHandler):

def initialize(self, context):
super().initialize(context)
assert self.get_device().type == "mps"
"""

MODEL_CONFIG_YAML = """
#frontend settings
# TorchServe frontend parameters
minWorkers: 1
batchSize: 4
maxWorkers: 4
"""

MODEL_CONFIG_YAML_GPU = """
#frontend settings
# TorchServe frontend parameters
minWorkers: 1
batchSize: 4
maxWorkers: 4
deviceType: "gpu"
"""

MODEL_CONFIG_YAML_CPU = """
#frontend settings
# TorchServe frontend parameters
minWorkers: 1
batchSize: 4
maxWorkers: 4
deviceType: "cpu"
"""


@pytest.fixture(scope="module")
def model_name():
yield "mnist"

@pytest.fixture(scope="module")
def work_dir(tmp_path_factory, model_name):
return Path(tmp_path_factory.mktemp(model_name))

@pytest.fixture(scope="module")
def model_config_name(request):
def get_config(param):
if param == "cpu":
return MODEL_CONFIG_YAML_CPU
elif param == "gpu":
return MODEL_CONFIG_YAML_GPU
else:
return MODEL_CONFIG_YAML

return get_config(request.param)

@pytest.fixture(scope="module", name="mar_file_path")
def create_mar_file(work_dir, model_archiver, model_name, model_config_name):


mar_file_path = work_dir.joinpath(model_name + ".mar")

model_config_yaml_file = work_dir / "model_config.yaml"
model_config_yaml_file.write_text(model_config_name)

model_py_file = work_dir / "model.py"

model_py_file.write_text(mnist_scriptes_py)

handler_py_file = work_dir / "handler.py"
handler_py_file.write_text(HANDLER_PY)

config = ModelArchiverConfig(
model_name=model_name,
version="1.0",
serialized_file=None,
model_file=mnist_scriptes_py, #model_py_file.as_posix(),
handler=handler_py_file.as_posix(),
extra_files=None,
export_path=work_dir,
requirements_file=None,
runtime="python",
force=False,
archive_format="default",
config_file=model_config_yaml_file.as_posix(),
)

with patch("archiver.ArgParser.export_model_args_parser", return_value=config):
model_archiver.generate_model_archive()

assert mar_file_path.exists()

yield mar_file_path.as_posix()

# Clean up files

mar_file_path.unlink(missing_ok=True)

# Clean up files

@pytest.fixture(scope="module", name="model_name")
def register_model(mar_file_path, model_store, torchserve):
"""
Register the model in torchserve
"""
shutil.copy(mar_file_path, model_store)

file_name = Path(mar_file_path).name

model_name = Path(file_name).stem

params = (
("model_name", model_name),
("url", file_name),
("initial_workers", "1"),
("synchronous", "true"),
("batch_size", "1"),
)

test_utils.reg_resp = test_utils.register_model_with_params(params)

yield model_name

test_utils.unregister_model(model_name)


@pytest.mark.skipif(platform.machine() != "arm64", reason="Skip on Mac M1")
@pytest.mark.parametrize("model_config_name", ["gpu"], indirect=True)
def test_m1_device(model_name, model_config_name):

response = requests.get(f"http://localhost:8081/models/{model_name}")

print("-----TEST-----")
print(response.content)
assert response.status_code == 200, "Describe Failed"


@pytest.mark.skipif(platform.machine() != "arm64", reason="Skip on Mac M1")
@pytest.mark.parametrize("model_config_name", ["cpu"], indirect=True)
def test_m1_device_cpu(model_name, model_config_name):

response = requests.get(f"http://localhost:8081/models/{model_name}")

print("-----TEST-----")
print(response.content)
assert response.status_code == 404, "Describe Worked"


@pytest.mark.skipif(platform.machine() != "arm64", reason="Skip on Mac M1")
@pytest.mark.parametrize("model_config_name", ["default"], indirect=True)
def test_m1_device_default(model_name, model_config_name):

response = requests.get(f"http://localhost:8081/models/{model_name}")

print("-----TEST-----")
print(response.content)
assert response.status_code == 200, "Describe Failed"