metascroy · metascroy · commit 13c480214766 · 2025-03-20T11:34:56.000-07:00
diff --git a/examples/models/llama/README.md b/examples/models/llama/README.md
@@ -382,7 +382,7 @@ Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-de
 
 ## Running with low-bit kernels
 
-We now give instructions for quantizating and running your model with low-bit kernels.  These are still experimental, and require you do development on an Arm-based Mac.  Also note that low-bit quantization often requires QAT (quantization-aware training) to give good quality results.
+We now give instructions for quantizating and running your model with low-bit kernels.  These are still experimental, and require you do development on an Arm-based Mac.  Also note that low-bit quantization often requires QAT (quantization-aware training) to give good quality results.  Currently dynamic shapes must be disabled when exporting a model with these kernels.
 
 First export your model for lowbit quantization (step 2 above):
 
diff --git a/install_requirements.py b/install_requirements.py
@@ -119,7 +119,7 @@ def install_requirements(use_pytorch_nightly):
     # Install packages directly from local copy instead of pypi.
     # This is usually not recommended.
     new_env = os.environ.copy()
-    new_env["USE_CPP"] = "1"
+    new_env["USE_CPP"] = "1"  # install torchao kernels
     subprocess.run(
         [
             sys.executable,
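
Note on the `install_requirements.py` hunk above: the flag is set on a copy of the environment, so only the child process sees `USE_CPP=1` and the parent's `os.environ` is not mutated. A minimal sketch of that pattern follows. The helper name `run_with_env_override` is hypothetical, and the child command is a stand-in that only echoes the variable; it is not the install command from the commit, which is truncated in the diff.

```python
import os
import subprocess
import sys


def run_with_env_override(overrides, argv):
    """Run argv in a child process whose environment is a copy of ours plus overrides.

    Hypothetical helper for illustration; not part of install_requirements.py.
    """
    child_env = os.environ.copy()  # copy, so the parent's environment is left alone
    child_env.update(overrides)
    return subprocess.run(
        argv, env=child_env, check=True, capture_output=True, text=True
    )


# Stand-in child command: print the value of USE_CPP as the child sees it.
result = run_with_env_override(
    {"USE_CPP": "1"},
    [sys.executable, "-c", "import os; print(os.environ['USE_CPP'])"],
)
child_value = result.stdout.strip()
```

Passing `env=` to `subprocess.run` replaces the child's entire environment, which is why the sketch starts from `os.environ.copy()` rather than a fresh dict.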
