[Docs] Update example for Llama3 #2169

Merged · 1 commit · Apr 19, 2024
18 changes: 8 additions & 10 deletions docs/deploy/cli.rst
@@ -54,13 +54,13 @@ To run a model with MLC LLM in any platform, you can either:
**Option 1: Use model prebuilts**

To run ``mlc_llm``, you can specify the Huggingface MLC prebuilt model repo path with the prefix ``HF://``.
-For example, to run the MLC Llama 2 7B Q4F16_1 model (`Repo link <https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC>`_),
-simply use ``HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC``. The model weights and library will be downloaded
+For example, to run the MLC Llama 3 8B Q4F16_1 model (`Repo link <https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC>`_),
+simply use ``HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC``. The model weights and library will be downloaded
automatically from Huggingface.

.. code:: shell

-mlc_llm chat HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC --device "cuda:0" --overrides context_window_size=1024
+mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC --device "cuda:0" --overrides context_window_size=1024

.. code:: shell

@@ -74,13 +74,11 @@
Note: Separate stop words in the `stop` option with commas (,).
Multi-line input: Use escape+enter to start a new line.

-[INST]: What's the meaning of life
-[/INST]:
-Ah, a question that has puzzled philosophers and theologians for centuries! The meaning
-of life is a deeply personal and subjective topic, and there are many different
-perspectives on what it might be. However, here are some possible answers that have been
-proposed by various thinkers and cultures:
-...
+user: What's the meaning of life
+assistant:
+What a profound and intriguing question! While there's no one definitive answer, I'd be happy to help you explore some perspectives on the meaning of life.
+
+The concept of the meaning of life has been debated and...


**Option 2: Use locally compiled model weights and libraries**
10 changes: 5 additions & 5 deletions docs/get_started/introduction.rst
@@ -37,7 +37,7 @@ You can run MLC chat through a one-liner command:

.. code:: bash

-mlc_llm chat HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC
+mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC

It may take 1-2 minutes the first time you run this command.
After the download completes, the command launches a chat interface where you can enter your prompt and chat with the model.
@@ -91,7 +91,7 @@ You can save the code below into a Python file and run it.
from mlc_llm import LLMEngine

# Create engine
-model = "HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC"
+model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = LLMEngine(model)

# Run chat completion in OpenAI API.
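The rest of this snippet is collapsed in the diff view above. Purely as a reference, a minimal sketch of how the OpenAI-style chat completion call against this engine might continue is shown below; the exact method names (``engine.chat.completions.create``, ``engine.terminate``) and the streaming response shape are assumptions about the ``mlc_llm`` API of this era, not part of this PR.

.. code:: python

   # Hypothetical continuation (not shown in this diff): stream a chat
   # completion from the engine via its OpenAI-compatible interface.
   for response in engine.chat.completions.create(
       messages=[{"role": "user", "content": "What is the meaning of life?"}],
       model=model,
       stream=True,
   ):
       for choice in response.choices:
           # Guard against empty deltas at the end of the stream.
           if choice.delta.content:
               print(choice.delta.content, end="", flush=True)
   print()

   # Release engine resources when done (assumed API).
   engine.terminate()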
@@ -142,7 +142,7 @@ for OpenAI chat completion requests. The server can be launched in command line

.. code:: bash

-mlc_llm serve HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC
+mlc_llm serve HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC

The server is hooked at ``http://127.0.0.1:8000`` by default, and you can use ``--host`` and ``--port``
to set a different host and port.
@@ -154,7 +154,7 @@ we can open a new shell and send a cURL request via the following command:
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"model": "HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC",
"model": "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
"messages": [
{"role": "user", "content": "Hello! Our project is MLC LLM. What is the name of our project?"}
]
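The closing lines of this cURL command (the final brace and the endpoint URL) are collapsed in the diff above. For illustration only, a Python equivalent of the complete request could look roughly like the sketch below; the ``requests`` dependency, the ``/v1/chat/completions`` endpoint path, and the OpenAI-style response shape are assumptions here, not part of this PR.

.. code:: python

   # Illustrative only: send the same chat completion request to the
   # locally running MLC LLM server on the default host and port.
   import requests

   payload = {
       "model": "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
       "messages": [
           {
               "role": "user",
               "content": "Hello! Our project is MLC LLM. What is the name of our project?",
           }
       ],
   }
   # Assumes the server exposes an OpenAI-compatible chat completions endpoint.
   response = requests.post(
       "http://127.0.0.1:8000/v1/chat/completions", json=payload, timeout=120
   )
   print(response.json()["choices"][0]["message"]["content"])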
@@ -280,7 +280,7 @@ environments (e.g. SteamDeck).

.. code:: bash

-mlc_llm chat HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC --device vulkan
+mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC --device vulkan

The same core LLM runtime engine powers all the backends, enabling the same model to be deployed across backends as
long as they fit within the memory and computing budget of the corresponding hardware backend.
8 changes: 4 additions & 4 deletions docs/get_started/quick_start.rst
@@ -23,7 +23,7 @@ It is recommended to have at least 6GB free VRAM to run it.
from mlc_llm import LLMEngine

# Create engine
-model = "HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC"
+model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = LLMEngine(model)

# Run chat completion in OpenAI API.
@@ -57,7 +57,7 @@ It is recommended to have at least 6GB free VRAM to run it.

.. code:: shell

-mlc_llm serve HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC
+mlc_llm serve HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC

**Send requests to server.** When the server is ready (showing ``INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)``),
open a new shell and send a request via the following command:
@@ -67,7 +67,7 @@ It is recommended to have at least 6GB free VRAM to run it.
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"model": "HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC",
"model": "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
"messages": [
{"role": "user", "content": "Hello! Our project is MLC LLM. What is the name of our project?"}
]
@@ -94,7 +94,7 @@ It is recommended to have at least 6GB free VRAM to run it.

.. code:: bash

-mlc_llm chat HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC
+mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC


If you are using Windows/Linux/SteamDeck and would like to use Vulkan,
2 changes: 1 addition & 1 deletion docs/prebuilt_models.rst
@@ -68,7 +68,7 @@ For more, please see :ref:`the CLI page <deploy-cli>`, and the :ref:`the Python

.. code:: shell

-mlc_llm chat HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC
+mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC


To run the model with Python API, see :ref:`the Python page <deploy-python-chat-module>` (all other downloading steps are the same as CLI).
2 changes: 1 addition & 1 deletion examples/python/sample_mlc_engine.py
@@ -1,7 +1,7 @@
from mlc_llm import LLMEngine

# Create engine
-model = "HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC"
+model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = LLMEngine(model)

# Run chat completion in OpenAI API.