
Improve examples in fal-serverless #881

Merged · 1 commit · Jul 3, 2023
2 changes: 1 addition & 1 deletion docsite/docs/fal-serverless/authentication/_category_.yml
@@ -1,5 +1,5 @@
label: "Authentication"
-position: 2
+position: 3
collapsible: false
collapsed: false
link:
2 changes: 1 addition & 1 deletion docsite/docs/fal-serverless/examples/_category_.yml
@@ -1,5 +1,5 @@
label: "Examples"
-position: 8
+position: 2
collapsible: false
collapsed: false
link:
168 changes: 0 additions & 168 deletions docsite/docs/fal-serverless/examples/chat.md

This file was deleted.

docsite/docs/fal-serverless/examples/controlnet.md
@@ -1,7 +1,11 @@
-# Deploying a Custom ControlNet Model Using fal-serverless
+---
+sidebar_position: 2
+---

-fal-serverless is a serverless platform that enables you to run Python functions on cloud infrastructure. In this example, we will demonstrate how to use fal-serverless for deploying a custom ControlNet model.
+# Restyle Room Photos with ControlNet
+In this example, we will demonstrate how to use fal-serverless for deploying a ControlNet model.

+## 1. Create a new file called controlnet.py
```python
from __future__ import annotations
from fal_serverless import isolated, cached
@@ -10,7 +14,6 @@ from pathlib import Path
import base64
import io


requirements = [
"controlnet-aux",
"diffusers",
Expand All @@ -21,12 +24,6 @@ requirements = [
"xformers"
]

-def image_to_base64(image_path):
-    with open(image_path, "rb") as image_file:
-        encoded_string = base64.b64encode(image_file.read()).decode("utf-8")
-    return encoded_string

def read_image_bytes(file_path):
    with open(file_path, "rb") as file:
        image_bytes = file.read()
@@ -69,10 +66,6 @@ def resize_image(input_image, resolution):
    )
    return img

-def save_image_from_bytes(image_bytes, output_path):
-    with open(output_path, "wb") as file:
-        file.write(image_bytes)

@isolated(
    requirements=requirements,
    machine_type="GPU",
@@ -91,7 +84,6 @@ def generate(
    pipe = load_model()
    image = Image.open(io.BytesIO(image_bytes))


    canny = CannyDetector()
    init_image = image.convert("RGB")

@@ -123,3 +115,23 @@ def generate(
    list_of_bytes = [read_image_bytes(out_dir / f) for f in file_names]
    return list_of_bytes
```
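
Before deploying, you can smoke-test the function from a plain Python script. A minimal sketch — the keyword arguments mirror the JSON body used in step 3 below and are assumptions, since part of the `generate` signature is collapsed in the diff above:

```python
# test_controlnet.py — local smoke test; the keyword names are assumptions
# (part of the generate() signature is collapsed in the diff above)
from pathlib import Path

from controlnet import generate

# Calling an @isolated function runs it remotely on the configured GPU machine
results = generate(
    image_url="https://restore.tchabitat.org/hubfs/blog/2019%20Blog%20Images/July/Old%20Kitchen%20Cabinets%20-%20Featured%20Image.jpg",
    prompt="scandinavian kitchen",
    num_samples=1,
    num_steps=30,
)

# generate() returns a list of image bytes
Path("restyled-0.png").write_bytes(results[0])
```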

## 2. Deploy the model as an endpoint
To use this fal-serverless function as an API, you can serve it with the `fal-serverless` CLI command:

```bash
fal-serverless fn serve controlnet.py generate --alias controlnet --auth public
```

This will return a URL like:
```
Registered a new revision for function 'controlnet' (revision='c75db134-23f0-4863-94cd-3358d6c8d94c').
URL: https://user_id-controlnet.gateway.alpha.fal.ai
```

## 3. Test it out
```bash
curl https://user_id-controlnet.gateway.alpha.fal.ai/ -H 'content-type: application/json' -H 'accept: application/json, */*;q=0.5' -d '{"image_url":"https://restore.tchabitat.org/hubfs/blog/2019%20Blog%20Images/July/Old%20Kitchen%20Cabinets%20-%20Featured%20Image.jpg","prompt":"scandinavian kitchen","num_samples":1,"num_steps":30}'
```

This should return a JSON with the image encoded in base64.
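
To save the result locally, decode the base64 payload. A small sketch, assuming the response body is a JSON list of base64-encoded images (matching the list of image bytes that `generate` returns):

```python
# decode_response.py — pipe the curl output into this script, e.g.
#   curl ... | python decode_response.py
# The JSON shape (a flat list of base64 strings) is an assumption.
import base64
import json
import sys

payload = json.load(sys.stdin)
for i, encoded in enumerate(payload):
    with open(f"output-{i}.png", "wb") as f:
        f.write(base64.b64decode(encoded))
```
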
2 changes: 1 addition & 1 deletion docsite/docs/fal-serverless/examples/image-restoration.md
@@ -2,7 +2,7 @@
sidebar_position: 4
---

-# Image restoration with Transformers
+# Restore Old Images with Transformers

In this example, we will demonstrate how to use the [SwinIR](https://github.com/JingyunLiang/SwinIR) library and fal-serverless to restore images. SwinIR is an image restoration library built on the [Swin Transformer](https://arxiv.org/abs/2103.14030), a neural network architecture designed for processing images. It is similar to the popular Vision Transformer (ViT), but its hierarchical structure lets it process images more efficiently. SwinIR restores images with a pre-trained Swin Transformer.

107 changes: 107 additions & 0 deletions docsite/docs/fal-serverless/examples/llama.md
@@ -0,0 +1,107 @@
---
sidebar_position: 3
---

# Run LLMs with llama.cpp (OpenAI API Compatible Server)

In this example, we will demonstrate how to use fal-serverless to deploy any Llama-based language model and serve it through an OpenAI-API-compatible server with SSE.

## 1. Use the already deployed example

If you want to use an already deployed API, here is a public endpoint running on a T4:

https://110602490-llama-server.gateway.alpha.fal.ai/docs

To see this API in action:

```bash
curl -X POST -H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-H "Authorization: Access-Control-Allow-Origin: *" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "can you write a happy story"
      }
    ],
    "stream": true,
    "model": "gpt-3.5-turbo",
    "max_tokens": 2000
  }' \
  https://110602490-llama-server.gateway.alpha.fal.ai/v1/chat/completions
```

This should return a streaming response.
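
The same stream can be consumed from Python. A minimal sketch using `requests`, assuming the server follows the standard OpenAI SSE framing (`data: ...` lines ending with `data: [DONE]`):

```python
# stream_chat.py — prints the assistant's reply token by token
import json

import requests

url = "https://110602490-llama-server.gateway.alpha.fal.ai/v1/chat/completions"
body = {
    "messages": [{"role": "user", "content": "can you write a happy story"}],
    "stream": True,
    "model": "gpt-3.5-turbo",
    "max_tokens": 2000,
}

with requests.post(url, json=body, stream=True) as resp:
    for line in resp.iter_lines():
        # SSE frames look like: data: {...chunk...}
        if not line or not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":
            break
        delta = json.loads(chunk)["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)
print()
```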

## 2. Deploy your own version

In this example, we will use the conda backend so that we can install CUDA dependencies. First, create the files below:

**llama_cpp_env.yml**

```yaml
name: myenv
channels:
  - conda-forge
  - nvidia/label/cuda-12.0.1
dependencies:
  - cuda-toolkit
  - pip
  - pip:
      - llama-cpp-python[server]
      - cmake
      - setuptools
```

**llama_cpp.py**

```python
from fal_serverless import isolated, cached

MODEL_URL = "https://huggingface.co/TheBloke/Vicuna-7B-CoT-GGML/resolve/main/vicuna-7B-cot.ggmlv3.q4_0.bin"
MODEL_PATH = "/data/models/vicuna-7B-cot.ggmlv3.q4_0.bin"

@isolated(
    kind="conda",
    env_yml="llama_cpp_env.yml",
    machine_type="M",
)
def download_model():
    print("---> This is download_model()")
    import os

    if not os.path.exists("/data/models"):
        os.system("mkdir /data/models")
    if not os.path.exists(MODEL_PATH):
        print("Downloading the model.")
        os.system(f"cd /data/models && wget {MODEL_URL}")

@isolated(
    kind="conda",
    env_yml="llama_cpp_env.yml",
    machine_type="GPU-T4",
    exposed_port=8080,
    keep_alive=30
)
def llama_server():
    import uvicorn
    from llama_cpp.server import app

    settings = app.Settings(model=MODEL_PATH, n_gpu_layers=96)

    server = app.create_app(settings=settings)
    uvicorn.run(server, host="0.0.0.0", port=8080)
```

This script has two main functions: one to download the model, and one to start the server.

We first need to download the model. You do this by calling `download_model()` from a Python context, as shown below.
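
A quick sketch, assuming the script above is saved as `llama_cpp.py` in your working directory:

```python
# Runs remotely on an M machine because download_model() is @isolated;
# the weights are cached under /data/models for later runs.
from llama_cpp import download_model

download_model()
```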

We then deploy this as a public endpoint:

```bash
fal-serverless function serve llama_cpp.py llama_server --alias llama-server --auth public
```

This should return a URL, which you can use just like the one above. The first deploy might take a little while.
2 changes: 1 addition & 1 deletion docsite/docs/fal-serverless/examples/sentiment-analysis.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 1
+sidebar_position: 6
---

# Sentiment Analysis with dbt