Commit 1ebcc9d
fix: disable CUDA graphs with LoRA and warn
drbh committed Jul 8, 2024
1 parent 4dc908a commit 1ebcc9d
Showing 1 changed file with 9 additions and 0 deletions.
9 changes: 9 additions & 0 deletions server/text_generation_server/cli.py
@@ -91,6 +91,15 @@ def serve(
            f"LoRA adapters are enabled. This is an experimental feature and may not work as expected."
        )

        # TODO: enable LoRA with CUDA graphs. For now, disable CUDA graphs if LoRA
        # is enabled and warn the user
        if len(lora_adapter_ids) > 0 and os.getenv("CUDA_GRAPHS", None) is not None:
            logger.warning(
                "LoRA adapters are not supported with CUDA Graphs. Disabling CUDA Graphs."
            )
            global CUDA_GRAPHS
            CUDA_GRAPHS = None

# Downgrade enum into str for easier management later on
quantize = None if quantize is None else quantize.value
dtype = None if dtype is None else dtype.value
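The guard added in this commit can be sketched as a small standalone function. `resolve_cuda_graphs` below is a hypothetical helper (not part of the repository) that mirrors the logic: if any LoRA adapters are requested while the `CUDA_GRAPHS` environment variable is set, CUDA graphs are turned off and a warning is emitted.

```python
import os

def resolve_cuda_graphs(lora_adapter_ids, env=None):
    """Return the CUDA_GRAPHS setting to use, or None when LoRA adapters
    force CUDA graphs off (a sketch of the guard in cli.py)."""
    env = os.environ if env is None else env
    cuda_graphs = env.get("CUDA_GRAPHS")  # e.g. "1,2,4,8"
    if len(lora_adapter_ids) > 0 and cuda_graphs is not None:
        # LoRA adapters are not yet compatible with CUDA graphs, so fall
        # back to eager execution and tell the user why.
        print("LoRA adapters are not supported with CUDA Graphs. Disabling CUDA Graphs.")
        return None
    return cuda_graphs

# LoRA enabled and CUDA_GRAPHS set: graphs are disabled.
assert resolve_cuda_graphs(["adapter-a"], {"CUDA_GRAPHS": "1,2,4"}) is None
# No LoRA adapters: the environment setting passes through unchanged.
assert resolve_cuda_graphs([], {"CUDA_GRAPHS": "1,2,4"}) == "1,2,4"
```

Note that the committed code assigns to a module-level `CUDA_GRAPHS` global rather than unsetting the environment variable, so later code must read the global, not re-read the environment.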
