Description
System Info
Privately hosted instance of TGI
Version: 2.2.0
Deployed as a standalone KServe predictor
Model: Mixtral-8x7b-instruct, also llama3-1-70b-instruct (the two models do not fail on the same prompts, but the error types are the same; the errors below were produced with Mixtral).
GPU: A100
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction

Request 1:
{"model": "mixtral-8x7b-instruct-completion", "messages": [{"role": "user", "content": "Could you track down the latitude and longitude for this IP address I'm concerned about? It's 172.16.254.1. I've been monitoring the network and this one's been popping up with some strange activity. Note that the provided function is in Python. provide a JSON"}], "temperature": 0.1, "max_tokens": 250, "stream": false, "echo": false, "tools": [{"type": "function", "function": {"name": "get_coordinate_by_ip_address", "description": "Finds the latitude and longitude of an IP address.", "parameters": {"type": "object", "properties": {"ip_address": {"type": "string", "description": "The IP address to find the location of."}}, "required": ["ip_address"]}}}], "tool_choice": "auto"}
b'{"error":"Tool error: invalid escape at line 3 column 15","error_type":"tool_error"}'
Request 2:

{'model': 'mixtral-8x7b-instruct', 'messages': [{'role': 'user', 'content': "To better understand the volatility and risk associated with this particular stock, I need to calculate the standard deviation of its daily closing prices over the past 10 trading days. Here are the figures I've gathered: 1000, 2000, 3000, 4000, 5000, 7000, 9000, 15000, 20000, and 30000. Can you provide me with the standard deviation for these closing prices?\n Note that the provided function is in Python. provide a JSON"}], 'tools': [{'type': 'function', 'function': {'name': 'calculate_standard_deviation', 'description': 'Calculates the standard deviation of a list of numbers.', 'parameters': {'type': 'dict', 'properties': {'numbers': {'type': 'array', 'items': {'type': 'float'}, 'description': 'The list of numbers.'}}, 'required': ['numbers']}}}], 'tool_choice': 'auto', 'temperature': 0.7, 'top_p': 0.99, 'max_tokens': 1200}

Response:
b'{"error":"Request failed during generation: Server error: CANCELLED","error_type":"generation"}'
Stack trace:
ERROR text_generation_launcher: Method Prefill encountered an error.
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/interegular/patterns.py", line 486, in parse
return super(_ParsePattern, self).parse()
File "/opt/conda/lib/python3.10/site-packages/interegular/utils/simple_parser.py", line 63, in parse
raise NoMatch(self.data, max(self._expected), self._expected[max(self._expected)])
interegular.utils.simple_parser.NoMatch: Can not match at index 858. Got '))?[\\', expected any of ['*', '+', '?', '{', '*', '+', '?', '{', '(', '[', '\\', '.', '$', '^', "<Any 1 except ('.', '?', '\\\\', '(', ')', '|', '*', '[', '^', '$', '+')>", '|'].
Context(data[-10:+10]): '*"[\\n ]*\\}))?[\\n ]*\\'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 106, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
return await self.intercept(
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
return await response
File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
raise error
File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 145, in Prefill
batch = self.model.batch_type.from_pb(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 442, in from_pb
return cls.from_tokenized(pb, tokenizer, batch_tokenized_inputs, dtype, device)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 322, in from_tokenized
next_token_chooser = HeterogeneousNextTokenChooser.from_pb(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/tokens.py", line 486, in from_pb
return HeterogeneousNextTokenChooser(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/tokens.py", line 284, in __init__
HeterogeneousGrammarLogitProcessor(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/logits_process.py", line 570, in __init__
fsm = GrammarLogitProcessor._cached_compile_fsm(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/logits_process.py", line 527, in _cached_compile_fsm
fsm = RegexFSM(schema, tokenizer)
File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/fsm.py", line 121, in __init__
self.states_to_token_maps, self.empty_token_ids = create_states_mapping(
File "/opt/conda/lib/python3.10/site-packages/outlines/caching.py", line 74, in wrapper
result = cached_function(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/fsm.py", line 102, in create_states_mapping
regex_pattern = interegular.parse_pattern(regex_string)
File "/opt/conda/lib/python3.10/site-packages/interegular/patterns.py", line 730, in parse_pattern
out = p.parse()
File "/opt/conda/lib/python3.10/site-packages/interegular/utils/simple_parser.py", line 38, in w
return m(self, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/interegular/patterns.py", line 488, in parse
raise InvalidSyntax
interegular.patterns.InvalidSyntax
This is the same stack trace as in #2240. The failure was fairly consistent to reproduce, though the stack trace does not always appear in our server logs.
Expected behavior
A valid response. For example:

Request:
{"model": "mixtral-8x7b-instruct-completion", "messages": [{"role": "user", "content": "I'm working on a report about a basketball player's average performance throughout the season. The data I have includes the points they scored in each game: 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160. To complete my analysis, I need to calculate the mean score per game. Can you help me with that? Please return your answer as a JSON"}], "temperature": 0.1, "max_tokens": 250, "stream": false, "echo": false, "tools": [{"type": "function", "function": {"name": "calculate_mean", "description": "Calculates the mean of a list of numbers.", "parameters": {"type": "object", "properties": {"numbers": {"type": "array", "items": {"type": "number"}, "description": "The list of numbers."}}, "required": ["numbers"]}}}], "tool_choice": "auto"}
b'{"object":"chat.completion","id":"","created":1721938095,"model":"/mnt/models","system_fingerprint":"2.2.0-sha-db7e043","choices":[{"index":0,"message":{"role":"assistant","tool_calls":[{"id":"0","type":"function","function":{"description":null,"name":"mean","arguments":{"numbers":[15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160]}}}]},"logprobs":null,"finish_reason":"eos_token"}],"usage":{"prompt_tokens":202,"completion_tokens":157,"total_tokens":359}}'
It is unclear to us why the other requests are failing. If possible, we would love to see the model's output regardless, as that would give downstream users a better experience (it would also be nice for the server not to crash every so often when this occurs). We did some experimentation and found that:
- Changing the temperature or swapping the model changed which payloads produced errors, sometimes fewer and sometimes more. Even though the stack trace says "Method Prefill encountered an error.", this suggests the problem may lie with the generated text rather than the prompt.
- We dug through the TGI codebase and found the call into outlines.fsm (a sketch of that call chain follows this list), but we were unable to reproduce the error locally at all, so we are still not sure what exactly is causing it.
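Roughly, the chain in the stack trace reduces to: JSON schema -> regex (outlines) -> interegular.parse_pattern. A sketch for anyone else trying to reproduce locally, with assumptions flagged: we assume the outlines version bundled with TGI 2.2 (where the helper is named build_regex_from_schema; older releases call it build_regex_from_object), and TGI actually folds the whole tools list into one combined schema first, so this is only an approximation of the exact regex the server compiles:

```python
import json

import interegular
from outlines.fsm.json_schema import build_regex_from_schema

# Parameters schema from the failing standard-deviation request; note that
# "dict" and "float" are not standard JSON Schema types.
schema = json.dumps(
    {
        "type": "dict",
        "properties": {
            "numbers": {
                "type": "array",
                "items": {"type": "float"},
                "description": "The list of numbers.",
            }
        },
        "required": ["numbers"],
    }
)

regex_string = build_regex_from_schema(schema)  # may raise earlier on the non-standard types
interegular.parse_pattern(regex_string)  # the call that raises InvalidSyntax in the trace
```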
Thanks again for the quick help with the last issue; we really appreciate it! It definitely solved several of our problems, including the 'tool_choice="auto"' one.