Tools still giving EoF errors on generated JSON #2310

Closed · 1 of 4 tasks
ArjunBhalla98 opened this issue Jul 25, 2024 · 5 comments

ArjunBhalla98 commented Jul 25, 2024

System Info
Privately hosted instance of TGI
Version: 2.2.0

Deployed as a standalone kserve predictor
Model: Mixtral-8x7b-instruct, also llama3-1-70b-instruct (the same prompts do not fail on both, but the error types are the same; the errors below are from Mixtral).
GPU: A100

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

{"model": "mixtral-8x7b-instruct-completion", "messages": [{"role": "user", "content": "Could you track down the latitude and longitude for this IP address I'm concerned about? It's 172.16.254.1. I've been monitoring the network and this one's been popping up with some strange activity. Note that the provided function is in Python. provide a JSON"}], "temperature": 0.1, "max_tokens": 250, "stream": false, "echo": false, "tools": [{"type": "function", "function": {"name": "get_coordinate_by_ip_address", "description": "Finds the latitude and longitude of an IP address.", "parameters": {"type": "object", "properties": {"ip_address": {"type": "string", "description": "The IP address to find the location of."}}, "required": ["ip_address"]}}}], "tool_choice": "auto"}

b'{"error":"Tool error: invalid escape at line 3 column 15","error_type":"tool_error"}'
{'model': 'mixtral-8x7b-instruct', 'messages': [{'role': 'user', 'content': "To better understand the volatility and risk associated with this particular stock, I need to calculate the standard deviation of its daily closing prices over the past 10 trading days. Here are the figures I've gathered: 1000, 2000, 3000, 4000, 5000, 7000, 9000, 15000, 20000, and 30000. Can you provide me with the standard deviation for these closing prices?\n Note that the provided function is in Python. provide a JSON"}], 'tools': [{'type': 'function', 'function': {'name': 'calculate_standard_deviation', 'description': 'Calculates the standard deviation of a list of numbers.', 'parameters': {'type': 'dict', 'properties': {'numbers': {'type': 'array', 'items': {'type': 'float'}, 'description': 'The list of numbers.'}}, 'required': ['numbers']}}}], 'tool_choice': 'auto', 'temperature': 0.7, 'top_p': 0.99, 'max_tokens': 1200}
RESPONSE
b'{"error":"Request failed during generation: Server error: CANCELLED","error_type":"generation"}'

Stack trace:

ERROR text_generation_launcher: Method Prefill encountered an error.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/interegular/patterns.py", line 486, in parse
    return super(_ParsePattern, self).parse()
  File "/opt/conda/lib/python3.10/site-packages/interegular/utils/simple_parser.py", line 63, in parse
    raise NoMatch(self.data, max(self._expected), self._expected[max(self._expected)])
interegular.utils.simple_parser.NoMatch: Can not match at index 858. Got '))?[\\', expected any of ['*', '+', '?', '{', '*', '+', '?', '{', '(', '[', '\\', '.', '$', '^', "<Any 1 except ('.', '?', '\\\\', '(', ')', '|', '*', '[', '^', '$', '+')>", '|'].
Context(data[-10:+10]): '*"[\\n ]*\\}))?[\\n ]*\\'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
 File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 106, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
    return await response
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 145, in Prefill
    batch = self.model.batch_type.from_pb(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 442, in from_pb
    return cls.from_tokenized(pb, tokenizer, batch_tokenized_inputs, dtype, device)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 322, in from_tokenized
    next_token_chooser = HeterogeneousNextTokenChooser.from_pb(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/tokens.py", line 486, in from_pb
    return HeterogeneousNextTokenChooser(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/tokens.py", line 284, in __init__
    HeterogeneousGrammarLogitProcessor(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/logits_process.py", line 570, in __init__
    fsm = GrammarLogitProcessor._cached_compile_fsm(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/logits_process.py", line 527, in _cached_compile_fsm
    fsm = RegexFSM(schema, tokenizer)
  File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/fsm.py", line 121, in __init__
    self.states_to_token_maps, self.empty_token_ids = create_states_mapping(
  File "/opt/conda/lib/python3.10/site-packages/outlines/caching.py", line 74, in wrapper
    result = cached_function(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/fsm.py", line 102, in create_states_mapping
    regex_pattern = interegular.parse_pattern(regex_string)
  File "/opt/conda/lib/python3.10/site-packages/interegular/patterns.py", line 730, in parse_pattern
    out = p.parse()
  File "/opt/conda/lib/python3.10/site-packages/interegular/utils/simple_parser.py", line 38, in w
    return m(self, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/interegular/patterns.py", line 488, in parse
    raise InvalidSyntax
interegular.patterns.InvalidSyntax

This is the same stack trace as in #2240. It was fairly consistent to reproduce, though the trace does not always appear in our server logs.

Expected behavior

A valid response -- e.g.,

{"model": "mixtral-8x7b-instruct-completion", "messages": [{"role": "user", "content": "I'm working on a report about a basketball player's average performance throughout the season. The data I have includes the points they scored in each game: 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160. To complete my analysis, I need to calculate the mean score per game. Can you help me with that? Please return your answer as a JSON"}], "temperature": 0.1, "max_tokens": 250, "stream": false, "echo": false, "tools": [{"type": "function", "function": {"name": "calculate_mean", "description": "Calculates the mean of a list of numbers.", "parameters": {"type": "object", "properties": {"numbers": {"type": "array", "items": {"type": "number"}, "description": "The list of numbers."}}, "required": ["numbers"]}}}], "tool_choice": "auto"}

b'{"object":"chat.completion","id":"","created":1721938095,"model":"/mnt/models","system_fingerprint":"2.2.0-sha-db7e043","choices":[{"index":0,"message":{"role":"assistant","tool_calls":[{"id":"0","type":"function","function":{"description":null,"name":"mean","arguments":{"numbers":[15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160]}}}]},"logprobs":null,"finish_reason":"eos_token"}],"usage":{"prompt_tokens":202,"completion_tokens":157,"total_tokens":359}}'

It is unclear why the other requests are failing. If possible, we would love to see the output of the model regardless, as this would give downstream users a better experience (it would also be nice for the server not to crash every so often when this occurs). We did some experimentation and found that:

  • Changing the temperature or the model resulted in different payloads erroring, sometimes fewer and sometimes more. Despite the stack trace saying "Method Prefill encountered an error.", this may suggest the problem lies in the generated text.
  • We dug through the TGI codebase and found the call into outlines.fsm (see the sketch below), but we were unable to reproduce this error locally at all, so we are still not sure what exactly is causing it.
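
For reference, this is roughly the call path we tried locally, per the module paths in the trace (a sketch; exact signatures vary across outlines versions). With a simple schema like ours it succeeds, which is why we are unsure what triggers InvalidSyntax in production:

    import json
    import interegular
    from outlines.fsm.json_schema import build_regex_from_schema

    # Parameters of the get_coordinate_by_ip_address tool from the request above.
    schema = json.dumps({
        "type": "object",
        "properties": {
            "ip_address": {
                "type": "string",
                "description": "The IP address to find the location of.",
            }
        },
        "required": ["ip_address"],
    })

    # Per the trace, TGI compiles the grammar in two steps: JSON Schema -> regex,
    # then regex -> FSM via interegular. InvalidSyntax comes from the second step,
    # when interegular cannot parse the generated regex. Note that TGI actually
    # compiles a larger wrapper schema around all of the tools, which may be
    # where the unparsable regex comes from.
    regex_string = build_regex_from_schema(schema)
    pattern = interegular.parse_pattern(regex_string)  # succeeds for this schema locally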

Thanks again for helping so quickly with the last issue; we really appreciate it. It definitely solved some of our problems, including the tool_choice="auto" one.

drbh self-assigned this Jul 25, 2024
ArjunBhalla98 (Author) commented:

Hi -- just to provide an update on this: we are seeing reasonably frequent occurrences of it, and sometimes when the error in the stack trace above occurs the server seems to crash briefly, which obviously causes other traffic to time out. Thanks again for taking a look!

ArjunBhalla98 (Author) commented Aug 1, 2024

Also, we have just started seeing this related error. It seems sporadic, even with model temperature ~0; sometimes rerunning gets rid of it, sometimes not. We assume this is because our temperature is 0.001 rather than exactly 0, since 0 triggers a pydantic input validation error:

2024-08-01T17:14:52.447085Z ERROR batch{batch_size=1}:prefill:prefill{id=79 size=1}:prefill{id=79 size=1}: text_generation_client: router/client/src/lib.rs:46: Server error: 
2024-08-01T17:14:52.447653Z ERROR chat_completions:generate:generate_stream:schedule:infer:send_error: text_generation_router::infer::v3::scheduler: router/src/infer/v3/scheduler.rs:493: Request failed during generation: Server error: 
2024-08-01T17:14:52.644814Z ERROR text_generation_launcher: Method Prefill encountered an error.
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 118, in serve
    server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
    return await response
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 120, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 111, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 145, in Prefill
    batch = self.model.batch_type.from_pb(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 453, in from_pb
    return cls.from_tokenized(pb, tokenizer, batch_tokenized_inputs, dtype, device)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 333, in from_tokenized
    next_token_chooser = HeterogeneousNextTokenChooser.from_pb(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/tokens.py", line 486, in from_pb
    return HeterogeneousNextTokenChooser(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/tokens.py", line 284, in __init__
    HeterogeneousGrammarLogitProcessor(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/logits_process.py", line 570, in __init__
    fsm = GrammarLogitProcessor._cached_compile_fsm(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/logits_process.py", line 524, in _cached_compile_fsm
    schema = build_regex_from_schema(schema)
  File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/json_schema.py", line 83, in build_regex_from_schema
    return to_regex(resolver, content, whitespace_pattern)
  File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/json_schema.py", line 156, in to_regex
    subregex += to_regex(resolver, value, whitespace_pattern)
  File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/json_schema.py", line 185, in to_regex
    subregexes = [
  File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/json_schema.py", line 186, in <listcomp>
    to_regex(resolver, t, whitespace_pattern) for t in instance["anyOf"]
  File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/json_schema.py", line 222, in to_regex
    return to_regex(resolver, instance, whitespace_pattern)
  File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/json_schema.py", line 142, in to_regex
    subregex += to_regex(resolver, value, whitespace_pattern)
  File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/json_schema.py", line 142, in to_regex
    subregex += to_regex(resolver, value, whitespace_pattern)
  File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/json_schema.py", line 313, in to_regex
    resolver, instance["additionalProperties"], whitespace_pattern
KeyError: 'additionalProperties'
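
The bottom of this trace is outlines converting the tool schema into a regex. A rough sketch of that step with the calculate_standard_deviation tool from our failing request (per the module path in the trace; TGI actually compiles a larger wrapper schema around all tools, so this direct call is only an approximation, and the exact exception may vary by outlines version):

    import json
    from outlines.fsm.json_schema import build_regex_from_schema

    # Parameters of the calculate_standard_deviation tool from the failing
    # request above; note that "dict" and "float" are not JSON Schema type
    # names ("object" and "number" are).
    bad_schema = json.dumps({
        "type": "dict",
        "properties": {
            "numbers": {
                "type": "array",
                "items": {"type": "float"},
                "description": "The list of numbers.",
            }
        },
        "required": ["numbers"],
    })

    # Per the trace, the conversion blows up inside to_regex for this schema,
    # here with KeyError: 'additionalProperties'.
    build_regex_from_schema(bad_schema)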

drbh (Collaborator) commented Aug 2, 2024

Hi @ArjunBhalla98, apologies for the delayed response! I just took a deeper look at the issue and attempted to reproduce the tool failures locally.

I believe the issue is mainly that the LLM generates "valid" text which is not a complete JSON blob. This seems to happen for three reasons:

  1. In rare cases the LLM is simply expected to fail to generate JSON.
  2. The supplied prompt may be steering the generation away from a parsable format.
  3. One of the tools is not a valid JSON Schema.

I've tried both requests locally with mistralai/Mixtral-8x7B-Instruct-v0.1, and removing

"Note that the provided function is in Python. provide a JSON"

from the prompt greatly improved the responses and the ability to parse tools. Additionally, when debugging the intermediate text it appeared that the value was often escaped, which causes the parsing issue.
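
As an illustration of that failure mode (hypothetical output, not captured from the server), an intermediate text containing a stray backslash fails JSON parsing in the same way as the "invalid escape" tool error above:

    import json

    # Hypothetical intermediate text: the model escaped an underscore, which is
    # not a legal JSON escape, so parsing fails.
    intermediate = '{\n  "function": {\n    "_name": "get\\_coordinate_by_ip_address"\n  }\n}'

    json.loads(intermediate)  # raises json.JSONDecodeError: Invalid \escape (line 3)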

Separately, the second example above uses the invalid data type float. Tools must be valid JSON Schema in order to be compiled. When testing locally I updated the tool to the JSON below, replacing float with number (object likewise replaces the non-standard dict):

        {
            "type": "function",
            "function": {
                "name": "calculate_standard_deviation",
                "description": "Calculates the standard deviation of a list of numbers.",
                "parameters": {
                    "type": "dict",
                    "properties": {
                        "numbers": {
                            "type": "array",
                            "items": {"type": "number"},
                            "description": "The list of numbers.",
                        }
                    },
                    "required": ["numbers"],
                },
            },
        }

In order to make debugging easier and improve transparency, I've just opened a PR (#2353) that returns the generated text along with the error message when it cannot be parsed into valid JSON. This should help show why a specific request errored.

There is also another PR in the works, #2333, that should improve visibility into how tools are formatted before they are processed by the model. Seeing how prompts are formatted together with tools can help with prompt engineering.

Would you kindly try changing your prompt to avoid specific formatting instructions, and ensure that your tools are valid JSON Schemas? Those two PRs should be merged soon, and they should make this easier to debug in the future.
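
For the schema side, a quick pre-flight check along these lines catches invalid tools before they are sent (a sketch using the jsonschema package, which is separate from TGI; check_tool is just an illustrative helper):

    from jsonschema import Draft202012Validator

    def check_tool(tool: dict) -> None:
        # Raises jsonschema.exceptions.SchemaError if the tool's parameters
        # are not themselves valid JSON Schema (e.g. "type": "dict"/"float").
        Draft202012Validator.check_schema(tool["function"]["parameters"])

    check_tool({
        "type": "function",
        "function": {
            "name": "calculate_standard_deviation",
            "parameters": {"type": "dict"},  # -> SchemaError: 'dict' is not valid
        },
    })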

ArjunBhalla98 (Author) commented:

Hi @drbh, no problem at all - we really appreciate you looking into this! Thanks for the thorough response.

To your points:

1/2. Makes sense! We had guessed this might be the issue, especially the escape-character part, since I had quadruple-checked that our input payload was valid JSON.

  3. Yes, we actually realised shortly after submitting this ticket (apologies for not updating sooner) that the dataset we used conforms to an older version of the JSON Schema spec. I believe we have fixed those on our end; it also looks like outlines itself throws an error and crashes the server if the JSON schema is invalid (I saw there is already a fix for that).

Very interesting find on the prompt; I will try that ASAP! Most of our code-generation prompts include this instruction.

Given the nature of the main error, propagating that intermediate text along with the "illegal JSON" error would be incredibly helpful. Thanks again!

drbh (Collaborator) commented Aug 28, 2024

Hi @ArjunBhalla98, I believe these issues are fully resolved by the recent improvements and bug fixes to grammars and tool calling (#2463, #2454, #2391, etc.).

As of PR #2353, TGI will also return the text that it fails to parse. This is much less likely to happen now, but it should help with debugging if it does.

I'm going to close this issue since these bugs should all be resolved on main. Please don't hesitate to reopen it or create a new issue if you still experience any problems.

Thank you!
