
GenAI: do we need to support multiple finish reasons? #1277

Open
lmolkova opened this issue Jul 24, 2024 · 2 comments
Labels
area:gen-ai enhancement New feature or request

Comments

@lmolkova
Contributor

See #980 (comment).

Context:

  • some models return multiple choices in one response
  • each choice has a finish reason that explains why the model stopped generating content, with values like stop, length, filtered... - this could be very useful to know, query, and aggregate by

Having an array attribute is problematic: arrays are harder to query and not really possible to use on metrics or on per-choice events.

Multiple choices are supported by only a limited set of models. Even when multiple choices are supported, some SDKs (e.g. openai-dotnet) choose not to expose them at the convenience API level to simplify the design and provide a friendlier experience.
Most examples and documentation assume there is just one choice.

Given this, it seems that in most cases there will be just one choice and just one finish reason on each span.

The proposal is to

  • change type from string array to string
  • rename the attribute to gen_ai.response.finish_reason
  • report one value (either one of the following)
    1. last reason
    2. comma-separated reasons like stop,length
  • we can reuse this attribute on events in case we want to promote the corresponding body field to attributes
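The two reporting options in the proposal could be sketched as follows. This is a minimal illustration, not part of any SDK; the helper names are hypothetical, and the attribute is assumed to be set via the usual span API afterwards.

```python
def last_finish_reason(reasons: list[str]) -> str:
    """Option 1: keep only the last choice's finish reason."""
    return reasons[-1]


def joined_finish_reasons(reasons: list[str]) -> str:
    """Option 2: join per-choice reasons in response order, e.g.
    ["stop", "length"] -> "stop,length"; one choice yields just "stop"."""
    return ",".join(reasons)


# Either result would then be set as a plain string attribute, e.g.:
# span.set_attribute("gen_ai.response.finish_reason", joined_finish_reasons(reasons))
```

Both options keep the attribute a plain string, which is what makes it usable on metrics and per-choice events.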
@lmolkova
Contributor Author

More context on comma-separated list:

  • the attribute is useful on metrics (different finish reasons may correlate to different latencies and error rates)
  • theoretically, a comma-separated list has high(ish) cardinality
  • OpenAI may report a separate metric for the number of returned choices, with an individual finish reason on each measurement
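The per-choice metric idea above can be sketched like this. The metric and attribute names are assumptions for illustration; a real implementation would use an OpenTelemetry counter (e.g. one created via `meter.create_counter`) instead of the plain `Counter` simulated here.

```python
from collections import Counter

# Simulated metric storage; stands in for an OpenTelemetry counter such as
# a hypothetical "gen_ai.response.choices" instrument.
choice_counter: Counter = Counter()


def record_choices(finish_reasons: list[str]) -> None:
    """Record one measurement per returned choice, each carrying its own
    finish_reason attribute. Cardinality stays bounded by the set of
    individual reasons rather than their comma-separated combinations."""
    for reason in finish_reasons:
        choice_counter[("gen_ai.response.finish_reason", reason)] += 1
```

This avoids the cardinality concern: each measurement has a single reason, so the attribute's value set never grows combinatorially.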

@lmolkova
Contributor Author

lmolkova commented Aug 7, 2024

Based on offline discussions, we need to figure out the batching story too, and it might be related.

People may use n > 1 to save on costs (input tokens are charged once) - https://community.openai.com/t/how-does-n-parameter-work-in-chat-completions/288725.

Assuming it's one of the popular scenarios, the alternative to squishing finish reasons into one value could be:

  1. maybe populate finish_reason on the span if and only if there is exactly one choice. There are other things we could do here:
    • maybe populate error.type if the worst-of-finish-reasons indicates an error?
  2. have a metric that measures the number of choices (in addition to the number of requests) and report finish_reason as an attribute there
  3. populate finish_reason as an attribute on the relevant event, so it's easier to query.

P2 and P3 seem mostly non-controversial and don't necessarily depend on P1.
