
GenAI: do we need to support multiple finish reasons? #1277

Open
lmolkova opened this issue Jul 24, 2024 · 2 comments
Labels
area:gen-ai enhancement New feature or request

Comments

@lmolkova
Contributor

See #980 (comment).

Context:

  • some models return multiple choices in one response
  • each choice has a finish reason that explains why the model stopped generating content, with values like stop, length, filtered... - this could be very useful to know, query, and aggregate by

Having an array attribute is problematic: arrays are harder to query and not really possible to use on metrics or on per-choice events.

Multiple choices are supported by only a limited set of models. Even when multiple choices are supported, some SDKs (e.g. openai-dotnet) choose not to expose them at the convenience API level to simplify the design and provide a friendlier experience.
Most examples and documentation assume there is just one choice.

Given this, it seems that in most cases there will be just one choice and just one finish reason on each span.

The proposal is to

  • change type from string array to string
  • rename the attribute to gen_ai.response.finish_reason
  • report one value (either one of the following)
    1. last reason
    2. comma-separated reasons like stop,length
  • we can reuse this attribute on events in case we want to promote the corresponding body field to attributes
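The two reporting options in the proposal could be sketched as follows. This is a minimal illustration, not part of any SDK; the helper names are hypothetical, and the attribute is assumed to be set via the usual span API afterwards.

```python
def last_finish_reason(reasons: list[str]) -> str:
    """Option 1: keep only the last choice's finish reason."""
    return reasons[-1]


def joined_finish_reasons(reasons: list[str]) -> str:
    """Option 2: join per-choice reasons in response order, e.g.
    ["stop", "length"] -> "stop,length"; one choice yields just "stop"."""
    return ",".join(reasons)


# Either result would then be set as a plain string attribute, e.g.:
# span.set_attribute("gen_ai.response.finish_reason", joined_finish_reasons(reasons))
```

Both options keep the attribute a plain string, which is what makes it usable on metrics and per-choice events.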
@lmolkova
Contributor Author

More context on comma-separated list:

  • the attribute is useful on metrics (different finish reasons may correlate to different latencies and error rates)
  • theoretically, a comma-separated list has high(ish) cardinality
  • OpenAI may report a separate metric for the number of returned choices, with an individual finish reason on each measurement
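The per-choice metric idea above can be sketched like this. The metric and attribute names are assumptions for illustration; a real implementation would use an OpenTelemetry counter (e.g. one created via `meter.create_counter`) instead of the plain `Counter` simulated here.

```python
from collections import Counter

# Simulated metric storage; stands in for an OpenTelemetry counter such as
# a hypothetical "gen_ai.response.choices" instrument.
choice_counter: Counter = Counter()


def record_choices(finish_reasons: list[str]) -> None:
    """Record one measurement per returned choice, each carrying its own
    finish_reason attribute. Cardinality stays bounded by the set of
    individual reasons rather than their comma-separated combinations."""
    for reason in finish_reasons:
        choice_counter[("gen_ai.response.finish_reason", reason)] += 1
```

This avoids the cardinality concern: each measurement has a single reason, so the attribute's value set never grows combinatorially.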

@lmolkova
Contributor Author

lmolkova commented Aug 7, 2024

Based on offline discussions, we need to figure out the batching story too, and it might be related.

People may use n > 1 to save on costs (input tokens are charged once) - https://community.openai.com/t/how-does-n-parameter-work-in-chat-completions/288725.

Assuming it's one of the popular scenarios, the alternative to squishing finish reasons into one value could be:

  1. maybe populate finish_reason on the span if and only if there is exactly one choice. There are other things we could do here:
    • maybe populate error.type if the worst-of-finish-reasons indicates an error?
  2. have a metric that measures the number of choices (in addition to the number of requests) and report finish_reason as an attribute there
  3. populate finish_reason as an attribute on the relevant event, so it's easier to query.

P2 and P3 seem mostly non-controversial and don't necessarily depend on P1.
