open-telemetry · lmolkova · Apr 16, 2024 · Jan 12, 2024 · Jan 22, 2024 · Jan 23, 2024
diff --git a/docs/ai/README.md b/docs/ai/README.md
@@ -0,0 +1,20 @@
+<!--- Hugo front matter used to generate the website version of this page:
+linkTitle: AI
+path_base_for_github_subdir:
+  from: content/en/docs/specs/semconv/ai/_index.md
+  to: ai/README.md
+--->
+
+# Semantic Conventions for AI systems
+
+**Status**: [Experimental][DocumentStatus]
+
+This document defines semantic conventions for the following kinds of AI systems:
+
+* LLMs
+
+Semantic conventions for LLM operations are defined for the following signals:
+
+* [LLM Spans](llm-spans.md): Semantic Conventions for LLM requests - *spans*.
+
+[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.26.0/specification/document-status.md
diff --git a/docs/ai/llm-spans.md b/docs/ai/llm-spans.md
@@ -0,0 +1,83 @@
+<!--- Hugo front matter used to generate the website version of this page:
+linkTitle: LLM Calls
+--->
+
+# Semantic Conventions for LLM requests
+
+**Status**: [Experimental][DocumentStatus]
+
+<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->
+
+<!-- toc -->
+
+- [Configuration](#configuration)
+- [LLM Request attributes](#llm-request-attributes)
+- [Events](#events)
+
+<!-- tocstop -->
+
+A request to an LLM is modeled as a span in a trace.
+
+The **span name** SHOULD be set to a low cardinality value representing the request made to an LLM.
+It MAY be a name of the API endpoint for the LLM being called.
+
+## Configuration
+
+Instrumentations for LLMs MUST offer the ability to turn off capture of prompts and completions. This is for three primary reasons:
+
+1. Data privacy concerns. End users of LLM applications may input sensitive information or personally identifiable information (PII) that they do not wish to be sent to a telemetry backend.
+2. Data size concerns. Although there is no specified limit to sizes, there are practical limitations in programming languages and telemetry systems. Some LLMs allow for extremely large context windows that end users may take full advantage of.
+3. Performance concerns. Sending large amounts of data to a telemetry backend may cause performance issues for the application.
+
+By default, instrumentations SHOULD NOT capture prompts and completions.
+
+## LLM Request attributes
+
+These attributes track input data and metadata for a request to an LLM. Each attribute represents a concept that is common to most LLMs.
+
+<!-- semconv gen_ai.llm.request -->
+| Attribute  | Type | Description  | Examples  | Requirement Level |
+|---|---|---|---|---|
+| [`gen_ai.llm.request.max_tokens`](../attributes-registry/llm.md) | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended |
+| [`gen_ai.llm.request.model`](../attributes-registry/llm.md) | string | The name of the LLM a request is being made to. [1] | `gpt-4` | Required |
+| [`gen_ai.llm.request.temperature`](../attributes-registry/llm.md) | double | The temperature setting for the LLM request. | `0.0` | Recommended |
+| [`gen_ai.llm.request.top_p`](../attributes-registry/llm.md) | double | The top_p sampling setting for the LLM request. | `1.0` | Recommended |
+| [`gen_ai.llm.response.finish_reason`](../attributes-registry/llm.md) | string[] | Array of reasons the model stopped generating tokens, corresponding to each generation received. | `[['stop']]` | Recommended |
+| [`gen_ai.llm.response.id`](../attributes-registry/llm.md) | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended |
+| [`gen_ai.llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM that generated the response. [2] | `gpt-4-0613` | Required |
+| [`gen_ai.llm.system`](../attributes-registry/llm.md) | string | The name of the LLM foundation model vendor, if applicable. [3] | `openai` | Recommended |
+| [`gen_ai.llm.usage.completion_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM response (completion). | `180` | Recommended |
+| [`gen_ai.llm.usage.prompt_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM prompt. | `100` | Recommended |
+
+**[1]:** The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned.
+
+**[2]:** The name of the LLM that generated the response. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned.
+
+**[3]:** The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank.
+<!-- endsemconv -->
+
+## Events
+
+In the lifetime of an LLM span, events for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation.
+
+<!-- semconv gen_ai.llm.content.prompt -->
+The event name MUST be `gen_ai.llm.content.prompt`.
+
+| Attribute  | Type | Description  | Examples  | Requirement Level |
+|---|---|---|---|---|
+| [`gen_ai.llm.prompt`](../attributes-registry/llm.md) | string | The full prompt string sent to an LLM in a request. [1] | `\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:` | Recommended |
+
+**[1]:** The full prompt sent to an LLM in a request, structured as JSON in OpenAI's format.
+<!-- endsemconv -->
+
+<!-- semconv gen_ai.llm.content.completion -->
+The event name MUST be `gen_ai.llm.content.completion`.
+
+| Attribute  | Type | Description  | Examples  | Requirement Level |
+|---|---|---|---|---|
+| [`gen_ai.llm.completion`](../attributes-registry/llm.md) | string | The full response string from an LLM in a response. [1] | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Recommended |
+
+**[1]:** The full response from an LLM, structured as JSON in OpenAI's format.
+<!-- endsemconv -->
+
+[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md
diff --git a/docs/attributes-registry/llm.md b/docs/attributes-registry/llm.md
@@ -0,0 +1,48 @@
+<!--- Hugo front matter used to generate the website version of this page:
+--->
+
+# Large Language Model (LLM)
+
+<!-- toc -->
+
+- [Generic LLM Attributes](#generic-llm-attributes)
+  * [Request Attributes](#request-attributes)
+  * [Response Attributes](#response-attributes)
+  * [Event Attributes](#event-attributes)
+
+<!-- tocstop -->
+
+## Generic LLM Attributes
+
+### Request Attributes
+
+<!-- semconv registry.llm(omit_requirement_level,tag=llm-generic-request) -->
+| Attribute  | Type | Description  | Examples  |
+|---|---|---|---|
+| `gen_ai.llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` |
+| `gen_ai.llm.request.model` | string | The name of the LLM a request is being made to. | `gpt-4` |
+| `gen_ai.llm.request.temperature` | double | The temperature setting for the LLM request. | `0.0` |
+| `gen_ai.llm.request.top_p` | double | The top_p sampling setting for the LLM request. | `1.0` |
+| `gen_ai.llm.system` | string | The name of the LLM foundation model vendor, if applicable. | `openai` |
+<!-- endsemconv -->
+
+### Response Attributes
+
+<!-- semconv registry.llm(omit_requirement_level,tag=llm-generic-response) -->
+| Attribute  | Type | Description  | Examples  |
+|---|---|---|---|
+| `gen_ai.llm.response.finish_reason` | string[] | Array of reasons the model stopped generating tokens, corresponding to each generation received. | `[['stop']]` |
+| `gen_ai.llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` |
+| `gen_ai.llm.response.model` | string | The name of the LLM that generated the response. | `gpt-4-0613` |
+| `gen_ai.llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` |
+| `gen_ai.llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` |
+<!-- endsemconv -->
+
+### Event Attributes
+
+<!-- semconv registry.llm(omit_requirement_level,tag=llm-generic-events) -->
+| Attribute  | Type | Description  | Examples  |
+|---|---|---|---|
+| `gen_ai.llm.completion` | string | The full response string from an LLM in a response. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` |
+| `gen_ai.llm.prompt` | string | The full prompt string sent to an LLM in a request. | `\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:` |
+<!-- endsemconv -->
diff --git a/model/registry/llm.yaml b/model/registry/llm.yaml
@@ -0,0 +1,67 @@
+groups:
+  - id: registry.llm
+    prefix: gen_ai.llm
+    type: attribute_group
+    brief: >
+      This document defines the attributes used to describe telemetry in the context of Large Language Model (LLM) requests and responses.
+    attributes:
+      - id: system
+        type: string
+        brief: The name of the LLM foundation model vendor, if applicable.
+        examples: 'openai'
+        tag: llm-generic-request
+      - id: request.model
+        type: string
+        brief: The name of the LLM a request is being made to.
+        examples: 'gpt-4'
+        tag: llm-generic-request
+      - id: request.max_tokens
+        type: int
+        brief: The maximum number of tokens the LLM generates for a request.
+        examples: [100]
+        tag: llm-generic-request
+      - id: request.temperature
+        type: double
+        brief: The temperature setting for the LLM request.
+        examples: [0.0]
+        tag: llm-generic-request
+      - id: request.top_p
+        type: double
+        brief: The top_p sampling setting for the LLM request.
+        examples: [1.0]
+        tag: llm-generic-request
+      - id: response.id
+        type: string
+        brief: The unique identifier for the completion.
+        examples: ['chatcmpl-123']
+        tag: llm-generic-response
+      - id: response.model
+        type: string
+        brief: The name of the LLM that generated the response.
+        examples: ['gpt-4-0613']
+        tag: llm-generic-response
+      - id: response.finish_reason
+        type: string[]
+        brief: Array of reasons the model stopped generating tokens, corresponding to each generation received.
+        examples: [['stop']]
+        tag: llm-generic-response
+      - id: usage.prompt_tokens
+        type: int
+        brief: The number of tokens used in the LLM prompt.
+        examples: [100]
+        tag: llm-generic-response
+      - id: usage.completion_tokens
+        type: int
+        brief: The number of tokens used in the LLM response (completion).
+        examples: [180]
+        tag: llm-generic-response
+      - id: prompt
+        type: string
+        brief: The full prompt string sent to an LLM in a request.
+        examples: ['\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:']
+        tag: llm-generic-events
+      - id: completion
+        type: string
+        brief: The full response string from an LLM in a response.
+        examples: ['Why did the developer stop using OpenTelemetry? Because they couldn''t trace their steps!']
+        tag: llm-generic-events
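
Review note: the registry above pairs each `gen_ai.llm.*` attribute with a type, so an instrumentation could sanity-check what it is about to record. The snippet below is a minimal sketch of that idea, not part of this PR. The names `REGISTRY_TYPES` and `check_attributes` are hypothetical, and it assumes that `double` maps to Python `float` and `string[]` maps to a list of `str`.

```python
from typing import Any, Dict, List

# Attribute names and types copied from the registry.llm group in this diff.
# The mapping of semconv types to Python types is an assumption of this sketch.
REGISTRY_TYPES: Dict[str, type] = {
    "gen_ai.llm.system": str,
    "gen_ai.llm.request.model": str,
    "gen_ai.llm.request.max_tokens": int,
    "gen_ai.llm.request.temperature": float,
    "gen_ai.llm.request.top_p": float,
    "gen_ai.llm.response.id": str,
    "gen_ai.llm.response.model": str,
    "gen_ai.llm.response.finish_reason": list,  # string[]
    "gen_ai.llm.usage.prompt_tokens": int,
    "gen_ai.llm.usage.completion_tokens": int,
    "gen_ai.llm.prompt": str,
    "gen_ai.llm.completion": str,
}


def check_attributes(attributes: Dict[str, Any]) -> List[str]:
    """Return one message per problem found; an empty list means no problems."""
    problems: List[str] = []
    for name, value in attributes.items():
        expected = REGISTRY_TYPES.get(name)
        if expected is None:
            problems.append(f"{name}: not defined in registry.llm")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
        elif expected is list and not all(isinstance(v, str) for v in value):
            problems.append(f"{name}: every element must be a string")
    return problems
```

For example, `check_attributes({"gen_ai.llm.request.max_tokens": "100"})` reports one problem because the registry declares that attribute as `int`.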
diff --git a/model/trace/llm.yaml b/model/trace/llm.yaml
@@ -0,0 +1,63 @@
+groups:
+  - id: gen_ai.llm.request
+    type: span
+    brief: >
+      A request to an LLM is modeled as a span in a trace. The span name should be a low cardinality value representing the request made to an LLM, like the name of the API endpoint being called.
+    attributes:
+      - ref: gen_ai.llm.system
+        requirement_level: recommended
+        note: >
+          The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank.
+      - ref: gen_ai.llm.request.model
+        requirement_level: required
+        note: >
+          The name of the LLM a request is being made to. If the LLM is supplied by a vendor,
+          then the value must be the exact name of the model requested. If the LLM is a fine-tuned
+          custom model, the value should have a more specific name than the base model that's been fine-tuned.
+      - ref: gen_ai.llm.request.max_tokens
+        requirement_level: recommended
+      - ref: gen_ai.llm.request.temperature
+        requirement_level: recommended
+      - ref: gen_ai.llm.request.top_p
+        requirement_level: recommended
+      - ref: gen_ai.llm.response.id
+        requirement_level: recommended
+      - ref: gen_ai.llm.response.model
+        requirement_level: required
+        note: >
+          The name of the LLM that generated the response. If the LLM is supplied by a vendor,
+          then the value must be the exact name of the model actually used. If the LLM is a
+          fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned.
+      - ref: gen_ai.llm.response.finish_reason
+        requirement_level: recommended
+      - ref: gen_ai.llm.usage.prompt_tokens
+        requirement_level: recommended
+      - ref: gen_ai.llm.usage.completion_tokens
+        requirement_level: recommended
+    events:
+      - gen_ai.llm.content.prompt
+      - gen_ai.llm.content.completion
+
+  - id: gen_ai.llm.content.prompt
+    name: gen_ai.llm.content.prompt
+    type: event
+    brief: >
+      In the lifetime of an LLM span, events for prompts sent and completions received
+      may be created, depending on the configuration of the instrumentation.
+    attributes:
+      - ref: gen_ai.llm.prompt
+        requirement_level: recommended
+        note: >
+          The full prompt sent to an LLM in a request, structured as JSON in OpenAI's format.
+
+  - id: gen_ai.llm.content.completion
+    name: gen_ai.llm.content.completion
+    type: event
+    brief: >
+      In the lifetime of an LLM span, events for prompts sent and completions received
+      may be created, depending on the configuration of the instrumentation.
+    attributes:
+      - ref: gen_ai.llm.completion
+        requirement_level: recommended
+        note: >
+          The full response from an LLM, structured as JSON in OpenAI's format.
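
Review note: to make the span and event model above concrete, here is a minimal sketch of how an instrumentation might assemble the span attributes and the two content events while honoring the "off by default" rule for prompts and completions. It is an illustration only and deliberately avoids any SDK calls. The function `build_llm_telemetry`, its parameters, and the `(name, attributes)` tuple used for events are all hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical event representation: (event name, event attributes).
Event = Tuple[str, Dict[str, Any]]


def build_llm_telemetry(
    request_model: str,
    response_model: str,
    prompt: str,
    completion: str,
    system: Optional[str] = None,
    optional_attributes: Optional[Dict[str, Any]] = None,
    capture_content: bool = False,  # content capture is opt-in
) -> Tuple[Dict[str, Any], List[Event]]:
    """Return the span attributes and span events for one LLM request."""
    # The two attributes marked `required` in the span definition.
    attributes: Dict[str, Any] = {
        "gen_ai.llm.request.model": request_model,
        "gen_ai.llm.response.model": response_model,
    }
    # The vendor attribute is only set when a vendor-supplied model is used.
    if system is not None:
        attributes["gen_ai.llm.system"] = system
    # Recommended attributes (max_tokens, usage counts, ...) are passed through.
    attributes.update(optional_attributes or {})

    events: List[Event] = []
    if capture_content:
        events.append(("gen_ai.llm.content.prompt", {"gen_ai.llm.prompt": prompt}))
        events.append(
            ("gen_ai.llm.content.completion", {"gen_ai.llm.completion": completion})
        )
    return attributes, events
```

With the default `capture_content=False` the returned event list is empty, so no prompt or completion text leaves the process unless the user opts in.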