
Commit 574c97b

added docs for custom eval metrics in sdk, modified inference logging docs
1 parent d1b04c0 commit 574c97b

6 files changed: +126, -57 lines changed

pages/logging/langchain.mdx

Lines changed: 5 additions & 5 deletions
@@ -32,7 +32,7 @@ athina_handler = CallbackHandler(
     customer_id='nike-usa',
     customer_user_id='tim@apple.com',
     external_reference_id='your-reference-id',
-    custom_attributes= {
+    custom_attributes={
         "loggedBy": "John Doe",
         "age": 24,
         "isAdmin": true,
@@ -55,14 +55,14 @@ athina_handler = CallbackHandler(
 
 ```json
 Sample kwargs:
-context1 = "Germany is located in central europe"
-context2 = "Berlin is the capital of Germany"
+document1="Germany is located in central europe"
+document2="Berlin is the capital of Germany"
 
 This will be stored as:
 
 {
-    "context1": "Germany is located in central europe",
-    "context2": "Berlin is the capital of Germany"
+    "document1": "Germany is located in central europe",
+    "document2": "Berlin is the capital of Germany"
 }
 
 This will be perceived as retrieved context
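
Editor's note: a minimal sketch of the pattern this diff documents — extra keyword arguments on the callback handler (such as `document1`, `document2`) are logged as retrieved context. The import path and the LangChain wiring are assumptions; only the handler arguments come from the page above.

```python
# Sketch only: the import path below is assumed; see the langchain.mdx page
# for the exact one. Handler arguments mirror the diff above.
from athina_logger.langchain_handler import CallbackHandler  # assumed import path

athina_handler = CallbackHandler(
    customer_id="nike-usa",
    customer_user_id="tim@apple.com",
    external_reference_id="your-reference-id",
    custom_attributes={
        "loggedBy": "John Doe",
        "age": 24,
        "isAdmin": True,
    },
    # Any additional keyword arguments are stored as retrieved context:
    document1="Germany is located in central europe",
    document2="Berlin is the capital of Germany",
)

# The handler is then passed to a LangChain LLM or chain via `callbacks`,
# e.g. ChatOpenAI(callbacks=[athina_handler]).
```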

pages/logging/log_via_python_sdk.mdx

Lines changed: 87 additions & 44 deletions
@@ -50,16 +50,34 @@ response = openai.ChatCompletion.create(
 
 try:
     InferenceLogger.log_inference(
-        prompt_slug="sdk_test",
-        prompt=messages,
-        language_model_id="gpt-4-1106-preview",
-        response=response,
-        cost=cost,
-        external_reference_id="abc",
-        custom_attributes={
-            "name": "John Doe"
-            # Your custom attributes
-        }
+        prompt: [{"role": "user", "content": "What is machine learning?"}],
+        response: "Machine Learning is a branch of computer science",
+        prompt_slug: "test",
+        language_model_id: "gpt-3.5-turbo",
+        environment: "production",
+        external_reference_id: "5e838eaf-7dd0-4b6f-a32c-26110dd54e58",
+        customer_id: "stripe",
+        customer_user_id: "abc@athina.ai",
+        session_id: "session_1",
+        user_query: "what is machine learning?",
+        prompt_tokens: 10,
+        completion_tokens: 20,
+        total_tokens: 30,
+        response_time: 200,
+        context: {
+            "document": "Machine learning (ML) is a field of study in artificial intelligence concerned with the development and
+            study of statistical algorithms that can learn from data and generalize to unseen data, and
+            thus perform tasks without explicit instructions"
+        },
+        expected_response: "Machine leaning is a branch of computer science that explores the study and construction of
+        algorithms which can learn and make predictions on data.",
+        custom_attributes: {
+            "tag": "science"
+        },
+        custom_eval_metrics: {
+            "automation_rate": 0.5
+        },
+        cost: 0.01,
     )
 except Exception as e:
     if isinstance(e, CustomException):
@@ -78,16 +96,34 @@ response = response.model_dump() # For openai > 1 version
 
 try:
     InferenceLogger.log_inference(
-        prompt_slug="sdk_test",
-        prompt=messages,
-        language_model_id="gpt-4-1106-preview",
-        response=response,
-        external_reference_id="abc",
-        cost=0.0123,
-        custom_attributes={
-            "name": "John Doe"
-            # Your custom attributes
-        }
+        prompt: [{"role": "user", "content": "What is machine learning?"}],
+        response: "Machine Learning is a branch of computer science",
+        prompt_slug: "test",
+        language_model_id: "gpt-3.5-turbo",
+        environment: "production",
+        external_reference_id: "5e838eaf-7dd0-4b6f-a32c-26110dd54e58",
+        customer_id: "stripe",
+        customer_user_id: "abc@athina.ai",
+        session_id: "session_1",
+        user_query: "what is machine learning?",
+        prompt_tokens: 10,
+        completion_tokens: 20,
+        total_tokens: 30,
+        response_time: 200,
+        context: {
+            "document": "Machine learning (ML) is a field of study in artificial intelligence concerned with the development and
+            study of statistical algorithms that can learn from data and generalize to unseen data, and
+            thus perform tasks without explicit instructions"
+        },
+        expected_response: "Machine leaning is a branch of computer science that explores the study and construction of
+        algorithms which can learn and make predictions on data.",
+        custom_attributes: {
+            "tag": "science"
+        },
+        custom_eval_metrics: {
+            "automation_rate": 0.5
+        },
+        cost: 0.01,
     )
 except Exception as e:
     if isinstance(e, CustomException):
@@ -108,22 +144,22 @@ All the arguments for the InferenceLogger.log_inference() method are:
 ```python
 Expected formats of prompt:
 
-prompt: [{"role": "user", "content": "What is machine learning?"}] # for openai models
-prompt: {"text": "What is maching learning?"} # for other models
-prompt: "what is machine learning?" # for other models
+prompt=[{"role": "user", "content": "What is machine learning?"}] # for openai models
+prompt={"text": "What is maching learning?"} # for other models
+prompt="what is machine learning?" # for other models
 ```
 - `response (optional)`: LLM Response. This can be either a `string` or the `ChatCompletion` response object from OpenAI
 - `prompt_slug (optional)`: Identifier for the prompt used for inference. This is useful for segmenting inference calls by prompt
 ```python
-prompt_slug: "customer_query"
+prompt_slug="customer_query"
 ```
 - `language_model_id (optional)`: Language model against which inference is made. Check out all supported models [here](/logging/supported_models)
 ```python
-language_model_id: "gpt-4-1106-preview"
+language_model_id="gpt-4-1106-preview"
 ```
 - `functions (optional)`: functions for older versions of openai,
 ```python
-functions: [
+functions=[
     {
         "name": "get_current_weather",
         "description": "Get the current weather in a given location",
@@ -151,18 +187,18 @@ functions: [
 ```
 - `environment (optional)`: Environment your app is running in (ex: production, staging, etc). This is useful for segmenting inference calls by environment
 ```python
-environment: "production"
+environment="production"
 ```
 - `function_call_response (optional)`: function call for older version of openai
 ```python
-function_call_response: {
+function_call_response={
     "name": "get_current_weather",
     "arguments": "{\n \"location\": \"Boston, MA\"\n}"
 }
 ```
 - `tools (optional)`: tools for new versions of openai
 ```python
-tools: [
+tools=[
     {
         "type": "function",
         "function": {
@@ -193,7 +229,7 @@ tools: [
 ```
 - `tool_calls (optional)`: tool calls for new versions of openai
 ```python
-tool_calls: [
+tool_calls=[
     {
         "id": "call_abc123",
         "type": "function",
@@ -207,41 +243,48 @@ tool_calls: [
 If tool_calls field is not present, we extract it from the openai completion response and log it in our database
 - `external_reference_id (optional)`: is useful if you want to associate your own internal identifier with the inference logged to Athina
 ```python
-external_reference_id: "5e838eaf-7dd0-4b6f-a32c-26110dd54e58"
+external_reference_id="5e838eaf-7dd0-4b6f-a32c-26110dd54e58"
 ```
 - `customer_id (optional)`: is your customer ID. This is useful for segmenting inference calls by customer
 ```python
-customer_id: "stripe"
+customer_id="stripe"
 ```
 - `customer_user_id (optional)`: is the end user ID. This is useful for segmenting inference calls by the end user
 ```python
-customer_user_id: "user@gmail.com"
+customer_user_id="user@gmail.com"
 ```
 - `cost (optional)`: is the cost incurred for this LLM inference call. Tip: If you log an entire OpenAI completion response to us, we'll automatically calculate the cost.
 ```python
-cost: 0.0123
+cost=0.0123
 ```
 - `session_id (optional)`: is the session or conversation ID. This is used for grouping different inferences into a conversation or chain. [Read more](/logging/grouping_inferences)
 ```python
-session_id: "c45g-1234-s6g4-43d3"
+session_id="c45g-1234-s6g4-43d3"
 ```
 - `user_query (optional)`: is the user's query. For conversational applications, this is the user's last message
 ```python
-user_query: "what is machine learning?"
+user_query="what is machine learning?"
 ```
 - `context (optional)`: is the context used as information for the prompt. For RAG applications, this is the "retrieved" data.
 You may log context as a string or as an object (dictionary)
 ```python
-context: {"information": "Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy"}
-context: "Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy"
+context={"information": "Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy"}
+context="Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy"
 ```
 - `custom_attributes (optional)`: custom_attributes is an object (dictionary) where you can log your own custom attributes as key-value pair with the inference.
 ```python
-custom_attributes: {
+custom_attributes={
     "name": "John Doe"
     # Any other custom_attribute
 } # OPTIONAL;
 ```
+- `custom_eval_metrics (optional)`: custom_eval_metrics is an object (dictionary) where you can log your own custom eval metrics of the llm response as key-value pair with the inference.
+```python
+custom_eval_metrics={
+    "automation_rate": 0.3
+    # Any other custom_eval_metric
+} # OPTIONAL;
+```
 <Callout>
 Tip: For [evals](/evals/preset_evals/rag_evals), you must also log user_query and context
 </Callout>
@@ -251,10 +294,10 @@ custom_attributes: {
 - `total_tokens (optional)`: prompt_tokens + completion_tokens,
 - `response_time (optional)`: is the response time in milliseconds. This is useful for segmenting inference calls by response time
 ```python
-prompt_tokens: 50
-completion_tokens: 30
-total_tokens: 80
-response_time: 1208
+prompt_tokens=50
+completion_tokens=30
+total_tokens=80
+response_time=1200
 ```
 <Callout>
 Tip: If you log the entire OpenAI `ChatCompletion` response object to us,
@@ -263,7 +306,7 @@ response_time: 1208
 
 - `expected_response (optional)`: is the reference response to compare against for evaluation purposes. This is useful for segmenting inference calls by expected response
 ```python
-expected_response: "Machine Learning is a branch of artificial intelligence"
+expected_response="Machine Learning is a branch of computer science"
 ```
 <Callout>
 Tip: For grounded evals like [Answer Similarity](/evals/preset_evals/grounded_evals#answer_similarity), you must also log a reference response (string) to compare against.
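
Editor's note: pulling the arguments above together, a minimal sketch of the documented call written with keyword arguments (`name=value`) and the newly added `custom_eval_metrics` field. The import paths and API-key setup line are assumptions based on the rest of this page; the metric name `automation_rate` is only an illustration.

```python
# Sketch only: import paths and the API-key setup call are assumed; argument
# names and example values come from the documentation above.
import os

from athina_logger.api_key import AthinaApiKey                        # assumed import path
from athina_logger.inference_logger import InferenceLogger            # assumed import path
from athina_logger.exception.custom_exception import CustomException  # assumed import path

AthinaApiKey.set_api_key(os.getenv("ATHINA_API_KEY"))  # assumed setup call

try:
    InferenceLogger.log_inference(
        prompt_slug="sdk_test",
        prompt=[{"role": "user", "content": "What is machine learning?"}],
        response="Machine Learning is a branch of computer science",
        language_model_id="gpt-4-1106-preview",
        user_query="what is machine learning?",
        context={"document": "Machine learning is a field of study in AI."},
        expected_response="Machine learning is a branch of computer science",
        custom_attributes={"tag": "science"},
        custom_eval_metrics={"automation_rate": 0.5},  # your own eval metric(s)
        cost=0.01,
    )
except Exception as e:
    if isinstance(e, CustomException):
        print(e)  # Athina-specific logging error
    else:
        raise
```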

pages/logging/openai_chat_0.mdx

Lines changed: 9 additions & 2 deletions
@@ -57,7 +57,10 @@ _If you're using OpenAI chat completions in Python, you can get set up in just *
             "name": "John",
             "age": 30,
             "city": "New York"
-        } # Your custom-attributes
+        }, # Your custom attributes
+        custom_eval_metrics={
+            "automation_rate": 0.5
+        } # Your custom eval metrics
     ),
 )
 ```
@@ -84,6 +87,7 @@ _If you're using OpenAI chat completions in Python, you can get set up in just *
 customer_user_id: Optional[str] = None
 response_time: Optional[int] = None
 custom_attributes: Optional[dict] = None
+custom_eval_metrics: Optional[dict] = None
 ```
 
 
@@ -145,7 +149,10 @@ _If you're using OpenAI chat completions in Python, you can get set up in just *
     external_reference_id="5e838eaf-7dd0-4b6f-a32c-26110dd54e58", # OPTIONAL; If passed, should be unique across all inference calls
     custom_attributes={
         "name": "John Doe"
-    } # OPTIONAL;
+    }, # OPTIONAL
+    custom_eval_metrics={
+        "automation_rate": 0.5
+    } # OPTIONAL
 )
 ```
 
pages/logging/openai_chat_1.mdx

Lines changed: 9 additions & 2 deletions
@@ -57,7 +57,10 @@ _If you're using OpenAI chat completions in Python, you can get set up in just *
             "name": "John",
             "age": 30,
             "city": "New York"
-        } # Your custom-attributes
+        }, # Your custom attributes
+        custom_eval_metrics={
+            "automation_rate": 0.5
+        } # Your custom eval metrics
     ),
 )
 ```
@@ -84,6 +87,7 @@ _If you're using OpenAI chat completions in Python, you can get set up in just *
 customer_user_id: Optional[str] = None
 response_time: Optional[int] = None
 custom_attributes: Optional[dict] = None
+custom_eval_metrics: Optional[dict] = None
 ```
 
 
@@ -146,7 +150,10 @@ _If you're using OpenAI chat completions in Python, you can get set up in just *
     external_reference_id="5e838eaf-7dd0-4b6f-a32c-26110dd54e58", # OPTIONAL; If passed, should be unique across all inference calls
     custom_attributes={
         "name": "John Doe"
-    } # OPTIONAL;
+    }, # OPTIONAL
+    custom_eval_metrics={
+        "automation_rate": 0.5
+    } # OPTIONAL
 )
 ```
 
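Editor's note: for both chat pages, a sketch of how the metadata object whose fields are listed in the hunks above might carry the new `custom_eval_metrics` field. The class name `AthinaMeta` and its import path are assumptions; only the field names and types come from the diff.

```python
# Sketch only: AthinaMeta and its import path are assumed; the four fields
# below are the ones listed in the diff hunk above.
from athina_logger.athina_meta import AthinaMeta  # assumed import path

meta = AthinaMeta(
    customer_user_id="abc@athina.ai",
    response_time=200,                             # in milliseconds
    custom_attributes={"name": "John Doe"},
    custom_eval_metrics={"automation_rate": 0.5},  # newly documented field
)
# `meta` would then be passed alongside the chat completion call shown in the
# diffs above (together with model and messages).
```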
pages/logging/openai_completion_0.mdx

Lines changed: 8 additions & 2 deletions
@@ -58,7 +58,10 @@ _If you're using OpenAI completions in Python, you can get set up in just **2 mi
     external_reference_id="5e838eaf-7dd0-4b6f-a32c-26110dd54e58", # OPTIONAL; If passed, should be unique across all inference calls
     custom_attributes={
         "name": "John Doe"
-    } # OPTIONAL;
+    }, # OPTIONAL
+    custom_eval_metrics={
+        "automation_rate": 0.5
+    } # OPTIONAL
 )
 
 # Here are 2 ways to log openai chat streams
@@ -114,7 +117,10 @@ _If you're using OpenAI completions in Python, you can get set up in just **2 mi
     external_reference_id="5e838eaf-7dd0-4b6f-a32c-26110dd54e58", # OPTIONAL; If passed, should be unique across all inference calls
     custom_attributes={
         "name": "John Doe"
-    } # OPTIONAL;
+    }, # OPTIONAL;
+    custom_eval_metrics={
+        "automation_rate": 0.5
+    } # OPTIONAL
 )
 client = sseclient.SSEClient(request)
 try:

pages/logging/openai_completion_1.mdx

Lines changed: 8 additions & 2 deletions
@@ -58,7 +58,10 @@ _If you're using OpenAI completions in Python, you can get set up in just **2 mi
     external_reference_id="5e838eaf-7dd0-4b6f-a32c-26110dd54e58", # OPTIONAL; If passed, should be unique across all inference calls
     custom_attributes={
         "name": "John Doe"
-    } # OPTIONAL;
+    }, # OPTIONAL
+    custom_eval_metrics={
+        "automation_rate": 0.5
+    } # OPTIONAL
 )
 
 # Here are 2 ways to log openai chat streams
@@ -114,7 +117,10 @@ _If you're using OpenAI completions in Python, you can get set up in just **2 mi
     external_reference_id="5e838eaf-7dd0-4b6f-a32c-26110dd54e58", # OPTIONAL; If passed, should be unique across all inference calls
     custom_attributes={
         "name": "John Doe"
-    } # OPTIONAL;
+    }, # OPTIONAL;
+    custom_eval_metrics={
+        "automation_rate": 0.5
+    } # OPTIONAL
 )
 client = sseclient.SSEClient(request)
 try:
