add docs for custom eval metric ATH-850 #26
base: main
Conversation
Walkthrough

This update enhances logging capabilities by allowing users to include custom evaluation metrics alongside their prompts. The focus is on enabling unique identification and independent analysis of these metrics.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (1)
- pages/logging/log_via_api.mdx (1 hunks)
Additional comments: 3
pages/logging/log_via_api.mdx (3)
- 117-133: The addition of documentation for logging custom evaluation metrics is clear and well-structured. It provides straightforward instructions on how to log custom metrics alongside prompts, emphasizing the importance of using unique metric names to avoid conflicts. This aligns well with the PR objectives and enhances the documentation's utility for users needing to track custom metrics.
- 114-137: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [1-1]
The static analysis tool suggested capitalizing the word "import" at the beginning of the file. However, this is a false positive since the syntax is correct for JavaScript/JSX imports. No action is needed here.
- 114-137: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [6-6]
The static analysis tool suggested replacing "Athina" with "Athena." However, "Athina" is the correct name in this context, so no change is necessary.
##### Custom Eval Metrics

Optionally, you can also log custom eval metrics with your prompt. Pass each metric name and value as a key-value pair in the `custom_eval_metrics` object.

Note: A prompt run cannot have duplicate eval metric names.

```json
{
  // ...otherFields,
  "custom_eval_metrics": {
    "throughput": 100,
    "correctness": 0.5,
    "latency": 100
    // any other eval metric to be logged
  }
}
```

##### Grounded Evals

For grounded evals like [Answer Similarity](/evals/preset_evals/grounded_evals#answer_similarity), you must also log a reference response (string) to compare against:
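The grounded-evals example that follows this sentence in the docs falls outside the hunk shown above. As a reader aid only, here is a rough Python sketch of what a full API logging request with custom eval metrics and a reference response could look like; the endpoint URL, auth header name, and the `expected_response` field name are assumptions for illustration, not values confirmed by this PR.

```python
# Rough sketch only: the endpoint URL, header name, the reference-response
# field, and every field other than custom_eval_metrics are assumptions.
import requests

payload = {
    "prompt_slug": "ticket_summary",                    # illustrative identifier
    "prompt_response": "Sure, I can help with that.",   # the model's answer being logged
    "custom_eval_metrics": {                            # metric names must be unique per prompt run
        "throughput": 100,
        "correctness": 0.5,
        "latency": 100,
    },
    "expected_response": "Yes, happy to help.",         # reference answer for grounded evals (assumed field name)
}

response = requests.post(
    "https://log.athina.ai/api/v1/log/inference",       # assumed endpoint; see log_via_api.mdx for the real one
    headers={"athina-api-key": "YOUR_ATHINA_API_KEY"},  # assumed auth header
    json=payload,
    timeout=10,
)
response.raise_for_status()
```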
📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [91-91]
Consider adding a period after "etc" to adhere to American English conventions for abbreviations.
- etc
+ etc.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (6)
- pages/logging/langchain.mdx (2 hunks)
- pages/logging/log_via_python_sdk.mdx (8 hunks)
- pages/logging/openai_chat_0.mdx (3 hunks)
- pages/logging/openai_chat_1.mdx (3 hunks)
- pages/logging/openai_completion_0.mdx (2 hunks)
- pages/logging/openai_completion_1.mdx (2 hunks)
Additional comments: 14
pages/logging/langchain.mdx (2)
- 35-35: The change in formatting for `custom_attributes` improves readability and consistency with JSON object notation. However, ensure that the `true` and `null` values are correctly interpreted in the Python context, as Python uses `True` and `None`.
- 58-65: Renaming `context1` to `document1` and `context2` to `document2` enhances clarity by better describing the nature of the data being logged. This change aligns with the objective of making the documentation more intuitive for users.

pages/logging/openai_completion_0.mdx (2)
- 61-64: Adding the `custom_eval_metrics` field with an example of `"automation_rate": 0.5` is a valuable enhancement. It allows users to log custom evaluation metrics, providing more flexibility and detailed analysis capabilities. Ensure that the example provided is clear and demonstrates the intended use effectively.
- 120-123: The repetition of the `custom_eval_metrics` addition in a different context (OpenAI API call and SSE) ensures consistency across different logging methods. This repetition is beneficial for users who might use different approaches for logging.

pages/logging/openai_completion_1.mdx (2)
- 61-64: The inclusion of `custom_eval_metrics` with `"automation_rate": 0.5` in this file mirrors the enhancement made in the previous file, maintaining consistency across documentation. This addition is crucial for users interested in logging custom evaluation metrics.
- 120-123: Repeating the addition of `custom_eval_metrics` in a different logging context ensures that users are aware of this capability regardless of the method they choose for logging. This consistency in documentation is appreciated.

pages/logging/openai_chat_1.mdx (3)
- 60-63: The introduction of the `custom_eval_metrics` field with an example of logging `"automation_rate": 0.5` is a significant enhancement for users logging OpenAI chat completions. This addition allows for more detailed performance analysis and customization.
- 153-156: Ensuring that the `custom_eval_metrics` field is also demonstrated in the context of streaming responses highlights the flexibility of this new feature across different logging scenarios. This consistency helps users understand how to apply custom metrics in various contexts.
- 90-90: The declaration of `custom_eval_metrics: Optional[dict] = None` in the `AthinaMeta` class is a crucial update, ensuring that users can easily integrate custom evaluation metrics into their logging setup. This change should be clearly documented to guide users on how to utilize this new field effectively.
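To make the `AthinaMeta` comment above concrete, here is a minimal sketch of how `custom_eval_metrics` might be attached to a chat completion call. The import paths, the wrapper module, and the `athina_meta` keyword argument are assumptions inferred from the review comments, not the exact code in `openai_chat_1.mdx`.

```python
# Minimal sketch under assumptions: the import paths, the wrapper module, and
# the athina_meta keyword argument are illustrative; API-key setup is omitted.
from athina_logger.athina_meta import AthinaMeta   # class named in this review
from athina_logger.openai_wrapper import openai    # assumed drop-in wrapper for the openai SDK

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarise this support ticket."}],
    athina_meta=AthinaMeta(
        custom_eval_metrics={"automation_rate": 0.5},  # the optional dict field added in this PR
    ),
)
```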
pages/logging/openai_chat_0.mdx (2)
- 60-63: The addition of `custom_eval_metrics` within the `openai.ChatCompletion.create` call is a significant enhancement, allowing users to log custom evaluation metrics alongside their prompts. This aligns well with the PR's objective of providing users with more detailed tracking and analysis capabilities. However, ensure that the documentation clearly explains the expected format and possible values for `custom_eval_metrics` to guide users effectively.
- 90-90: Adding `custom_eval_metrics` as an optional field in the `AthinaMeta` class is a good practice, as it extends the flexibility for logging custom evaluation metrics. It's important to ensure that examples or guidelines on how to use this field effectively are provided in the documentation to assist users in leveraging this new feature to its full potential.

pages/logging/log_via_python_sdk.mdx (3)
- 53-80: The updated `InferenceLogger.log_inference()` method now includes a comprehensive set of parameters, enhancing the logging capabilities significantly. This aligns with the PR's objective of providing detailed tracking and analysis options. It's crucial to ensure that each parameter is well-documented, explaining its purpose, expected format, and how it contributes to the logging process. This will help users understand and utilize the new features effectively.
- 99-126: The repetition of the `InferenceLogger.log_inference()` method call for different versions of the OpenAI SDK is a good practice, ensuring compatibility across versions. However, it's essential to highlight the differences and recommend the appropriate version for specific use cases in the documentation. This clarity will aid users in choosing the right approach for their needs.
- 243-290: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [147-287]
The detailed explanation of all arguments for the `InferenceLogger.log_inference()` method is valuable for users to understand the full capabilities of the logging function. Ensure that each argument is accompanied by examples or use cases where applicable. This will provide users with practical insights into how they can leverage these arguments to enhance their logging practices.
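For the Python SDK path discussed above, a sketch of a direct logging call is below. Only `InferenceLogger.log_inference()`, `custom_eval_metrics`, and the token/latency fields quoted in this thread come from the PR; the import path and the remaining parameter names are assumptions.

```python
# Sketch under assumptions: the import path and the prompt_slug /
# prompt_response parameter names are illustrative, not the SDK's exact signature.
from athina_logger.inference_logger import InferenceLogger  # assumed module path

InferenceLogger.log_inference(
    prompt_slug="ticket_summary",                    # assumed parameter name
    prompt_response="Sure, I can help with that.",   # assumed parameter name
    prompt_tokens=50,
    completion_tokens=30,
    total_tokens=80,
    response_time=1208,
    custom_eval_metrics={"automation_rate": 0.5},    # metric names must be unique per prompt run
)
```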
completion_tokens: 30
total_tokens: 80
response_time: 1208
prompt_tokens=50
I prefer adding it like this:

    {
      // ... other fields in logging
      "key1": "value1",
      "key2": "value2"
    }
LGTM (added one nit, though)
Summary by CodeRabbit
- Added the `custom_eval_metrics` field to the OpenAI chat completions setup in Python, allowing for custom evaluation metrics alongside completion requests.
- Added the `custom_eval_metrics` field with the key-value pair `"automation_rate": 0.5` for logging OpenAI chat streams in Python.