add docs for custom eval metric ATH-850 #26
base: main
Conversation
Walkthrough

This update enhances logging capabilities by allowing users to include custom evaluation metrics alongside their prompts. The focus is on enabling unique identification and independent analysis of these metrics.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (1)
- pages/logging/log_via_api.mdx (1 hunks)
Additional comments: 3
pages/logging/log_via_api.mdx (3)
- 117-133: The addition of documentation for logging custom evaluation metrics is clear and well-structured. It provides straightforward instructions on how to log custom metrics alongside prompts, emphasizing the importance of using unique metric names to avoid conflicts. This aligns well with the PR objectives and enhances the documentation's utility for users needing to track custom metrics.
- 114-137: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [1-1]
The static analysis tool suggested capitalizing the word "import" at the beginning of the file. However, this is a false positive since the syntax is correct for JavaScript/JSX imports. No action is needed here.
- 114-137: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [6-6]
The static analysis tool suggested replacing "Athina" with "Athena." However, "Athina" is the correct name in this context, so no change is necessary.
##### Custom Eval Metrics

Optionally, you can also log custom eval metrics with your prompt. Pass each metric name and value as a key-value pair in the `custom_eval_metrics` object.

Note: A prompt run cannot have duplicate eval metric names.

```json
{
  // ...otherFields,
  "custom_eval_metrics": {
    "throughput": 100,
    "correctness": 0.5,
    "latency": 100
    // any other eval metric to be logged
  }
}
```

##### Grounded Evals

For grounded evals like [Answer Similarity](/evals/preset_evals/grounded_evals#answer_similarity), you must also log a reference response (string) to compare against:
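The grounded-evals example that follows this sentence in the docs falls outside the hunk shown above. As a reader aid only, here is a rough Python sketch of what a full API logging request with custom eval metrics and a reference response could look like; the endpoint URL, auth header name, and the `expected_response` field name are assumptions for illustration, not values confirmed by this PR.

```python
# Rough sketch only: the endpoint URL, header name, the reference-response
# field, and every field other than custom_eval_metrics are assumptions.
import requests

payload = {
    "prompt_slug": "ticket_summary",                    # illustrative identifier
    "prompt_response": "Sure, I can help with that.",   # the model's answer being logged
    "custom_eval_metrics": {                            # metric names must be unique per prompt run
        "throughput": 100,
        "correctness": 0.5,
        "latency": 100,
    },
    "expected_response": "Yes, happy to help.",         # reference answer for grounded evals (assumed field name)
}

response = requests.post(
    "https://log.athina.ai/api/v1/log/inference",       # assumed endpoint; see log_via_api.mdx for the real one
    headers={"athina-api-key": "YOUR_ATHINA_API_KEY"},  # assumed auth header
    json=payload,
    timeout=10,
)
response.raise_for_status()
```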
📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [91-91]
Consider adding a period after "etc" to adhere to American English conventions for abbreviations.
- etc
+ etc.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (6)
- pages/logging/langchain.mdx (2 hunks)
- pages/logging/log_via_python_sdk.mdx (8 hunks)
- pages/logging/openai_chat_0.mdx (3 hunks)
- pages/logging/openai_chat_1.mdx (3 hunks)
- pages/logging/openai_completion_0.mdx (2 hunks)
- pages/logging/openai_completion_1.mdx (2 hunks)
Additional comments: 14
pages/logging/langchain.mdx (2)
- 35-35: The change in formatting for `custom_attributes` improves readability and consistency with JSON object notation. However, ensure that the `true` and `null` values are correctly interpreted in the Python context, as Python uses `True` and `None`.
- 58-65: Renaming `context1` to `document1` and `context2` to `document2` enhances clarity by better describing the nature of the data being logged. This change aligns with the objective of making the documentation more intuitive for users.

pages/logging/openai_completion_0.mdx (2)
- 61-64: Adding the `custom_eval_metrics` field with an example of `"automation_rate": 0.5` is a valuable enhancement. It allows users to log custom evaluation metrics, providing more flexibility and detailed analysis capabilities. Ensure that the example provided is clear and demonstrates the intended use effectively.
- 120-123: The repetition of the `custom_eval_metrics` addition in a different context (OpenAI API call and SSE) ensures consistency across different logging methods. This repetition is beneficial for users who might use different approaches for logging.

pages/logging/openai_completion_1.mdx (2)
- 61-64: The inclusion of `custom_eval_metrics` with `"automation_rate": 0.5` in this file mirrors the enhancement made in the previous file, maintaining consistency across documentation. This addition is crucial for users interested in logging custom evaluation metrics.
- 120-123: Repeating the addition of `custom_eval_metrics` in a different logging context ensures that users are aware of this capability regardless of the method they choose for logging. This consistency in documentation is appreciated.

pages/logging/openai_chat_1.mdx (3)
- 60-63: The introduction of the `custom_eval_metrics` field with an example of logging `"automation_rate": 0.5` is a significant enhancement for users logging OpenAI chat completions. This addition allows for more detailed performance analysis and customization.
- 153-156: Ensuring that the `custom_eval_metrics` field is also demonstrated in the context of streaming responses highlights the flexibility of this new feature across different logging scenarios. This consistency helps users understand how to apply custom metrics in various contexts.
- 90-90: The declaration of `custom_eval_metrics: Optional[dict] = None` in the `AthinaMeta` class is a crucial update, ensuring that users can easily integrate custom evaluation metrics into their logging setup. This change should be clearly documented to guide users on how to utilize this new field effectively.
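To make the `AthinaMeta` comment above concrete, here is a minimal sketch of how `custom_eval_metrics` might be attached to a chat completion call. The import paths, the wrapper module, and the `athina_meta` keyword argument are assumptions inferred from the review comments, not the exact code in `openai_chat_1.mdx`.

```python
# Minimal sketch under assumptions: the import paths, the wrapper module, and
# the athina_meta keyword argument are illustrative; API-key setup is omitted.
from athina_logger.athina_meta import AthinaMeta   # class named in this review
from athina_logger.openai_wrapper import openai    # assumed drop-in wrapper for the openai SDK

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarise this support ticket."}],
    athina_meta=AthinaMeta(
        custom_eval_metrics={"automation_rate": 0.5},  # the optional dict field added in this PR
    ),
)
```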
pages/logging/openai_chat_0.mdx (2)
- 60-63: The addition of `custom_eval_metrics` within the `openai.ChatCompletion.create` call is a significant enhancement, allowing users to log custom evaluation metrics alongside their prompts. This aligns well with the PR's objective of providing users with more detailed tracking and analysis capabilities. However, ensure that the documentation clearly explains the expected format and possible values for `custom_eval_metrics` to guide users effectively.
- 90-90: Adding `custom_eval_metrics` as an optional field in the `AthinaMeta` class is a good practice, as it extends the flexibility for logging custom evaluation metrics. It's important to ensure that examples or guidelines on how to use this field effectively are provided in the documentation to assist users in leveraging this new feature to its full potential.

pages/logging/log_via_python_sdk.mdx (3)
- 53-80: The updated `InferenceLogger.log_inference()` method now includes a comprehensive set of parameters, enhancing the logging capabilities significantly. This aligns with the PR's objective of providing detailed tracking and analysis options. It's crucial to ensure that each parameter is well-documented, explaining its purpose, expected format, and how it contributes to the logging process. This will help users understand and utilize the new features effectively.
- 99-126: The repetition of the `InferenceLogger.log_inference()` method call for different versions of the OpenAI SDK is a good practice, ensuring compatibility across versions. However, it's essential to highlight the differences and recommend the appropriate version for specific use cases in the documentation. This clarity will aid users in choosing the right approach for their needs.
- 243-290: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [147-287]
The detailed explanation of all arguments for the `InferenceLogger.log_inference()` method is valuable for users to understand the full capabilities of the logging function. Ensure that each argument is accompanied by examples or use cases where applicable. This will provide users with practical insights into how they can leverage these arguments to enhance their logging practices.
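For the Python SDK path discussed above, a sketch of a direct logging call is below. Only `InferenceLogger.log_inference()`, `custom_eval_metrics`, and the token/latency fields quoted in this thread come from the PR; the import path and the remaining parameter names are assumptions.

```python
# Sketch under assumptions: the import path and the prompt_slug /
# prompt_response parameter names are illustrative, not the SDK's exact signature.
from athina_logger.inference_logger import InferenceLogger  # assumed module path

InferenceLogger.log_inference(
    prompt_slug="ticket_summary",                    # assumed parameter name
    prompt_response="Sure, I can help with that.",   # assumed parameter name
    prompt_tokens=50,
    completion_tokens=30,
    total_tokens=80,
    response_time=1208,
    custom_eval_metrics={"automation_rate": 0.5},    # metric names must be unique per prompt run
)
```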
completion_tokens: 30
total_tokens: 80
response_time: 1208
prompt_tokens=50
I prefer adding it like this:

    {
      // ... other fields in logging
      "key1": "value1",
      "key2": "value2"
    }
LGTM (added one nit, though)
Summary by CodeRabbit
- Added the `custom_eval_metrics` field to the OpenAI chat completions setup in Python, allowing for custom evaluation metrics alongside completion requests.
- Added the `custom_eval_metrics` field with the key-value pair `"automation_rate": 0.5` for logging OpenAI chat streams in Python.