Math Evaluators #3719

ninghu · 2024-09-05T15:43:52Z

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

All Promptflow Contribution checklist:

The pull request does not introduce [breaking changes].
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.
I confirm that all new dependencies are compatible with the MIT license.
Create an issue and link to the pull request to get dedicated review from promptflow team. Learn more: suggested workflow.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.

github-actions · 2024-09-05T16:27:13Z

promptflow-evals test result

12 files ±0 | 12 suites ±0 | 1h 48m 8s ⏱️ +1h 27m 42s
63 tests -56 | 60 ✅ -59 | 3 💤 +3 | 0 ❌ ±0
756 runs -672 | 720 ✅ -708 | 36 💤 +36 | 0 ❌ ±0

Results for commit 151c23c. ± Comparison against base commit cc6a6e3.

This pull request removes 119 and adds 63 tests. Note that renamed tests count towards both.

tests.evals.unittests.test_batch_run_context.TestBatchRunContext ‑ test_batch_timeout_custom
tests.evals.unittests.test_batch_run_context.TestBatchRunContext ‑ test_batch_timeout_default
tests.evals.unittests.test_batch_run_context.TestBatchRunContext ‑ test_with_codeclient
tests.evals.unittests.test_batch_run_context.TestBatchRunContext ‑ test_with_pfclient
tests.evals.unittests.test_built_in_evaluator.TestBuiltInEvaluators ‑ test_fluency_evaluator
tests.evals.unittests.test_built_in_evaluator.TestBuiltInEvaluators ‑ test_fluency_evaluator_empty_string
tests.evals.unittests.test_built_in_evaluator.TestBuiltInEvaluators ‑ test_fluency_evaluator_non_string_inputs
tests.evals.unittests.test_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_invalid_citations
tests.evals.unittests.test_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_missing_role
tests.evals.unittests.test_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_normal
…

tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_chat[False-True]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_chat[True-True]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_content_safety
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_content_safety_chat[False-False]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_content_safety_chat[True-False]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_qa[False]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_qa[True]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_qa_for_nans
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_qa_with_openai_config[False]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_qa_with_openai_config[True]
…

♻️ This comment has been updated with latest results.

src/promptflow-evals/promptflow/evals/_common/utils.py

src/promptflow-evals/promptflow/evals/evaluators/_meteor/_meteor.py

ninghu added 3 commits August 23, 2024 09:44

add initial mock implementation

ed37749

adding the implementations

0c5f761

Merge branch 'main' into users/ninhu/math_based_evaluators

bcd5294

github-actions bot added the promptflow-evals label Sep 5, 2024

ninghu added 2 commits September 5, 2024 08:46

revert unneeded changes

f464021

fix lint issue

d475f15

ninghu added 3 commits September 5, 2024 09:33

fix the docstring and unittest

38ef087

skip the failed tests with ADO created

e94e3e8

revert precommit change

60f7405

ninghu marked this pull request as ready for review September 6, 2024 16:19

ninghu requested review from a team as code owners September 6, 2024 16:20

MilesHolland reviewed Sep 6, 2024

src/promptflow-evals/promptflow/evals/_common/utils.py

luigiw approved these changes Sep 6, 2024

Merge branch 'main' into users/ninhu/math_based_evaluators

151c23c

singankit reviewed Sep 6, 2024

src/promptflow-evals/promptflow/evals/_common/utils.py

singankit reviewed Sep 6, 2024

src/promptflow-evals/promptflow/evals/evaluators/_meteor/_meteor.py

singankit approved these changes Sep 6, 2024

ninghu merged commit 1853dbe into main Sep 6, 2024
76 of 77 checks passed

ninghu deleted the users/ninhu/math_based_evaluators branch September 6, 2024 17:45
