[evaluation] ci: Enable mypy (#37615)
* fix(typing): Resolve mypy violations in azure/ai/evaluation/_http_utils.py
* fix(typing): Resolve uses of implicit Optional in type annotations
* fix(typing): Resolve type reassignment in http_utils.py
* style: Run isort
* fix(typing): Fix attempted type reassignment in _f1_score.py
* fix(typing): Use a TypeGuard to allow mypy to narrow types in _common/utils.py
* fix(typing): Correct return type of get_harm_severity_level
* fix(typing): Correct return type of _compute_f1_score
* fix(typing): Ensure mypy knows that AsyncHttpPipeline.__enter__ returns Self
* fix(typing): Allow mypy to infer the types of the convenience request methods

  _http_utils.py extensively uses decorators to implement the "convenience" request methods (get, post, put, etc...) for {Async}HttpPipeline, since they all share a common underlying implementation.

  However, neither decorator annotated its return type (the type of the decorated function). Initially this was because the accurate type couldn't be spelled using a `Callable`, and pylance still did a fine job providing intellisense.

  It turns out that it currently isn't possible to spell it with a callable typing.Protocol either. Our decorator applies to a method, and mypy struggles with the removal of the `self` parameter that occurs when a method binds to an object (see python/mypy issue #16200).

  This commit resolves this by making the implementation of the http pipelines more verbose, removing the decorators and unrolling the implementation into each convenience method.

  Using `Unpack[TypedDict]` to annotate kwargs makes this substantially more readable, but it causes mypy to complain if unknown keys are passed as kwargs (which is otherwise useful for request-specific pipeline configuration).
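  As a rough illustration of the unrolled style described above, the sketch below annotates kwargs with `Unpack[...]` on each convenience method instead of generating the methods with a decorator. `RequestKwargs`, `SketchPipeline`, and the keys shown are hypothetical names invented for this example; they are not the classes or options in _http_utils.py.

```python
from typing import Any, Dict, TypedDict

try:
    from typing import Unpack  # Python 3.11+
except ImportError:  # older interpreters need the backport
    from typing_extensions import Unpack


class RequestKwargs(TypedDict, total=False):
    """Hypothetical set of per-request options (assumed for this sketch)."""

    params: Dict[str, str]
    headers: Dict[str, str]
    timeout: float


class SketchPipeline:
    """Stand-in pipeline: records the request instead of sending it."""

    def request(
        self, method: str, url: str, **kwargs: Unpack[RequestKwargs]
    ) -> Dict[str, Any]:
        # Shared implementation that every convenience method forwards to.
        return {"method": method, "url": url, **kwargs}

    # Each convenience method is written out by hand, so a type checker
    # sees an ordinary, fully annotated method rather than a decorator's
    # unannotated return value.
    def get(self, url: str, **kwargs: Unpack[RequestKwargs]) -> Dict[str, Any]:
        return self.request("GET", url, **kwargs)

    def post(self, url: str, **kwargs: Unpack[RequestKwargs]) -> Dict[str, Any]:
        return self.request("POST", url, **kwargs)


result = SketchPipeline().get("https://example.invalid/items", timeout=5.0)
```

  With this shape, a call such as `SketchPipeline().get(url, retries=3)` still runs, but a type checker would be expected to flag `retries` because it is not a key of `RequestKwargs`. That is the trade-off the message mentions.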
* ci: Enable mypy in CI
* fix(typing): Fix extraneous `total=False` on TypedDict
* fix(typing): Propagate model config type hint upwards
* fix(typing): Ensure that `len` is only called on objects that implement Sized in _tracing.py
* fix(typing): Resolve implicit optional type for Turn
* fix(typing): Resolve missing/inaccurate return types in simulator_data_classes
* fix(typing): Refine the TExperimental type for experimental()
* fix(typing): Ignore the method assign in experimental decorator
* fix(typing): Remove unnecessary optional for _add_method_docstring
* fix(typing): Mark get_token as sync

  The abstract method `get_token` is marked as async, but both concrete implementations are sync and every use of it in the codebase is in a sync context.

* fix(typing): Add type hints for APITokenManager attributes
* fix(typing): Prevent type-reassignment in APITokenManager
* refactor: Remove unnecessary pass
* fix(typing): Explicitly list accepted kwargs for derived APITokenManager classes
* fix(typing): Mark PlainTokenManager.token as non-optional str
* fix(typing): Mark *_prompty args as Optional in _simulator.py
* fix: Don't raise bare strings
* fix(typing): Fix return type for _apply_target_to_data
* fix(typing): Use TypedDict as argument to _trace_destination_from_project_scope
* fix(typing): Fix return type of Simulator._complete_conversation
* fix(typing): Correct the param type of _process_column_mappings
* fix(typing): evaluators param Dict[str, Any] -> Dict[str, Callable]
* fix(typing): Add type annotation for processed_config
* fix(typing): Remove unnecessary variable declaration from _evaluate
* fix(typing),refactor: Clarify to mypy that fetch_or_reuse_token always returns str
* fix(typing): Add type annotations for EvalRun attributes
* fix(typing): Use TypedDict for get_rai_svc_url project_scope parameter
* fix(typing): Specify that EvalRun.__enter__ returns Self
* fix(typing): Add type annotation in evaluate_with_rai_service
* fix(typing),refactor: Make EvalRun.info a non-Optional property
* fix(typing): Add a type annotation in log_artifact
* fix(typing): Add missing MLClient import
* fix(typing): Add missing return to EvalRun.__exit__
* fix(typing),refactor: Clarify that _get_evaluator_type always returns str
* fix(typing): Add type annotations in log_evaluate_activity
* fix(typing): QAEvaluator accepts typed dict and returns Dict[str, float]
* fix(typing): Set USER_AGENT to a str when import fails
* fix: Avoid using a dangerous default value

  Using a mutable value as a parameter default is dangerous, since mutations will persist across function calls. See pylint error code `W0102(dangerous-default-value)`.

* fix(typing): Remove unused *args from OpenAIChatCompletionsModel.__init__
* fix(typing): Avoid name-redefinition due to repeat import
* fix(typing): Make EvaluationMetrics an enum
* fix(typing): Use TypedDict for AzureAIProject params
* fix(typing): Type credential as azure.core.credentials.TokenCredential
* fix(typing): Clarify that _log_metrics_and_instance_results returns optional str
* fix(typing),refactor: Add a utility function to validate AzureAIProject dict
* fix(typing): Resolve mismatch with namedtuple type name and variable name
* refactor: Remove unused attribute AdversarialTemplateHandler.cached_templates_source
* fix(typing): Resolve type reassignment in proxy_model_completion
* fix(typing): Add type annotation for ProxyChatCompletionModel.result_url
* fix(typing): Add type annotations to BatchRunContext methods
* fix(typing): Add type annotation for ConversationBot.conversation_starter
* fix(typing): Fix return type of ConversationBot.generate_responses
* fix(typing): Clarify return type of simulate_conversation
* fix(typing): Add type ignore for OpenAICompletionsModel.format_request_data
* fix(typing): Remove unnecessary type annotation in OpenAICompletionsModel.format_request_data
* fix(typing): Clarify that content safety evaluators return Dict[str, Union[str, float]]
* fix(typing): Clarify return type of ContentSafetyChatEvaluator._get_harm_severity_level
* fix(typing): Add type annotations to ContentSafetyChatEvaluator methods
* fix(typing): Add type annotations for ContentSafetyEvaluator
* fix(typing): Use a callable object in AdversarialSimulator
* refactor: Use a set literal for CONTENT_HARM_TEMPLATES_COLLECTION_KEY
* fix(typing): Specify evaluate return type to narrow log_evaluate_activity type
* fix(typing): Add type annotations to adversarial simulator
* fix(typing),refactor: Clarify that _setup_bot's fallthrough branch is unreachable

  _setup_bot does exhaustive matching against all of ConversationRole's enum values.

* fix(typing): Make SimulationRequestDTO.to_dict non-destructive
* fix(typing): Add type annotations to code_client.py
* fix(typing): Correct Simulator.__call__ task parameter to be List[str]
* fix(typing): evaluators Dict[str, Type] -> Dict[str, Callable]
* fix(typing): Make CodeClient.get_metrics always return a dict
* fix(typing): Add type annotations to evaluate/utils.py
* fix(typing): Clarify that CodeRun.get_aggregated_metrics returns Dict[str, Any]
* fix(typing): data is a required parameter for _evaluate
* fix(typing): Add variable annotations in _evaluate
* fix(typing),refactor: Prevent batch_run_client from being Union[ProxyClient,CodeClient]

  Despite having similar interfaces with compatible calling conventions, the fact that ProxyClient and CodeClient have different "run" types (ProxyRun and CodeRun) causes type errors when dealing with a client of type Union[ProxyClient,CodeClient]. Mypy must consider the case where the wrong run type is used for a given client, even though that cannot happen in this function.

  Refactoring the relevant code into a function allows us to clarify to mypy that client and run types are used consistently.
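  One way to picture the refactor described above is a helper that is generic over the run type, so the client and its run are tied together by a single type variable. This is only a sketch under assumptions: the classes below are simplified stand-ins that borrow the names from this message, and the `run`/`get_metrics` signatures, `BatchClient`, and `run_and_collect` are hypothetical rather than the SDK's actual interfaces.

```python
from typing import Dict, Protocol, TypeVar

TRun = TypeVar("TRun")


class BatchClient(Protocol[TRun]):
    """Hypothetical interface: a client produces and consumes one run type."""

    def run(self, name: str) -> TRun: ...

    def get_metrics(self, run: TRun) -> Dict[str, float]: ...


class ProxyRun:
    pass


class CodeRun:
    pass


class ProxyClient:
    def run(self, name: str) -> ProxyRun:
        return ProxyRun()

    def get_metrics(self, run: ProxyRun) -> Dict[str, float]:
        return {"source": 1.0}


class CodeClient:
    def run(self, name: str) -> CodeRun:
        return CodeRun()

    def get_metrics(self, run: CodeRun) -> Dict[str, float]:
        return {"source": 2.0}


def run_and_collect(client: BatchClient[TRun], name: str) -> Dict[str, float]:
    # Inside this function the run always has the type that this particular
    # client produced, so the checker never has to consider pairing a
    # ProxyClient with a CodeRun (or the reverse).
    run = client.run(name)
    return client.get_metrics(run)


proxy_metrics = run_and_collect(ProxyClient(), "eval")
code_metrics = run_and_collect(CodeClient(), "eval")
```

  By contrast, a variable typed `Union[ProxyClient, CodeClient]` used inline would force the checker to consider every client and run combination at each call.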
* fix: Remove unused imports
* fix(pylint): Resolve R1711(useless-return)
* fix(pylint): Resolve W0707(raise-missing-from)
* fix(pylint): Add parameters/returns to http_utils docstrings
* fix(pylint): Make EvaluationMetrics implement CaseInsensitiveEnumMeta
* fix: Remove return type annotations for Evaluators

  Promptflow does reflection on type annotations, and only accepts a dataclass, typeddict, or string as return type annotation.

* fix(typing): Add runtime validation of model_config
* fix: Remove type annotations from evaluator/simulators credential param

  Promptflow does reflection on type annotations and only allows dict.

* fix: Remove type annotations from azure_ai_project param

  Promptflow does reflection on param types and disallows TypedDicts.

* fix(typing): {Azure,}OpenAIModelConfiguration.type is NotRequired
* fix(typing): List[Dict] -> list for conversation param
* tests: Fix tests
* fix(typing): Make RaiServiceEvaluatorBase also accept _InternalEvaluationMetrics
* fix(typing): Use typing.final to enforce "never be overridden by children"
* fix(typing): Use abstractmethod to enforce "children must override method"
* fix(typing): Add type annotations to EvaluatorBase
* ci: Add "stringized" to cspell
* fix: Explicitly pass in data to get_evaluators_info

  Resolves a bug where the function was capturing data from the outer scope, but data wasn't changed to the appropriate value until after the function call.
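  The class of bug described in the last item can be reproduced in a few lines. The sketch below is an assumption-laden illustration, not the code from this commit: `describe_buggy`, `describe_fixed`, and `get_info` are hypothetical names, and the "normalisation" step merely stands in for whatever update to `data` happened too late.

```python
def describe_buggy(raw_path: str) -> str:
    data = raw_path

    def get_info() -> str:
        # Reads `data` from the enclosing scope at call time.
        return f"evaluating {data}"

    info = get_info()        # called while `data` still holds the raw value
    data = raw_path.strip()  # the intended value arrives too late
    return info


def get_info(data: str) -> str:
    # The dependency is now an explicit parameter instead of a captured name.
    return f"evaluating {data}"


def describe_fixed(raw_path: str) -> str:
    data = raw_path.strip()
    return get_info(data)


buggy = describe_buggy("  results.jsonl  ")
fixed = describe_fixed("  results.jsonl  ")
```

  Passing the value explicitly makes the ordering requirement visible at the call site, and it lets a type checker verify the argument.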