
cognitiveservices Deployment real-time percentage usage #28825

Open

Description

https://learn.microsoft.com/en-us/python/api/azure-mgmt-cognitiveservices/azure.mgmt.cognitiveservices.models.deployment?view=azure-python

Is your feature request related to a problem? Please describe.

I want to load-balance between different OpenAI deployments based on their real-time percentage usage.

Describe the solution you'd like

As part of the Deployment class, or any of its underlying properties, it would be helpful to have not only the configured rate limits, but also the real-time usage (especially for TPM, not just RPM).

Describe alternatives you've considered

I have looked into Azure Metrics, but I found two problems with this approach:

  • the percentage usage metrics are only for Provisioned Managed Throughput, whereas I am looking for Standard TPM usage
  • metrics are not available in real-time
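Until something like this exists, one possible client-side approximation is to count the tokens each deployment has consumed in a sliding window and compare that to its configured limit. The sketch below is only an illustration under assumptions: `UsageWindow` and `pick_deployment` are hypothetical names, not part of the SDK, and the counts reflect only traffic sent by this one process, not the service's actual view of usage.

```python
import time
from collections import deque


class UsageWindow:
    """Tracks tokens consumed by one deployment over a sliding window (hypothetical helper)."""

    def __init__(self, limit, period=60.0):
        self.limit = limit        # configured token limit per period
        self.period = period      # window length in seconds
        self.events = deque()     # (timestamp, tokens) pairs, oldest first

    def record(self, tokens, now=None):
        """Record tokens consumed by a request at time `now`."""
        now = time.monotonic() if now is None else now
        self.events.append((now, tokens))

    def utilization(self, now=None):
        """Return the estimated fraction of the limit used in the current window."""
        now = time.monotonic() if now is None else now
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.period:
            self.events.popleft()
        used = sum(tokens for _, tokens in self.events)
        return used / self.limit


def pick_deployment(windows, now=None):
    """Return the name of the deployment with the lowest estimated utilization."""
    return min(windows, key=lambda name: windows[name].utilization(now))
```

Each request would call `record()` with the token count reported in the response, and `pick_deployment()` would choose the target for the next request. With several clients sharing a deployment this estimate undercounts, which is why a service-side value would still be preferable.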

Additional context

sample code:

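from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

# sub_id holds the Azure subscription ID
client = CognitiveServicesManagementClient(credential=DefaultAzureCredential(), subscription_id=sub_id)
deployments = client.deployments.list(resource_group_name="some_rg", account_name="some_acct")
print('Deployments:')
for d in deployments:
    ...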

sample output:

Deployments:
================
gpt-4-turbo
================
{
  "additional_properties": {},
  "id": "...",
  "name": "gpt-4-turbo",
  "type": "Microsoft.CognitiveServices/accounts/deployments",
  "sku": "{'additional_properties': {}, 'name': 'Standard', 'tier': None, 'size': None, 'family': None, 'capacity': 10}",
  "system_data": "{'additional_properties': {}, 'created_by': '...', 'created_by_type': 'User', 'created_at': datetime.datetime(2024, 4, 10, 15, 19, 31, 6416, tzinfo=<isodate.tzinfo.Utc object at 0x10673f050>), 'last_modified_by': 'radu@gocascade.ai', 'last_modified_by_type': 'User', 'last_modified_at': datetime.datetime(2024, 4, 10, 15, 19, 31, 6416, tzinfo=<isodate.tzinfo.Utc object at 0x10673f050>)}",
  "etag": "...",
  "properties": "{'additional_properties': {}, 'provisioning_state': 'Succeeded', 'model': <azure.mgmt.cognitiveservices.models._models_py3.DeploymentModel object at 0x106bf7710>, 'scale_settings': None, 'capabilities': {'chatCompletion': 'true'}, 'rai_policy_name': 'Microsoft.Default', 'call_rate_limit': None, 'rate_limits': [<azure.mgmt.cognitiveservices.models._models_py3.ThrottlingRule object at 0x106bf79d0>, <azure.mgmt.cognitiveservices.models._models_py3.ThrottlingRule object at 0x106bf7a90>], 'version_upgrade_option': 'OnceCurrentVersionExpired'}"
}
================
Properties:
{
  "additional_properties": {},
  "provisioning_state": "Succeeded",
  "model": "{'additional_properties': {}, 'format': 'OpenAI', 'name': 'gpt-4', 'version': '1106-Preview', 'source': None, 'call_rate_limit': None}",
  "scale_settings": null,
  "capabilities": {
    "chatCompletion": "true"
  },
  "rai_policy_name": "Microsoft.Default",
  "call_rate_limit": null,
  "rate_limits": [
    "{'additional_properties': {}, 'key': 'request', 'renewal_period': 10.0, 'count': 10.0, 'min_count': None, 'dynamic_throttling_enabled': None, 'match_patterns': None}",
    "{'additional_properties': {}, 'key': 'token', 'renewal_period': 60.0, 'count': 10000.0, 'min_count': None, 'dynamic_throttling_enabled': None, 'match_patterns': None}"
  ],
  "version_upgrade_option": "OnceCurrentVersionExpired"
}
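For reference, the configured limits in the output above can be normalized to per-minute values, which would be the denominator of any percentage-usage figure. This is a minimal sketch: `per_minute_limits` is a hypothetical helper, and it assumes the rules are plain dicts with the `key`, `count`, and `renewal_period` fields shown above (with the SDK's `ThrottlingRule` objects, the same fields would be read as attributes).

```python
def per_minute_limits(rate_limits):
    """Convert rules with key/count/renewal_period into per-minute limits."""
    limits = {}
    for rule in rate_limits:
        # count is allowed per renewal_period seconds; scale to 60 seconds.
        limits[rule["key"]] = rule["count"] * 60.0 / rule["renewal_period"]
    return limits


# Values copied from the rate_limits entries in the sample output.
rules = [
    {"key": "request", "renewal_period": 10.0, "count": 10.0},
    {"key": "token", "renewal_period": 60.0, "count": 10000.0},
]
print(per_minute_limits(rules))  # {'request': 60.0, 'token': 10000.0}
```

Note that scaling the request rule to one minute is only an approximation, since that limit is expressed per 10-second renewal period rather than per minute.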

If there is an existing solution to this, please let me know.

Thank you!


Metadata

Assignees

No one assigned

    Labels

    • Mgmt: This issue is related to a management-plane library.
    • Service Attention: Workflow: This issue is responsible by Azure service team.
    • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
    • feature-request: This issue requires a new behavior in the product in order be resolved.
    • needs-team-attention: Workflow: This issue needs attention from Azure service team or SDK team
    • question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that
