Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated _parse_operation_id regex to allow for non-standard Document Intelligence Endpoints #38696

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jcbartle
Copy link

Description

The regex within the _parse_operation_id method assumes the following URL format for Document Intelligence endpoints when interpreting the Operation-Location header value:

https://xxxxxxxx.cognitiveservices.azure.com/

The full Operation-Location header value would be something like (i.e., only one forward-slash character between the domain and "documentintelligence"):

https://xxxxxxxx.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-read/analyzeResults/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx?api-version=2024-07-31-preview

While this works fine when accessing Document Intelligence resources directly, this will break if the endpoint is abstracted away behind an API gateway (in our case, Azure API Management). In this configuration, your endpoint value is probably something like this:

https://apim.example.com/azure/document-intelligence/

With the full Operation-Location header value being something like (i.e., more than one forward-slash character between the domain and "documentintelligence"):

https://apim.example.com/azure/document-intelligence/documentintelligence/documentModels/prebuilt-read/analyzeResults/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx?api-version=2024-07-31-preview

Adjusting the regex to allow for any character (as opposed to any character except forward-slashes) between https:// and /documentintelligence/ enables this scenario to work correctly and, in my testing, did not impact the use of standard endpoint URLs.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes. <-- not sure if this one qualifies for this?
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@github-actions github-actions bot added Community Contribution Community members are working on the issue customer-reported Issues that are reported by GitHub users external to the Azure organization. Document Intelligence labels Nov 25, 2024
Copy link

Thank you for your contribution @jcbartle! We will review the pull request and get back to you soon.

@azure-sdk
Copy link
Collaborator

API change check

API changes are not detected in this pull request.

@kristapratico
Copy link
Member

/azp run python - documentintelligence - tests

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kristapratico
Copy link
Member

Thanks for the contribution @jcbartle! Could you possibly add a unit test for this change?

@annatisch
Copy link
Member

Thanks for the contribution @jcbartle - we may need to wait for the SDK maintain (@YalinLi0312 ) to return from vacation to provide us with some context on why a regex was being used here. Looking at the code it seems to me that this should just be using url parsing to extract that path, probably something like this should work?
urlparse(header).path.split('/')[-1]

@jcbartle
Copy link
Author

Good morning, @annatisch. Thank you for your response. I actually like your suggestion over mine, as it's more clear what is happening than with a regular expression.

@YalinLi0312 - how do you feel about Anna's proposed change?

@pvaneck pvaneck removed the request for review from YalinLi0312 February 12, 2025 20:22
@pvaneck
Copy link
Member

pvaneck commented Feb 12, 2025

Hey, @jcbartle. I think the urlparse approach would be preferred, as it is indeed more clear (and also more a bit quicker vs regex parsing). Are you still able to make the change? If not, one of us can take over, as this is still something that should be fixed.

@jcbartle
Copy link
Author

@pvaneck - good evening. I'm happy to make the changes, but I want to ensure I'm giving proper credit. @annatisch came up with the better code, but I discovered the "bug". What's the preferred path forward?

@pvaneck
Copy link
Member

pvaneck commented Feb 13, 2025

@jcbartle, thanks for the willingness and considering attribution! For this, you can just go ahead and make the change as suggested since you found the issue, initiated the original change, and are reacting to reviewer feedback.

To make this ready for merge, can you do the following:

  • Add a changelog entry to the "Bugs Fixed" section describing the fix.
  • As @kristapratico mentioned, add a simple unit test importing _parse_operation_id, testing a few URLs, ensuring the expected operation ID is returned. I don't really see an existing test file for something like this, so maybe make a test_utils.py file in the tests folder and add it there.
    • To run the unit tests, we use tox. From the root of the package (sdk/documentintelligence/azure-ai-documentintelligence), run:
      pip install tox
      tox run -e whl -c ../../../eng/tox/tox.ini --root .
      
      This will install dependencies in a virtual environment, build the wheel and run all the tests.

@pvaneck
Copy link
Member

pvaneck commented Mar 27, 2025

Hey @jcbartle, checking in again. Still able to contribute here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Community Contribution Community members are working on the issue customer-reported Issues that are reported by GitHub users external to the Azure organization. Document Intelligence
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants