Error 400 when setting individual_page_selector in process options #14101

@Kedaqusa1604

Description

Determine this is the right repository

  • I determined this is the correct repository in which to report this bug.

Summary of the issue

I'm trying to process single pages of a document using OCR, but when I set the pages value in documentai_v1.ProcessOptions.IndividualPageSelector to a value other than [1] (for example [3]), I get the following error: google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument.

This is what my code looks like:

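from pathlib import Path

from google.api_core.client_options import ClientOptions
from google.cloud import documentai_v1
from google.oauth2 import service_account

# Assumed location of the sample PDF; not shown in the original snippet.
resources_folder = Path("resources")

def main():
    location = "us"  # must match the regional endpoint below
    api_endpoint = f"{location}-documentai.googleapis.com"
    project_id = ""
    processor_id = ""
    credentials = {}

    opts = ClientOptions(
        api_endpoint = api_endpoint
    )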

    client = documentai_v1.DocumentProcessorServiceClient(
        client_options = opts,
        credentials = service_account.Credentials.from_service_account_info(
            credentials
        )
    )

    full_processor_name = client.processor_path(
        project = project_id,
        processor = processor_id,
        location = location
    )

    request = documentai_v1.GetProcessorRequest(name = full_processor_name)
    processor = client.get_processor(request = request)

    with resources_folder.joinpath("1.pdf").open("rb") as fp:

        content = fp.read()

        document = documentai_v1.RawDocument(
            content = content,
            mime_type = "application/pdf"
        )

    process_options = documentai_v1.ProcessOptions(
        individual_page_selector = documentai_v1.ProcessOptions.IndividualPageSelector(
            pages = [1]
        )
    )

    request = documentai_v1.ProcessRequest(
        name = processor.name, 
        raw_document = document,
        process_options = process_options
    )

    result = client.process_document(request = request)
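
One client-side mistake worth ruling out before sending the request is an out-of-range page index. The following is a minimal sketch, assuming the pages field takes 1-based page numbers (my reading of the API reference, not something confirmed in this report) and that the page count of the PDF is already known. The helper name check_page_selection is hypothetical and is not part of documentai_v1.

```python
def check_page_selection(pages, page_count):
    """Return a list of problems found in a 1-based page selection.

    Assumption: page indices start at 1 and must not exceed page_count.
    An empty list means no problem was found by this local check.
    """
    problems = []
    for page in pages:
        if page < 1 or page > page_count:
            problems.append(f"page {page} is outside 1..{page_count}")
    return problems


if __name__ == "__main__":
    # Selections from this report, checked against a 4-page document.
    for selection in ([1], [3], [3, 1]):
        print(selection, check_page_selection(selection, 4))
```

If this check passes for a selection and the request is still rejected with a 400, the selection itself is probably not malformed on the client side.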

The document I'm trying to process has 4 pages, and here's the funny thing:

  • with this ProcessOptions, it works:
process_options = documentai_v1.ProcessOptions(
    individual_page_selector = documentai_v1.ProcessOptions.IndividualPageSelector(
        pages = [1]
    )
)
  • with this, it fails:
process_options = documentai_v1.ProcessOptions(
    individual_page_selector = documentai_v1.ProcessOptions.IndividualPageSelector(
        pages = [3]
    )
)

getting the following error:

[Image: screenshot of the error message]
  • and with this, it also works:
process_options = documentai_v1.ProcessOptions(
  individual_page_selector = documentai_v1.ProcessOptions.IndividualPageSelector(
      pages = [3,1]
  )
)
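
To make the three cases above easier to reproduce in one run, a small harness can call the processor once per selection and record which ones fail. This is only a sketch: try_selections and fake_process are hypothetical names, and fake_process is a stand-in that mimics the behaviour reported here so the block runs without credentials. In a real run it would be replaced by a function that builds ProcessOptions for the given pages and calls client.process_document.

```python
def try_selections(process, selections):
    """Call process(pages) for each selection and record the outcome."""
    outcomes = {}
    for pages in selections:
        try:
            process(pages)
        except Exception as exc:  # broad on purpose: we only log the failure
            outcomes[tuple(pages)] = f"failed: {type(exc).__name__}: {exc}"
        else:
            outcomes[tuple(pages)] = "ok"
    return outcomes


def fake_process(pages):
    # Stand-in that mimics the reported behaviour; replace with a real API call.
    if pages == [3]:
        raise ValueError("400 Request contains an invalid argument.")


if __name__ == "__main__":
    results = try_selections(fake_process, [[1], [3], [3, 1]])
    for pages, outcome in results.items():
        print(pages, outcome)
```

Catching the exception per selection keeps one failing request from hiding the results of the others, which is useful when attaching output to a report like this one.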

Thanks in advance for any help.

API client name and version

No response

Reproduction steps: code

No response

Reproduction steps: supporting files

No response

Reproduction steps: actual results

No response

Reproduction steps: expected results

No response

OS & version + platform

No response

Python environment

No response

Python dependencies

No response

Additional context

No response

    Labels

    priority: p2 (Moderately-important priority. Fix may not be included in next release.)
    type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
