IndexError in apache_beam.utils.processes when pip subprocess fails with short command (e.g. pip install <pkg>) #37515

@shaheeramjad

Description

When a pip subprocess fails and the command is passed as a short list (e.g. [python, '-m', 'pip', 'install', 'pkg']), the exception handler in sdks/python/apache_beam/utils/processes.py raises IndexError instead of the intended RuntimeError with traceback and pip output.

Root cause

The pip-specific branch in call, check_call, and check_output uses a hardcoded index 6 for the "package name" when formatting the error message:

if isinstance(args, tuple) and (args[0][2] == "pip"):
  raise RuntimeError(
    "Full traceback: {}\n Pip install failed for package: {} \n Output from execution of subprocess: {}"
    .format(traceback.format_exc(), args[0][6], error.output)) from error
  • For ['python', '-m', 'pip', 'install', 'somepkg'] the list has only 5 elements (indices 0–4), so args[0][6] raises IndexError.
  • The "friendly" pip error path is never shown; users see an IndexError instead.
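
The failure can be reproduced without Beam installed. The sketch below copies only the message formatting from the branch quoted above into a hypothetical helper, format_pip_error (not a function in processes.py), and passes it the same tuple shape that check_call receives through *args:

```python
import traceback


def format_pip_error(args, output):
    """Hypothetical stand-in that mimics the formatting in the pip branch."""
    return (
        "Full traceback: {}\n Pip install failed for package: {} \n"
        " Output from execution of subprocess: {}"
    ).format(traceback.format_exc(), args[0][6], output)


# *args as seen inside check_call: a tuple whose first element is the command list.
args = (['python', '-m', 'pip', 'install', 'somepkg'],)

try:
    format_pip_error(args, b"ERROR: No matching distribution found")
    raised = None
except IndexError as exc:
    # The command list has 5 elements, so args[0][6] is out of range.
    raised = exc

print(type(raised).__name__, raised)
```

The lookup fails while the format arguments are being evaluated, so the RuntimeError is never constructed.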

Additional problem

Even when index 6 exists (e.g. stager’s pip download -r requirements_file with many args), that index may not be a package name (e.g. it can be --find-links). The message "Pip install failed for package: --find-links" is misleading.
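
As an illustration only, the command list below is hypothetical (it is not the exact argument list Beam's stager builds, and the paths are made up). It shows how a fixed index can land on an option rather than a package name:

```python
# Hypothetical long pip command; paths and argument order are assumptions.
cmd = [
    'python', '-m', 'pip', 'download',
    '--dest', '/tmp/cache',
    '--find-links', '/tmp/wheels',
    '-r', 'requirements.txt',
]

# Index 6 is an option flag here, so the message would name it as the "package".
reported = cmd[6]
print("Pip install failed for package: {}".format(reported))
```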

Steps to reproduce

  1. Use apache_beam.utils.processes.check_call (or check_output / call) with a short pip command that fails:
from apache_beam.utils import processes

# Short pip command (5 elements) that will fail (nonexistent package)
cmd = ['python', '-m', 'pip', 'install', 'nonexistent-package-xyz']
processes.check_call(cmd)
  2. When pip fails (e.g. package not found), the code hits the pip branch and formats the message with args[0][6].
  3. Actual: IndexError: list index out of range (index 6 does not exist).
  4. Expected: A RuntimeError whose message includes the full traceback and pip subprocess output (no IndexError).

Expected behavior

  • When a pip subprocess fails, the code should always raise a RuntimeError (with from error) whose message includes:
    • The full traceback
    • Useful context (e.g. that it was a pip failure; package name only when it can be determined safely)
    • The subprocess output (error.output)
  • No IndexError should occur regardless of the length or shape of the command list.
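
One possible shape for such a fix, as a minimal sketch rather than the actual patch: report the whole command instead of guessing a package position, and check the list length before testing for pip. The names is_pip_command, build_pip_error_message, and safe_check_output are hypothetical and do not exist in apache_beam.utils.processes.

```python
import subprocess
import sys
import traceback


def is_pip_command(cmd):
    """True if cmd looks like [<python>, '-m', 'pip', ...]; never raises on short input."""
    return isinstance(cmd, (list, tuple)) and len(cmd) > 2 and cmd[2] == 'pip'


def build_pip_error_message(cmd, output):
    """Formats the failure message without indexing a fixed position in cmd."""
    return (
        "Full traceback: {}\n Pip command failed: {}\n"
        " Output from execution of subprocess: {}"
    ).format(traceback.format_exc(), ' '.join(str(a) for a in cmd), output)


def safe_check_output(cmd, **kwargs):
    """Hypothetical wrapper: raises RuntimeError (chained) when a pip command fails."""
    try:
        return subprocess.check_output(cmd, **kwargs)
    except subprocess.CalledProcessError as error:
        if is_pip_command(cmd):
            raise RuntimeError(
                build_pip_error_message(cmd, error.output)) from error
        raise
```

Reporting the full command sidesteps the question of which argument is the package, at the cost of a longer message. A caller could invoke it as safe_check_output([sys.executable, '-m', 'pip', 'install', 'nonexistent-package-xyz'], stderr=subprocess.STDOUT) and expect a RuntimeError whose __cause__ is the original CalledProcessError.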

Actual behavior

  • For short pip commands (e.g. pip install <pkg>), IndexError is raised when building the error message, so the intended RuntimeError is never shown.
  • For some longer pip commands, the message can show a wrong "package" (e.g. an option like --find-links) because index 6 is assumed to be the package name.

Affected code

  • File: sdks/python/apache_beam/utils/processes.py
  • Functions: call, check_call, check_output (pip branch in each, e.g. lines 55–59, 74–78, 93–97)
  • Relevant line: .format(traceback.format_exc(), args[0][6], error.output). The args[0][6] lookup is unsafe.
