AttributeError when Content-Disposition header doesn't contain filename parameter #31

@raphaelsamy-quantumlends

Description

When using download_protected_document() (or other file download methods), an AttributeError: 'NoneType' object has no attribute 'group' can occur intermittently if the server returns a Content-Disposition header that doesn't match the expected regex pattern.

Environment

  • Library version: 6.2.0 (latest)
  • Python version: 3.12

Root Cause

The issue is in model_utils.py at lines 1399-1401, in the deserialize_file() function:

if content_disposition:
    filename = re.search(r'filename=[\'"]?([^\'"\s]+)[\'"]?',
                         content_disposition).group(1)
    path = os.path.join(os.path.dirname(path), filename)

The code checks if content_disposition is truthy, but does not check if re.search() actually finds a match. When the header exists but doesn't contain filename= in the expected format, re.search() returns None and calling .group(1) on None raises AttributeError.

Scenarios That Can Cause This

  1. Content-Disposition: attachment (no filename parameter)
  2. Content-Disposition: attachment; filename*=UTF-8''encoded%20name (RFC 5987 encoding)
  3. Other non-standard header formats
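The scenarios above can be checked against the pattern directly. The following is a minimal sketch that reuses the regex quoted from deserialize_file(); the sample header values and the helper name matches_pattern are illustrative assumptions, not part of the library.

```python
import re

# Pattern copied from the snippet quoted in the Root Cause section.
FILENAME_PATTERN = r'filename=[\'"]?([^\'"\s]+)[\'"]?'

# Illustrative header values (assumed for this sketch).
HEADERS = [
    'attachment; filename="report.pdf"',            # matches
    "attachment",                                   # no filename parameter
    "attachment; filename*=UTF-8''encoded%20name",  # RFC 5987 form
]


def matches_pattern(header):
    """Return True when the pattern finds a plain filename= parameter."""
    return re.search(FILENAME_PATTERN, header) is not None


# Maps each header to whether re.search() found a match.
results = {header: matches_pattern(header) for header in HEADERS}

for header, matched in results.items():
    print(f"{header!r}: {'match' if matched else 'no match, .group(1) would raise'}")
```

Any header for which re.search() finds no match would trigger the AttributeError in the unguarded code.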

Error Message

AttributeError: 'NoneType' object has no attribute 'group'

Suggested Fix

Add a None check before calling .group():

if content_disposition:
    match = re.search(r'filename=[\'"]?([^\'"\s]+)[\'"]?', content_disposition)
    if match:
        filename = match.group(1)
        path = os.path.join(os.path.dirname(path), filename)

Or optionally, also handle RFC 5987 encoded filenames:

# Assumes `from urllib.parse import unquote` at the top of the module
if content_disposition:
    filename = None
    # Try the plain filename= parameter first
    match = re.search(r'filename=[\'"]?([^\'"\s]+)[\'"]?', content_disposition)
    if match:
        filename = match.group(1)
    else:
        # Fall back to the RFC 5987 extended form (filename*=UTF-8''...)
        match = re.search(r"filename\*=UTF-8''([^;\s]+)", content_disposition, re.IGNORECASE)
        if match:
            # Only the extended form is percent-encoded, so decode only here
            filename = unquote(match.group(1))
    if filename:
        path = os.path.join(os.path.dirname(path), filename)
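
As a possible alternative to hand-written regexes, the standard library's email.message.Message can parse header parameters, including the filename*= extended form. The sketch below is not the library's implementation; the helper name filename_from_content_disposition is hypothetical, and reducing the result to its final path component with os.path.basename is an added assumption to avoid writing outside the target directory.

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename carried by a Content-Disposition value, or None.

    Hypothetical helper for illustration; not part of the library.
    """
    if not header_value:
        return None
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() returns None when no filename parameter is present
    # and decodes the percent-encoded filename*= form.
    name = msg.get_filename()
    if not name:
        return None
    # Keep only the final path component (assumption: the server-supplied
    # name should never change the destination directory).
    return os.path.basename(name) or None


print(filename_from_content_disposition("attachment"))
print(filename_from_content_disposition('attachment; filename="report.pdf"'))
print(filename_from_content_disposition("attachment; filename*=UTF-8''encoded%20name"))
```

Because the helper returns None instead of raising, the caller can keep the original path whenever the header carries no usable filename.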

Workaround

As a temporary workaround, users can pass _preload_content=False to file download methods to bypass the deserialization:

response = api.download_protected_document(id=document_id, _preload_content=False)
content = response.read()

This returns the raw urllib3.HTTPResponse object instead of going through deserialize_file().
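
For large documents, the raw response can be streamed to disk rather than read into memory in one call. This is a rough sketch under assumptions: save_raw_response is a hypothetical helper, the destination path is chosen by the caller, and the only thing required of the response object is a read(amt) method, which urllib3.HTTPResponse provides. An io.BytesIO stands in for the response here so the example runs on its own.

```python
import io
import os
import tempfile


def save_raw_response(response, dest_path, chunk_size=64 * 1024):
    """Copy a file-like response to dest_path in chunks; return bytes written.

    Hypothetical helper: `response` only needs a read(amt) method.
    """
    written = 0
    with open(dest_path, "wb") as fh:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            fh.write(chunk)
            written += len(chunk)
    return written


# Stand-in for the object returned with _preload_content=False.
fake_response = io.BytesIO(b"abc" * 1000)

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "document.bin")
    bytes_written = save_raw_response(fake_response, target)
    size_on_disk = os.path.getsize(target)

print(bytes_written, size_on_disk)
```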

Additional Context

This is a common bug pattern - similar issues have been reported and fixed in other libraries that parse Content-Disposition headers (e.g., gdown, requests-toolbelt).
