Skip to content

clang-format-diff.py Fails to Correctly Handle Filenames with Spaces #135619

Closed
@selimkeles

Description

@selimkeles

I’ve encountered an issue with clang-format-diff.py where the script is unable to properly detect filenames that contain spaces. Upon investigating the source code, I discovered that the filename extraction relies on the following code block:

filename = None
lines_by_file = {}
for line in sys.stdin:
    match = re.search(r'^\+\+\+\ (.*?/){%s}(\S*)' % args.p, line)
    if match:
        filename = match.group(2)
    if filename == None:

Because this regular expression uses \S* to capture the filename, it stops at the first encountered space. This results in the filename not being recognized in its entirety when there are spaces—causing inaccurate processing of the diff.

Steps to Reproduce:

1- Create or rename a file such that its path contains one or more spaces (e.g., my folder/file.c).
2- Make some changes and stage them so that a diff is generated.
3- Run the process clang-format-diff.py (e.g., via a pre-commit hook).
4- Observe that the script fails to detect the file correctly, as the regex does not capture filenames with spaces.

Proposed Fix:

One straightforward solution is to update the regex to capture everything after the scaled path components—even if that includes spaces. For example, replacing (\S*) with (.+) and applying .strip() can address the issue:

match = re.search(r'^\+\+\+\s+(?:.*?/){%s}(.+)$' % args.p, line)
if match:
    filename = match.group(1).strip()

This change—matching one or more characters until the end of the line—should correctly capture filenames with spaces, while also handling any extraneous whitespace. I have tested this modification with various file paths and it resolves the issue.

Additional Note:

In my local integration of clang-format-diff.py, I also noted that the script always exits with a status code of 0—even when formatting differences are present. This can be misleading for integration in automated checks where one might expect a non-zero exit code in case of formatting issues. While I understand that the design intention might be to simply output the diff changes, it could be useful to mention this behavior in the documentation or consider an opt-in flag to affect the exit code.

Would you be open to a patch based on the above changes? I’d be happy to contribute a patch if the maintainers agree that this is an improvement.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions