This was generated by AI during PR processing.
Context
Surfaced while finalizing PR #68 (parse git's quoted, C-escaped diff headers, issue #30). PR #68 is correct for its scope — the ASCII chars git quotes regardless of core.quotePath (tab, newline, backslash, double-quote). This is a separate, out-of-scope gap on the decode path.
Problem
_unquote_c_path in git_hunk/_hunk.py decodes each octal escape one byte at a time:
chars.append(chr(int(path[i + 1 : i + 4], 8)))
With the default core.quotePath=true, git octal-escapes every byte >= 0x80, so a multibyte UTF-8 path is emitted as a sequence of per-byte octal escapes. Example: café.txt → header "a/caf\303\251.txt". The current code produces chr(0o303) + chr(0o251) = é (U+00C3 U+00A9), i.e. the UTF-8 bytes mis-decoded as Latin-1: mojibake that does not match the real filename. The correct decode collects the escaped bytes and decodes them as a unit (UTF-8, ideally with the same surrogateescape strategy used elsewhere per #11/#32).
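A minimal sketch of the byte-collecting decode described above, not the project's actual implementation. The function name unquote_c_path_bytes and the _SIMPLE_ESCAPES table are hypothetical, and it assumes the input is the header path with the surrounding double quotes already stripped. It also assumes UTF-8 with surrogateescape is the desired strategy, as suggested by #11/#32.

```python
# Hypothetical helper for illustration; the real _unquote_c_path may differ.
# Single-character C escapes mapped to their byte values.
_SIMPLE_ESCAPES = {
    "a": 0x07, "b": 0x08, "t": 0x09, "n": 0x0A, "v": 0x0B,
    "f": 0x0C, "r": 0x0D, '"': 0x22, "\\": 0x5C,
}


def unquote_c_path_bytes(path: str) -> str:
    """Decode a C-quoted path body by collecting bytes, then decoding once."""
    buf = bytearray()
    i = 0
    while i < len(path):
        ch = path[i]
        if ch != "\\":
            # Literal character: store its UTF-8 bytes.
            buf += ch.encode("utf-8", "surrogateescape")
            i += 1
            continue
        nxt = path[i + 1 : i + 2]
        octal = path[i + 1 : i + 4]
        if nxt in _SIMPLE_ESCAPES:
            buf.append(_SIMPLE_ESCAPES[nxt])
            i += 2
        elif (
            len(octal) == 3
            and octal[0] in "0123"  # keeps the value within one byte (<= 0o377)
            and all(c in "01234567" for c in octal)
        ):
            # Octal escape: one raw byte, NOT one code point.
            buf.append(int(octal, 8))
            i += 4
        else:
            # Unknown or truncated escape: keep the backslash literally.
            buf.append(0x5C)
            i += 1
    # Decode the whole byte sequence as a unit; bytes that are not valid
    # UTF-8 survive as lone surrogates instead of raising.
    return buf.decode("utf-8", "surrogateescape")


print(unquote_c_path_bytes(r"a/caf\303\251.txt"))
```

Decoding once at the end is the point of the design: adjacent octal escapes are reassembled into the multibyte sequence they came from, whereas calling chr() per escape treats each byte as its own code point.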
Repro (default git config):
$ git init && printf 'a\nb\n' > café.txt && git add . && git commit -m init
$ printf 'a\nB\n' > café.txt
$ git-hunk list --unstaged --json # "file" comes back as "café.txt", not "café.txt"
Impact
list reports the wrong path for any non-ASCII filename, and subsequent stage/unstage/discard keyed off that path target the wrong (nonexistent) file.
Scope notes
core.quotePath._unquote_c_path. They share the non-ASCII-path theme but are different code paths.