Description
Bugzilla Link | 21705 |
Version | trunk |
OS | Windows XP |
CC | @AaronBallman,@Bigcheese,@rnk,@synopsys-sig-compiler-frontends |
Extended Description
The implementation of llvm::sys::fs::file_status::getUniqueID() in /lib/Support/Windows/Path.inc relies on a call to the Windows GetFileInformationByHandle() [1] function to obtain a unique ID for a file regardless of the path or file name used to reach it. The intent is that the same unique ID be returned for every hard link, soft link, NTFS junction directory, etc. that refers to the same file. GetFileInformationByHandle() requires an open file handle and populates a structure [2] with information about the referenced file. This information includes a file index ID that is documented (see the description of nFileIndexLow at [2]) to uniquely identify a file on a particular volume (see dwVolumeSerialNumber at [2]).
Unfortunately, and the Microsoft documentation is not particularly clear on this, the unique file index ID that GetFileInformationByHandle() produces is only guaranteed to be unique so long as the file remains open. Once all handles to the file have been closed, the system is free to reuse the file index ID for another file.
Additionally, Microsoft's documentation is clear that GetFileInformationByHandle() may not produce a unique file index ID on the Windows Server 2012 ReFS file system, because ReFS uses 128-bit file identifiers. GetFileInformationByHandleEx() [3] may be used to populate a FILE_ID_INFO [4] structure with a 128-bit file ID, but doing so requires Windows 8 or Windows Server 2012.
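For illustration, a minimal sketch of querying the 128-bit ID follows. WideFileID and queryWideFileID() are hypothetical names, not LLVM code. The Windows-only part assumes a build that targets Windows 8 or later, so that FILE_ID_INFO and FileIdInfo are declared. The sketch does not address the separate handle lifetime concern described above.

```cpp
// Assumption: targeting Windows 8 / Server 2012 so FILE_ID_INFO is declared.
#if defined(_WIN32) && !defined(_WIN32_WINNT)
#define _WIN32_WINNT 0x0602
#endif

#include <array>
#include <cstdint>

// Hypothetical portable holder for a volume serial number plus 128-bit file ID.
struct WideFileID {
  std::uint64_t VolumeSerialNumber;
  std::array<std::uint8_t, 16> Identifier;
};

// Two IDs name the same file only if both the volume and all 16 bytes match.
inline bool operator==(const WideFileID &L, const WideFileID &R) {
  return L.VolumeSerialNumber == R.VolumeSerialNumber &&
         L.Identifier == R.Identifier;
}

#ifdef _WIN32
#include <windows.h>
#include <cstring>

// Hypothetical helper: fills Out from an already open handle.
// Returns false if the query fails (for example, on an older system).
inline bool queryWideFileID(HANDLE Handle, WideFileID &Out) {
  FILE_ID_INFO Info;
  if (!GetFileInformationByHandleEx(Handle, FileIdInfo, &Info, sizeof(Info)))
    return false;
  Out.VolumeSerialNumber = Info.VolumeSerialNumber;
  std::memcpy(Out.Identifier.data(), Info.FileId.Identifier,
              Out.Identifier.size());
  return true;
}
#endif
```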
llvm::sys::fs::file_status::getUniqueID() constructs a unique ID from the VolumeSerialNumber, FileIndexHigh, and FileIndexLow file_status data members typically populated by a call to GetFileInformationByHandle(). To see where problems may occur, code that creates file_status objects must be examined.
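As a rough sketch of that construction (FileIndexInfo and makeUniqueID() are hypothetical stand-ins, not the actual LLVM types), the two 32-bit index halves can be joined into one 64-bit file index and paired with the volume serial number:

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the three file_status data members named above.
struct FileIndexInfo {
  std::uint32_t VolumeSerialNumber;
  std::uint32_t FileIndexHigh;
  std::uint32_t FileIndexLow;
};

// Hypothetical helper: returns (volume serial number, 64-bit file index).
inline std::pair<std::uint64_t, std::uint64_t>
makeUniqueID(const FileIndexInfo &Info) {
  // Widen before shifting so the high half is not lost.
  std::uint64_t FileID =
      (static_cast<std::uint64_t>(Info.FileIndexHigh) << 32) |
      static_cast<std::uint64_t>(Info.FileIndexLow);
  return {Info.VolumeSerialNumber, FileID};
}
```

The resulting pair is only as trustworthy as the three inputs, so any reuse of the file index ID by the system carries straight through to the unique ID.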
file_status objects are typically constructed by calling one of the status() overloads declared in:
/include/llvm/Support/FileSystem.h:
std::error_code status(const Twine &path, file_status &result);
...
std::error_code status(int FD, file_status &Result);
The overload taking a file descriptor should be OK. Internally, the file handle associated with the file descriptor is obtained and passed to GetFileInformationByHandle(). The retrieved file index ID (and thus, the UniqueID object returned by file_status::getUniqueID()) is valid until the file is closed.
The overload taking a path is problematic. Internally, this overload opens the file identified by the path, calls a static getStatus() function with the file handle, and then closes the file handle. getStatus() calls GetFileInformationByHandle() and populates the file_status data members. The problem is that, by the time status() returns, the handle has already been closed, so the data members in the file_status object that it has populated are no longer guaranteed to uniquely identify a file.
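The hazard can be shown with a toy model. ToyVolume below is entirely hypothetical and deliberately worst-case: it recycles an index ID as soon as the last handle closes, which real file systems need not do. It is not Windows or LLVM code. It only shows why an ID captured by an open, query, close sequence can collide with the ID of a different file.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Toy model: an index ID is reserved only while a handle is open.
class ToyVolume {
public:
  // Opens a file and returns its index ID, stable while any handle is open.
  std::uint64_t open(const std::string &Path) {
    auto It = Live.find(Path);
    if (It != Live.end()) {
      ++It->second.Handles;
      return It->second.Index;
    }
    std::uint64_t Index;
    if (!FreeList.empty()) {
      // Reuse a previously released ID.
      Index = *FreeList.begin();
      FreeList.erase(FreeList.begin());
    } else {
      Index = NextIndex++;
    }
    Live[Path] = {Index, 1};
    return Index;
  }

  // Closes one handle; the ID is released when the last handle goes away.
  void close(const std::string &Path) {
    auto It = Live.find(Path);
    if (It == Live.end())
      return;
    if (--It->second.Handles == 0) {
      FreeList.insert(It->second.Index);
      Live.erase(It);
    }
  }

  // Mimics the path-based pattern: open, query, close, return the ID.
  std::uint64_t statusByPath(const std::string &Path) {
    std::uint64_t Index = open(Path);
    close(Path);
    return Index;
  }

private:
  struct Entry {
    std::uint64_t Index;
    unsigned Handles;
  };
  std::map<std::string, Entry> Live;
  std::set<std::uint64_t> FreeList;
  std::uint64_t NextIndex = 1;
};
```

In this model, calling statusByPath() on two different paths in sequence yields the same ID for both, whereas holding both files open via open() yields distinct IDs.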
For files hosted on NTFS and FAT file systems, GetFileInformationByHandle() seems to reliably return consistent and unique file index IDs. However, this is not guaranteed and may simply be an implementation side effect. For files hosted on network file servers (specifically, on large Network Appliance file systems shared via CIFS), re-use of file index IDs has been observed.
Unfortunately, I don't have a test case to share. I don't have access to the servers where this was observed and I haven't been able to reproduce the problem in my own local environments. Regardless, it is clear to me that GetFileInformationByHandle() does not offer the guarantees that LLVM relies on it for.